Project Overview

Multimodal Data Synchronization

We achieve, for the first time, microsecond-level synchronization of bilateral wristband EMG, egocentric RGB, external RGB-D, and optical motion capture data, addressing the lack of cross-modal alignment in existing datasets.

High-Precision Pose Annotation

A Graph Transformer based markers2mano reconstruction pipeline reduces the invalid-frame rate from 12.7% to 3.6%, with an average marker alignment error of only 4.3mm; complete wrist joint angle annotations are included.
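Conceptually, marker-based reconstruction fits pose parameters so that the model's predicted marker positions match the observed optical markers. The sketch below is a toy illustration only: a hypothetical linear marker model (`forward_markers`, `basis`, `mean_markers` are all assumptions) stands in for MANO forward kinematics, and a least-squares refinement minimizes the marker alignment error. The actual pipeline regresses MANO parameters with a Graph Transformer rather than this optimization.

```python
import numpy as np
from scipy.optimize import least_squares

def forward_markers(theta, basis, mean_markers):
    """Toy stand-in for MANO forward kinematics: predicts flattened 3D
    marker positions as a linear function of the pose parameters theta."""
    return mean_markers + basis @ theta

def fit_pose(observed, basis, mean_markers, theta0):
    """Refine pose parameters by minimizing the marker alignment error."""
    residual = lambda th: forward_markers(th, basis, mean_markers) - observed
    return least_squares(residual, theta0).x

# Synthetic example: recover a known pose from noiseless markers
rng = np.random.default_rng(0)
n_markers, n_params = 15, 6
basis = rng.standard_normal((n_markers * 3, n_params))
mean_markers = rng.standard_normal(n_markers * 3)
theta_true = rng.standard_normal(n_params)
observed = forward_markers(theta_true, basis, mean_markers)
theta_hat = fit_pose(observed, basis, mean_markers, np.zeros(n_params))
```

In the real setting the forward model is nonlinear and some markers are occluded; frames where the residual cannot be driven low enough are what the pipeline counts as invalid.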

Standardized Benchmark Evaluation

We build a unified benchmark covering three major tasks: EMG-to-pose, Vision-to-pose, and EMG+Vision fusion, with three generalization evaluation settings: cross-gesture, cross-user, and a combined split.
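This overview does not name the unified evaluation metrics; mean per-joint position error (MPJPE) and mean absolute joint-angle error are the standard choices for pose tasks like these. A minimal sketch (function names are illustrative, not the benchmark's actual API):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance between
    predicted and ground-truth 3D joints, arrays of shape
    (n_frames, n_joints, 3)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def mean_angle_error(pred_deg, gt_deg):
    """Mean absolute error over joint angles, shape (n_frames, n_dof)."""
    return np.abs(pred_deg - gt_deg).mean()

# Toy check: every joint offset by a (3, 4, 0) mm vector -> 5 mm error
gt = np.zeros((2, 21, 3))
pred = gt + np.array([3.0, 4.0, 0.0])
print(mpjpe(pred, gt))  # 5.0
```

Computing the same metrics under each split (cross-gesture, cross-user, combined) is what makes results comparable across the three tasks.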

Research Content


Overall System Architecture (See Figure 1 in the Paper)

Core Research Content

  • Design and implementation of multimodal hand pose data acquisition system
  • High-precision hand pose reconstruction method based on optical motion capture
  • Construction and standardization of large-scale multimodal hand pose dataset
  • Benchmark for EMG-Vision fusion hand pose estimation algorithms

Technical Roadmap

  • Data Collection: Build multi-sensor synchronization system; collect 60 gesture categories from 41 subjects; record 10 hours of multimodal data
  • Data Processing: EMG signal filtering and preprocessing; optical motion capture marker reconstruction; MANO parameter and joint angle conversion
  • Benchmark Construction: Define three major tasks; unified evaluation metrics; implementation and evaluation of multiple baseline models

Tech Stack

  • Hardware: WAVELETECH EMG Wristband, FZMotion Optical Motion Capture System, ZED 2i RGB-D Camera, GoPro Wide-Angle Camera
  • Software: Python, PyTorch, MANO Hand Model, Parquet Data Format
  • Algorithm: Transformer, ResNet, ViT, Multimodal Residual Fusion, Graph Neural Network

Key Statistics


Dataset Scale: 10+ Hours Multimodal Recording

41 subjects (23 males, 18 females, average age 24), 60 gesture categories (30 single-hand + 30 bimanual), covering diverse hand movement patterns.


Modal Coverage: 5 Synchronized Data Modalities

16-channel EMG (2kHz), 120Hz IMU, 60fps egocentric RGB, 30fps external RGB-D, 120Hz raw optical motion capture data.


Annotation Precision: 22-DOF Joint Angles

Including 20 finger joint angles and 2 wrist joint angles, average marker alignment error 4.3mm, with MANO mesh parameters provided.

Full-Link Synchronized Data Acquisition Engine

Built around high-precision time synchronization, the engine integrates EMG, inertial, vision, and motion-capture modalities to align hand muscle activity with visual pose, providing a high-quality data foundation for multimodal hand perception research.

Bilateral Wristband EMG Sensors

8-channel surface EMG acquisition per wrist at a 2kHz sampling rate covers the major forearm muscle groups; the ≤50g lightweight design is comfortable to wear without restricting movement and supports long-term continuous acquisition.

Dual-View Vision Perception System

Head-mounted wide-angle RGB camera provides egocentric hand view; external ZED 2i RGB-D camera provides global 3D scene information; dual-camera collaboration achieves complete perception of hands and environment.

Multimodal Microsecond Time Synchronization

Soft synchronization based on host timestamps, combined with linear interpolation, precisely aligns all modalities in time; the synchronization error is below 1ms, ensuring cross-modal consistency.
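The scheme can be sketched as follows: each stream carries host-clock timestamps, and lower-rate streams are linearly interpolated onto a chosen reference timeline. Function and variable names below are illustrative, not the acquisition system's actual code:

```python
import numpy as np

def align_to_reference(ref_ts, src_ts, src_vals):
    """Soft synchronization: resample a timestamped source stream onto a
    reference timeline by per-channel linear interpolation (np.interp)."""
    return np.stack(
        [np.interp(ref_ts, src_ts, src_vals[:, c])
         for c in range(src_vals.shape[1])],
        axis=1,
    )

# Example: resample 120 Hz IMU samples onto the 2 kHz EMG timeline
emg_ts = np.arange(0.0, 1.0, 1 / 2000)   # reference: EMG host timestamps
imu_ts = np.arange(0.0, 1.0, 1 / 120)    # lower-rate IMU host timestamps
imu = np.stack([np.sin(imu_ts), np.cos(imu_ts)], axis=1)  # toy 2-channel IMU
imu_on_emg = align_to_reference(emg_ts, imu_ts, imu)
```

Because all timestamps come from the same host clock, the residual alignment error reduces to timestamping jitter plus the interpolation error between samples.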

Equipment List


Dataset Introduction

EgoEMG is the first multimodal hand pose estimation dataset with synchronized bilateral wristband EMG and egocentric vision, constructed by the Intelligent Vision and Graphics Laboratory (IVG), Department of Automation, Tsinghua University. The dataset contains 10 hours of multimodal data covering 60 gesture categories performed by 41 subjects, spanning single-hand and bimanual interaction scenarios, with high-precision MANO parameters and 22-DOF joint angle annotations. It fills the cross-modal alignment gap in existing resources and provides a unified benchmark for EMG-vision fusion hand pose estimation research.

Demo Cases

Project Leader

Prof. Jianjiang Feng

jfeng@tsinghua.edu.cn

Prof. Jie Zhou

jzhou@tsinghua.edu.cn

Main Contributors

Ziheng Xi

PhD Candidate

Jiayi Yu

PhD Candidate

Yitao Wang

PhD Candidate

Yanbo Duan

PhD Candidate

Core Team Strengths

Top Academic Background

Core members are from the Department of Automation, Tsinghua University, with over 20 years of research experience in computer vision and pattern recognition and a track record of multiple national research projects.

Interdisciplinary Research Capability

Integrating knowledge of biomedical engineering, human-computer interaction and embedded systems, covering the full chain from hardware acquisition to algorithm research.

Rich Dataset Construction Experience

The team has built multiple hand pose related datasets, with mature technical systems in data collection, annotation and standardization.

World-Leading Research Achievements

Dozens of papers published at top venues on hand pose estimation and EMG decoding, with related technologies widely applied in AR/VR and prosthetic control.

Contact Us

Technical Contact

111111@qinghuadaxue.com

Technical Team

Tsinghua University

Technical WeChat Group

WeChat