To our knowledge, this is the first dataset to achieve microsecond-level synchronization of bilateral wristband EMG, egocentric RGB, external RGB-D, and optical motion-capture data, addressing the lack of cross-modal alignment in existing datasets.
A Graph Transformer based markers2mano reconstruction pipeline reduces the invalid-frame rate from 12.7% to 3.6%, with an average marker alignment error of only 4.3 mm; complete wrist joint-angle annotations are included.
We build a unified benchmark covering three tasks (EMG-to-pose, Vision-to-pose, and EMG+Vision fusion) and three generalization settings: cross-gesture, cross-user, and a combined split.
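The three generalization settings above can be sketched as follows. This is an illustrative reconstruction, not the dataset's official split code: the held-out counts and the rule for the combined split are assumptions.

```python
import numpy as np

# Hypothetical sketch of the three generalization splits.
# Subject/gesture counts follow the text; held-out sizes are illustrative.
rng = np.random.default_rng(0)
subjects = np.arange(41)   # 41 subjects
gestures = np.arange(60)   # 60 gesture categories

held_out_gestures = set(rng.choice(gestures, size=12, replace=False))
held_out_subjects = set(rng.choice(subjects, size=8, replace=False))

def split(subject_id, gesture_id, mode):
    """Return 'train' or 'test' for one trial under a given split mode."""
    if mode == "cross-gesture":
        return "test" if gesture_id in held_out_gestures else "train"
    if mode == "cross-user":
        return "test" if subject_id in held_out_subjects else "train"
    if mode == "combined":
        # assumed rule: test only on unseen (user, gesture) pairs
        unseen = (subject_id in held_out_subjects
                  and gesture_id in held_out_gestures)
        return "test" if unseen else "train"
    raise ValueError(mode)
```

The point of such splits is that cross-gesture measures transfer to unseen movements, cross-user measures transfer to unseen anatomy and electrode placement, and the combined split stresses both at once.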
Overall System Architecture (See Figure 1 in the Paper)
41 subjects (23 male, 18 female; average age 24) and 60 gesture categories (30 single-hand, 30 bimanual), covering diverse hand-movement patterns.
16-channel EMG (2 kHz), 120 Hz IMU, 60 fps egocentric RGB, 30 fps external RGB-D, and 120 Hz raw optical motion-capture data.
Annotations include 20 finger joint angles and 2 wrist joint angles, with an average marker alignment error of 4.3 mm; MANO mesh parameters are also provided.
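The marker alignment error quoted above is, in the usual sense, a mean Euclidean distance between captured markers and the corresponding points on the reconstructed mesh. A minimal sketch of that metric, on synthetic arrays (the function name and data are illustrative):

```python
import numpy as np

def mean_marker_error(markers_mm, mesh_points_mm):
    """Mean per-marker Euclidean distance; both arrays have shape (N, 3)."""
    return float(np.linalg.norm(markers_mm - mesh_points_mm, axis=1).mean())

# Synthetic example: every reconstructed point offset by (1, 1, 1) mm.
markers = np.zeros((5, 3))
recon = np.full((5, 3), 1.0)
print(mean_marker_error(markers, recon))  # sqrt(3) ≈ 1.732 mm
```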
Centered on high-precision time synchronization, the system integrates EMG, inertial, vision, and motion-capture modalities, tightly aligning hand muscle activity with visual pose and providing a high-quality data foundation for multimodal hand-perception research.
8-channel surface EMG per wrist at a 2 kHz sampling rate, covering the major forearm muscle groups; a lightweight (≤50 g) design that is comfortable and unrestrictive to wear, supporting long continuous recording sessions.
A head-mounted wide-angle RGB camera provides the egocentric hand view, while an external ZED 2i RGB-D camera captures global 3D scene information; together the two cameras give complete coverage of the hands and their environment.
Soft synchronization based on host timestamps, combined with linear interpolation, precisely aligns all modalities in time; the synchronization error is below 1 ms, ensuring cross-modal consistency.
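The soft-synchronization idea can be sketched in a few lines: every stream carries host timestamps, and lower-rate streams are resampled onto a common reference clock by linear interpolation. The stream rates follow the text; the signal itself and the function name are illustrative, not the project's actual code.

```python
import numpy as np

def resample(timestamps_s, values, target_times_s):
    """Linearly interpolate a 1-D signal onto target timestamps."""
    return np.interp(target_times_s, timestamps_s, values)

t_emg = np.arange(0, 1, 1 / 2000)   # 2 kHz EMG clock (reference)
t_imu = np.arange(0, 1, 1 / 120)    # 120 Hz IMU clock
imu = np.sin(2 * np.pi * t_imu)     # synthetic IMU channel

# One interpolated IMU sample per EMG tick.
imu_on_emg_clock = resample(t_imu, imu, t_emg)
print(imu_on_emg_clock.shape)
```

Linear interpolation keeps the alignment error bounded by the timestamp jitter plus the curvature of the signal between samples, which is why a sub-millisecond host-clock error budget is sufficient for these rates.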









jfeng@tsinghua.edu.cn

jzhou@tsinghua.edu.cn
Core members are from the Department of Automation, Tsinghua University, with over 20 years of research experience in computer vision and pattern recognition, and have led multiple national research projects.
The team combines expertise in biomedical engineering, human-computer interaction, and embedded systems, covering the full chain from hardware acquisition to algorithm research.
The team has built multiple hand-pose datasets and has mature pipelines for data collection, annotation, and standardization.
Dozens of top-venue papers published on hand pose estimation and EMG decoding, with the underlying technologies widely applied in AR/VR and prosthetic control.
Tsinghua University
