ArXiv TLDR

SIGMA-ASL: Sensor-Integrated Multimodal Dataset for Sign Language Recognition

arXiv: 2605.06351

Xiaofang Xiao, Guangchao Li, Guangrong Zhao, Qi Lin, Wen Ma + 3 more

cs.HC

TLDR

SIGMA-ASL is a new multimodal dataset integrating vision, radar, and IMU data for robust and privacy-preserving sign language recognition.

Key contributions

  • Introduces SIGMA-ASL, a large-scale multimodal dataset for sign language recognition.
  • Integrates RGB-D camera, mmWave radar, and wrist IMUs for diverse visual, radio, and kinematic data.
  • Contains 93,545 synchronized clips of 160 ASL signs from 20 participants (see the sketch after this list).
  • Provides standardized preprocessing and benchmarking protocols for SLR evaluation.
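
A minimal sketch, in Python, of how one of the synchronized word-level clips could be represented for fusion experiments. The field names, array shapes, and sampling conventions below are assumptions for illustration only, not the dataset's published schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MultimodalClip:
    """One word-level sign clip with temporally aligned sensor streams.
    Field names, shapes, and rates are illustrative assumptions, not the
    dataset's actual schema."""
    sign_label: str        # one of the 160 ASL signs
    participant_id: int    # 1..20; needed for user-independent evaluation
    rgb: np.ndarray        # (T_rgb, H, W, 3) colour frames from the RGB-D camera
    depth: np.ndarray      # (T_rgb, H, W) depth frames
    radar: np.ndarray      # (T_radar, range_bins, doppler_bins) mmWave frames
    imu_left: np.ndarray   # (T_imu, 6) accelerometer + gyroscope, left wrist
    imu_right: np.ndarray  # (T_imu, 6) accelerometer + gyroscope, right wrist
    timestamps_ms: dict    # per-modality timestamps used for millisecond-level alignment
```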

Why it matters

This paper addresses the limitations of vision-only sign language recognition by offering a diverse, privacy-preserving multimodal dataset. SIGMA-ASL enables the development of more robust and ubiquitous SLR systems, fostering inclusive human-computer interaction.

Original Abstract

Automatic sign language recognition (SLR) has become a key enabler of inclusive human-computer interaction, fostering seamless communication between deaf individuals and hearing communities. Despite significant advances in multimodal learning, existing SLR research remains dominated by vision-based datasets, which are limited by sensitivity to lighting and occlusion, privacy concerns, and a lack of cross-modal diversity. To address these challenges, we introduce SIGMA-ASL, a large-scale multimodal dataset for SLR. The dataset integrates an Azure Kinect RGB-D camera, a millimeter-wave (mmWave) radar, and two wrist-worn inertial measurement units (IMUs) to capture complementary visual, radio-reflection, and kinematic information. Collected in a controlled studio environment with 20 participants performing 160 common American sign language (ASL) signs, SIGMA-ASL provides 93,545 temporally synchronized word-level multimodal clips. A unified sensing framework achieves millisecond-level alignment across modalities, enabling reliable sensor fusion and cross-modal learning. We further design standardized preprocessing pipelines and benchmarking protocols under both user-dependent and user-independent settings, offering a comprehensive foundation for evaluating single and multimodal SLR. Extensive experiments validate the dataset's quality and demonstrate its potential as a valuable resource for developing robust, privacy-preserving, and ubiquitous sign language recognition systems.
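
To make the two evaluation settings mentioned in the abstract concrete, here is a rough sketch of user-dependent versus user-independent splits over (clip, participant) pairs. The helper names, split ratio, and held-out participants are hypothetical; the paper's actual benchmarking protocol may differ.

```python
import random

def user_dependent_split(clips, test_ratio=0.2, seed=0):
    """User-dependent: every participant can appear in both train and test;
    clips are shuffled and split regardless of who performed them."""
    rng = random.Random(seed)
    clips = clips[:]               # clips: list of (clip_id, participant_id) pairs
    rng.shuffle(clips)
    cut = int(len(clips) * (1 - test_ratio))
    return clips[:cut], clips[cut:]

def user_independent_split(clips, test_participants):
    """User-independent: held-out participants never appear in training,
    so the model must generalize to unseen signers."""
    train = [c for c in clips if c[1] not in test_participants]
    test = [c for c in clips if c[1] in test_participants]
    return train, test

# Example with hypothetical identifiers: hold out participants 17-20 for testing.
clips = [(f"clip_{i:05d}", i % 20 + 1) for i in range(100)]
train_ui, test_ui = user_independent_split(clips, test_participants={17, 18, 19, 20})
train_ud, test_ud = user_dependent_split(clips)
```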
