OpenWatch: A Multimodal Benchmark for Hand Gesture Recognition on Smartwatches
Pietro Bonazzi, Youssef Ahmed, Daniel Eckert, Andrea Ronco, Junjie Zeng + 2 more
TLDR
OpenWatch introduces a multimodal benchmark for smartwatch hand gesture recognition, along with novel methods (MixToken, NormWear-Lora) and key findings.
Key contributions
- OpenWatch: first open-access multimodal benchmark for wrist-based gesture recognition on smartwatches.
- Dataset includes 10+ hours of IMU/PPG data from 50 participants, with 59 labelled gesture sequences.
- Introduces MixToken, a task-specific mixture-of-experts achieving a 90% F1-score with only 223k parameters.
- Shows PPG signals provide a substantial +12.5% F1-score benefit for smartwatch foundation models.
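The "learned logit mixing" behind MixToken can be illustrated with a minimal sketch. This is not the paper's implementation: the feature dimensions, weight matrices, and the scalar gate below are hypothetical stand-ins, showing only the core idea of two expert heads (one over per-channel IMU filterbank features, one over cross-channel statistical tokens) whose class logits are combined by learned mixing weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: 6 IMU channels x 16 filterbank features,
# 8 cross-channel statistical tokens, 59 gesture classes (as in OpenWatch).
n_classes = 59
filterbank_feats = rng.normal(size=(6 * 16,))  # per-channel IMU filterbank view
stat_tokens = rng.normal(size=(8,))            # cross-channel statistical view

# Each "expert" is a linear head producing class logits from its own view.
W_fb = rng.normal(scale=0.1, size=(n_classes, filterbank_feats.size))
W_stat = rng.normal(scale=0.1, size=(n_classes, stat_tokens.size))
logits_fb = W_fb @ filterbank_feats
logits_stat = W_stat @ stat_tokens

# Learned logit mixing: a gate (trained jointly in the real model)
# convexly combines the experts' logits before the final softmax.
gate = softmax(np.array([0.3, -0.1]))  # placeholder learned mixing weights
mixed_logits = gate[0] * logits_fb + gate[1] * logits_stat
pred = int(np.argmax(mixed_logits))
print(pred)  # predicted gesture class index in [0, 58]
```

Mixing at the logit level (rather than concatenating features) keeps each expert cheap and interpretable, which is consistent with the small 223k-parameter footprint reported for MixToken.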
Why it matters
This paper fills a critical gap by providing the first open-access multimodal benchmark for smartwatch gesture recognition. Its novel methods and empirical insights offer significant advancements for developing efficient and accurate wearable sensing applications, highlighting the value of multimodal data and specialized architectures.
Original Abstract
Despite widespread adoption of smartwatches worldwide, open benchmarks for wrist-based gesture recognition remain surprisingly limited. In this work, we introduce the first open-access multi-modal benchmark, OpenWatch, for wrist-based gesture recognition using synchronized inertial and physiological sensing on a commercial smartwatch. It contains over 10 hours of Inertial Measurement Unit (IMU) and Photoplethysmography (PPG) data across 50 participants and a vocabulary of 59 labelled gesture sequences. Furthermore, we present a subject-independent evaluation protocol including traditional and deep learning methods for time-series classification. On top of this, we develop two novel methodologies for hand-gesture recognition: (i) MixToken, a task-specific mixture-of-experts that fuses per-channel IMU filterbank features with cross-channel statistical tokens through learned logit mixing, and (ii) NormWear-Lora, a low-rank adaptation module for smartwatch foundation models. Our benchmarking results reveal that PPG signals carry a substantial predictive benefit (+12.5% F1-score) for foundational smartwatch models. In addition, we show that task-specific architectures (i.e. MixToken) substantially outperform finetuned smartwatch foundation models in terms of accuracy (F1-score = 90% vs 66%) and memory efficiency (223k vs 136M parameters). Finally, we also provide clear empirical guidance on the trade-offs between specialized architecture design, modality fusion, data augmentations, and foundation-model adaptation for resource-constrained wearable sensing.
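NormWear-Lora applies low-rank adaptation (LoRA) to a smartwatch foundation model. The abstract does not give implementation details, so the sketch below shows only the generic LoRA mechanism with hypothetical dimensions: a frozen weight `W` is augmented by a trainable low-rank product `B @ A`, so only a small fraction of parameters is updated during finetuning.

```python
import numpy as np

rng = np.random.default_rng(42)

d_in, d_out, r = 64, 64, 4  # hypothetical dims; rank r << d_in, d_out
alpha = 8.0                 # conventional LoRA scaling factor

W = rng.normal(scale=0.02, size=(d_out, d_in))  # frozen pretrained weight
A = rng.normal(scale=0.02, size=(r, d_in))      # trainable down-projection
B = np.zeros((d_out, r))                        # trainable up-projection, zero-init

def adapted_forward(x):
    # Base path stays frozen; only the low-rank path (B @ A) is trained.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# At initialization B = 0, so the adapter leaves the base output unchanged.
assert np.allclose(adapted_forward(x), W @ x)

full_params = W.size
lora_params = A.size + B.size
print(lora_params / full_params)  # → 0.125
```

This parameter-efficiency argument is exactly what makes foundation-model adaptation attractive on wearables, even though, per the paper's results, the small task-specific MixToken still outperforms the adapted 136M-parameter foundation model on this benchmark.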