OpenWatch: A Multimodal Benchmark for Hand Gesture Recognition on Smartwatches
Pietro Bonazzi, Youssef Ahmed, Daniel Eckert, Andrea Ronco, Junjie Zeng + 2 more
TLDR
OpenWatch introduces a multimodal benchmark for smartwatch hand gesture recognition, along with novel methods (MixToken, NormWear-Lora) and key findings.
Key contributions
- OpenWatch: first open-access multimodal benchmark for wrist-based gesture recognition on smartwatches.
- Dataset includes 10+ hours of IMU/PPG data from 50 participants, with 59 labelled gesture sequences.
- Introduces MixToken, a task-specific mixture-of-experts achieving a 90% F1-score with only 223k parameters.
- Shows PPG signals provide a substantial +12.5% F1-score benefit for smartwatch foundation models.
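The "learned logit mixing" behind MixToken can be illustrated with a minimal sketch. This is not the paper's implementation: the feature dimensions, weight matrices, and the scalar gate below are hypothetical stand-ins, showing only the core idea of two expert heads (one over per-channel IMU filterbank features, one over cross-channel statistical tokens) whose class logits are combined by learned mixing weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: 6 IMU channels x 16 filterbank features,
# 8 cross-channel statistical tokens, 59 gesture classes (as in OpenWatch).
n_classes = 59
filterbank_feats = rng.normal(size=(6 * 16,))  # per-channel IMU filterbank view
stat_tokens = rng.normal(size=(8,))            # cross-channel statistical view

# Each "expert" is a linear head producing class logits from its own view.
W_fb = rng.normal(scale=0.1, size=(n_classes, filterbank_feats.size))
W_stat = rng.normal(scale=0.1, size=(n_classes, stat_tokens.size))
logits_fb = W_fb @ filterbank_feats
logits_stat = W_stat @ stat_tokens

# Learned logit mixing: a gate (trained jointly in the real model)
# convexly combines the experts' logits before the final softmax.
gate = softmax(np.array([0.3, -0.1]))  # placeholder learned mixing weights
mixed_logits = gate[0] * logits_fb + gate[1] * logits_stat
pred = int(np.argmax(mixed_logits))
print(pred)  # predicted gesture class index in [0, 58]
```

Mixing at the logit level (rather than concatenating features) keeps each expert cheap and interpretable, which is consistent with the small 223k-parameter footprint reported for MixToken.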
Why it matters
This paper fills a critical gap by providing the first open-access multimodal benchmark for smartwatch gesture recognition. Its novel methods and empirical insights offer significant advancements for developing efficient and accurate wearable sensing applications, highlighting the value of multimodal data and specialized architectures.
Original Abstract
Despite widespread adoption of smartwatches worldwide, open benchmarks for wrist-based gesture recognition remain surprisingly limited. In this work, we introduce the first open-access multi-modal benchmark, OpenWatch, for wrist-based gesture recognition using synchronized inertial and physiological sensing on a commercial smartwatch. It contains over 10 hours of Inertial Measurement Unit (IMU) and Photoplethysmography (PPG) data across 50 participants and a vocabulary of 59 labelled gesture sequences. Furthermore, we present a subject-independent evaluation protocol including traditional and deep learning methods for time-series classification. On top of this, we develop two novel methodologies for hand-gesture recognition: (i) MixToken, a task-specific mixture-of-experts that fuses per-channel IMU filterbank features with cross-channel statistical tokens through learned logit mixing, and (ii) NormWear-Lora, a low-rank adaptation module for smartwatch foundation models. Our benchmarking results reveal that PPG signals carry a substantial predictive benefit (+12.5% F1-score) for foundational smartwatch models. In addition, we show that task-specific architectures (i.e. MixToken) substantially outperform finetuned smartwatch foundation models in terms of accuracy (F1-score = 90% vs 66%) and memory efficiency (223k vs 136M parameters). Finally, we also provide clear empirical guidance on the trade-offs between specialized architecture design, modality fusion, data augmentations, and foundation-model adaptation for resource-constrained wearable sensing.
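NormWear-Lora applies low-rank adaptation (LoRA) to a smartwatch foundation model. The abstract does not give implementation details, so the sketch below shows only the generic LoRA mechanism with hypothetical dimensions: a frozen weight `W` is augmented by a trainable low-rank product `B @ A`, so only a small fraction of parameters is updated during finetuning.

```python
import numpy as np

rng = np.random.default_rng(42)

d_in, d_out, r = 64, 64, 4  # hypothetical dims; rank r << d_in, d_out
alpha = 8.0                 # conventional LoRA scaling factor

W = rng.normal(scale=0.02, size=(d_out, d_in))  # frozen pretrained weight
A = rng.normal(scale=0.02, size=(r, d_in))      # trainable down-projection
B = np.zeros((d_out, r))                        # trainable up-projection, zero-init

def adapted_forward(x):
    # Base path stays frozen; only the low-rank path (B @ A) is trained.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# At initialization B = 0, so the adapter leaves the base output unchanged.
assert np.allclose(adapted_forward(x), W @ x)

full_params = W.size
lora_params = A.size + B.size
print(lora_params / full_params)  # → 0.125
```

This parameter-efficiency argument is exactly what makes foundation-model adaptation attractive on wearables, even though, per the paper's results, the small task-specific MixToken still outperforms the adapted 136M-parameter foundation model on this benchmark.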