ArXiv TLDR

Unsupervised Skeleton-Based Action Segmentation via Hierarchical Spatiotemporal Vector Quantization

arXiv:2604.15196

Umer Ahmed, Syed Ahmed Mahmood, Fawad Javed Fateh, M. Shaheer Luqman, M. Zeeshan Zia + 1 more

cs.CV

TLDR

This paper introduces a hierarchical spatiotemporal vector quantization framework for unsupervised skeleton-based action segmentation, achieving state-of-the-art results on HuGaDB, LARa, and BABEL.

Key contributions

  • Proposes a hierarchical spatiotemporal vector quantization framework for unsupervised action segmentation.
  • Uses two levels: the lower level associates skeletons with fine-grained subactions, and the higher level aggregates subactions into action-level representations.
  • Leverages both spatial (skeleton reconstruction) and temporal (timestamp recovery) information.
  • Achieves new state-of-the-art performance and reduces segment length bias on HuGaDB, LARa, and BABEL.
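
The two-level idea in the list above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the codebooks here are random rather than learned, the per-frame features are hypothetical embeddings, and the function names `quantize` and `hierarchical_quantize` are invented for illustration. It only shows how a frame is first mapped to a subaction code and how that code is then mapped to an action code.

```python
import numpy as np

def quantize(x, codebook):
    """Assign each row of x to its nearest codebook entry (squared Euclidean)."""
    # Pairwise squared distances, shape (num_rows, num_codes).
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]

def hierarchical_quantize(features, sub_codebook, action_codebook):
    """Two consecutive quantization levels (hypothetical helper).

    Lower level: frame features -> subaction codes.
    Higher level: quantized subaction vectors -> action codes.
    """
    sub_idx, sub_vectors = quantize(features, sub_codebook)
    action_idx, _ = quantize(sub_vectors, action_codebook)
    return sub_idx, action_idx

# Assumed toy setup: 200 frames, 16-dim embeddings, 12 subactions, 4 actions.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))
sub_codebook = rng.normal(size=(12, 16))
action_codebook = rng.normal(size=(4, 16))

sub_idx, action_idx = hierarchical_quantize(features, sub_codebook, action_codebook)
```

Because the higher level only sees the quantized subaction vectors, every frame that shares a subaction code also shares an action code, which is the aggregation behaviour the second bullet describes.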

Why it matters

The two-level quantization outperforms a non-hierarchical baseline while relying mainly on spatial cues from skeleton reconstruction. Adding timestamp recovery yields state-of-the-art performance and reduced segment length bias on HuGaDB, LARa, and BABEL, all without supervision.

Original Abstract

We propose a novel hierarchical spatiotemporal vector quantization framework for unsupervised skeleton-based temporal action segmentation. We first introduce a hierarchical approach, which includes two consecutive levels of vector quantization. Specifically, the lower level associates skeletons with fine-grained subactions, while the higher level further aggregates subactions into action-level representations. Our hierarchical approach outperforms the non-hierarchical baseline, while primarily exploiting spatial cues by reconstructing input skeletons. Next, we extend our approach by leveraging both spatial and temporal information, yielding a hierarchical spatiotemporal vector quantization scheme. In particular, our hierarchical spatiotemporal approach performs multi-level clustering, while simultaneously recovering input skeletons and their corresponding timestamps. Lastly, extensive experiments on multiple benchmarks, including HuGaDB, LARa, and BABEL, demonstrate that our approach establishes a new state-of-the-art performance and reduces segment length bias in unsupervised skeleton-based temporal action segmentation.
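
The abstract says the spatiotemporal variant recovers both the input skeletons and their timestamps, but it does not give the loss. One plausible reading, shown here as a sketch only, is a sum of two mean-squared-error terms. The function name `spatiotemporal_loss`, the `temporal_weight` parameter, the normalization of timestamps to [0, 1], and the (frames, joints, 3) skeleton layout are all assumptions, not details from the paper.

```python
import numpy as np

def spatiotemporal_loss(skeletons, recon, timestamps, pred_timestamps,
                        temporal_weight=1.0):
    """Hypothetical objective: skeleton reconstruction plus timestamp recovery."""
    # Spatial term: how well the decoded skeletons match the inputs.
    spatial = np.mean((recon - skeletons) ** 2)
    # Temporal term: how well each frame's timestamp is recovered.
    temporal = np.mean((pred_timestamps - timestamps) ** 2)
    return spatial + temporal_weight * temporal

# Assumed toy data: 100 frames, 25 joints, 3D coordinates.
num_frames, num_joints = 100, 25
rng = np.random.default_rng(0)
skeletons = rng.normal(size=(num_frames, num_joints, 3))
timestamps = np.arange(num_frames) / (num_frames - 1)  # normalized to [0, 1]

# Perfect recovery gives zero loss; degraded outputs give a positive loss.
loss_perfect = spatiotemporal_loss(skeletons, skeletons, timestamps, timestamps)
loss_noisy = spatiotemporal_loss(skeletons, skeletons + 0.1,
                                 timestamps, timestamps[::-1])
```

The temporal term penalizes representations that forget where a frame sits in the sequence, which is one way temporal information could enter an otherwise reconstruction-only objective.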
