ArXiv TLDR

Learning Versatile Humanoid Manipulation with Touch Dreaming

2604.13015

Yaru Niu, Zhenlong Fang, Binghong Chen, Shuai Zhou, Revanth Senthilkumaran + 6 more

cs.RO

TLDR

This paper introduces Humanoid Transformer with Touch Dreaming (HTD), enabling versatile, high-dexterity humanoid manipulation via predictive touch-centered learning.

Key contributions

  • An RL-based whole-body controller provides stable lower-body and torso execution during complex manipulation.
  • A scalable data collection system combines VR teleoperation with human-to-humanoid motion mapping.
  • Humanoid Transformer with Touch Dreaming (HTD) uses predictive touch to learn contact-aware representations.
  • HTD achieves a 90.9% relative improvement in average success rate over the stronger baseline across five contact-rich manipulation tasks.
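Note that the headline number is a relative, not absolute, gain. As a hedged illustration (the rates below are hypothetical; the actual per-task numbers are in the paper), a baseline average success rate of 0.44 rising to 0.84 would correspond to roughly a 90.9% relative improvement:

```python
def relative_improvement(new: float, base: float) -> float:
    """Relative (not absolute) gain: (new - base) / base."""
    return (new - base) / base

# Hypothetical rates chosen only to illustrate the arithmetic:
# 0.44 -> 0.84 is a ~90.9% relative improvement.
gain = relative_improvement(0.84, 0.44)
```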

Why it matters

Humanoid robots face challenges in real-world loco-manipulation because of the demands on whole-body stability and contact-aware perception. This paper addresses these challenges by combining robust control, efficient data collection, and a novel touch-aware learning architecture. It advances dexterous, contact-rich humanoid manipulation, paving the way for more general-purpose assistance.

Original Abstract

Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, dexterous hands, and contact-aware perception under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first develop an RL-based whole-body controller that provides stable lower-body and torso execution during complex manipulation. Built on this controller, we develop a whole-body humanoid data collection system that combines VR-based teleoperation with human-to-humanoid motion mapping, enabling efficient collection of real-world demonstrations. We then propose Humanoid Transformer with Touch Dreaming (HTD), a multimodal encoder--decoder Transformer that models touch as a core modality alongside multi-view vision and proprioception. HTD is trained in a single stage with behavioral cloning augmented by touch dreaming: in addition to predicting action chunks, the policy predicts future hand-joint forces and future tactile latents, encouraging the shared Transformer trunk to learn contact-aware representations for dexterous interaction. Across five contact-rich tasks, Insert-T, Book Organization, Towel Folding, Cat Litter Scooping, and Tea Serving, HTD achieves a 90.9% relative improvement in average success rate over the stronger baseline. Ablation results further show that latent-space tactile prediction is more effective than raw tactile prediction, yielding a 30% relative gain in success rate. These results demonstrate that combining robust whole-body execution, scalable humanoid data collection, and predictive touch-centered learning enables versatile, high-dexterity humanoid manipulation in the real world. Project webpage: humanoid-touch-dream.github.io.
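The single-stage objective described in the abstract, behavioral cloning on action chunks augmented with "touch dreaming" predictions of future hand-joint forces and future tactile latents, can be sketched as a weighted sum of per-head losses. The dict layout, loss weights, and MSE choice below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error between a prediction head and its target."""
    return float(np.mean((pred - target) ** 2))

def htd_training_loss(pred: dict, target: dict,
                      w_force: float = 0.1, w_latent: float = 0.1) -> float:
    """Sketch of HTD's single-stage loss: a behavioral-cloning term on
    action chunks, plus touch-dreaming auxiliary terms that predict
    future hand-joint forces and future tactile latents.

    `pred`/`target` map head names ('actions', 'forces', 'latents') to
    arrays; the weights are hypothetical placeholders."""
    loss_action = mse(pred["actions"], target["actions"])   # BC on action chunks
    loss_force = mse(pred["forces"], target["forces"])      # future hand-joint forces
    loss_latent = mse(pred["latents"], target["latents"])   # future tactile latents
    return loss_action + w_force * loss_force + w_latent * loss_latent
```

The ablation result quoted above (latent-space tactile prediction beating raw tactile prediction) corresponds to the choice of supervising the `latents` head on encoded tactile features rather than raw sensor readings.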
