X-Imitator: Spatial-Aware Imitation Learning via Bidirectional Action-Pose Interaction
Kai Xiong, Hongjie Fang, Lixin Yang, Cewu Lu
TLDR
X-Imitator introduces a bidirectional framework for robotic manipulation that tightly couples spatial perception and action generation, letting the two refine each other for stronger task performance.
Key contributions
- Proposes X-Imitator, a dual-path framework for spatial-aware imitation learning.
- Models spatial perception and action execution as a tightly coupled bidirectional loop.
- Enables continuous mutual refinement between spatial reasoning and action generation.
- Significantly outperforms vanilla policies and prior pose-guided methods on 24 simulated and 3 real-world robotic tasks.
Why it matters
This paper addresses a critical bottleneck in robotic manipulation by introducing a bidirectional interaction model between perception and control. By mimicking human internal forward models, X-Imitator achieves superior performance on complex tasks, advancing visuomotor control. Its modular design lets it plug into a variety of existing visuomotor policies.
Original Abstract
Effectively handling the interplay between spatial perception and action generation remains a critical bottleneck in robotic manipulation. Existing methods typically treat spatial perception and action execution as decoupled or strictly unidirectional processes, fundamentally restricting a robot's ability to master complex manipulation tasks. To address this, we propose X-Imitator, a versatile dual-path framework that models spatial perception and action execution as a tightly coupled bidirectional loop. By reciprocally conditioning current pose predictions on past actions and vice versa, this framework enables continuous mutual refinement between spatial reasoning and action generation. This joint modeling closely mirrors human internal forward models. Designed as a modular architecture, the system can be seamlessly integrated into various visuomotor policies. Extensive experiments across 24 simulated and 3 real-world tasks demonstrate that our framework significantly outperforms both vanilla policies and prior methods utilizing explicit pose guidance. The code will be open-sourced.
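To make the bidirectional loop concrete, below is a minimal PyTorch sketch of the reciprocal conditioning the abstract describes: the pose estimate is conditioned on the most recent action, and the next action is conditioned on the refined pose, repeated for a few refinement steps. This is an illustrative assumption of the idea, not the authors' released architecture; all names, dimensions, and the fixed-step scheme (`BidirectionalPoseActionLoop`, `obs_dim`, `steps`, etc.) are hypothetical.

```python
# Hypothetical sketch of a bidirectional pose-action loop (not the paper's code).
import torch
import torch.nn as nn


class BidirectionalPoseActionLoop(nn.Module):
    """Couples a pose-prediction path and an action-generation path.

    Each refinement step conditions the current pose estimate on the
    previous action and the next action on the refined pose estimate,
    so the two paths iteratively correct each other.
    """

    def __init__(self, obs_dim=128, pose_dim=7, action_dim=7, hidden=256, steps=3):
        super().__init__()
        self.steps = steps
        # Pose path: (observation features, previous action) -> pose estimate.
        self.pose_head = nn.Sequential(
            nn.Linear(obs_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, pose_dim),
        )
        # Action path: (observation features, current pose estimate) -> action.
        self.action_head = nn.Sequential(
            nn.Linear(obs_dim + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, obs_feat, prev_action):
        """obs_feat: (B, obs_dim) visual features from any backbone.
        prev_action: (B, action_dim) action executed at the last timestep.
        Returns the refined pose estimate and the next action."""
        action = prev_action
        pose = None
        for _ in range(self.steps):
            # Pose prediction conditioned on the most recent action.
            pose = self.pose_head(torch.cat([obs_feat, action], dim=-1))
            # Action generation conditioned on the refined pose.
            action = self.action_head(torch.cat([obs_feat, pose], dim=-1))
        return pose, action


# Usage: plug the loop in after a visual encoder in an imitation-learning policy.
loop = BidirectionalPoseActionLoop()
obs_feat = torch.randn(4, 128)     # batch of encoded observations
prev_action = torch.zeros(4, 7)    # e.g. 6-DoF delta pose + gripper
pose, action = loop(obs_feat, prev_action)
print(pose.shape, action.shape)    # torch.Size([4, 7]) torch.Size([4, 7])
```

The fixed number of refinement steps here is a simplification; the key property the sketch preserves is that neither path is strictly upstream of the other.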