ArXiv TLDR

DockAnywhere: Data-Efficient Visuomotor Policy Learning for Mobile Manipulation via Novel Demonstration Generation

arXiv: 2604.15023

Ziyu Shan, Yuheng Zhou, Gaoyuan Wu, Ziheng Ji, Zhenyu Wu, et al.

cs.RO

TLDR

DockAnywhere improves viewpoint generalization for mobile manipulation policies by lifting a single demonstration to many feasible docking configurations, decoupling docking-dependent base motions from viewpoint-invariant manipulation skills.

Key contributions

  • Proposes DockAnywhere, a data-efficient framework for mobile manipulation policy learning.
  • Generates diverse demonstrations from a single one to improve viewpoint generalization under docking variability.
  • Decouples docking-dependent base motions from invariant contact-rich manipulation skills.
  • Synthesizes visual observations in 3D by representing the robot and objects as point clouds and applying point-level spatial editing, keeping observations and actions consistent across viewpoints; a rough sketch of the pipeline follows this list.
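
The contributions above describe a pipeline: sample feasible docking poses around the target, regenerate only the base motion for each pose, and reuse the manipulation actions unchanged. Below is a minimal sketch of that idea, not the paper's implementation; the function names, the annulus-based feasibility check, and the straight-line base planner are assumptions made for illustration.

```python
import numpy as np

def sample_docking_poses(object_xy, n=16, r_min=0.5, r_max=0.9, seed=0):
    """Sample candidate docking poses on an annulus around the target object,
    oriented to face it. A stand-in for the paper's feasibility-constrained
    proposal step; real checks would add collision and reachability tests."""
    rng = np.random.default_rng(seed)
    object_xy = np.asarray(object_xy, dtype=float)
    poses = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * np.pi)
        r = rng.uniform(r_min, r_max)
        xy = object_xy + r * np.array([np.cos(theta), np.sin(theta)])
        yaw = np.arctan2(object_xy[1] - xy[1], object_xy[0] - xy[0])  # face the object
        poses.append((xy, yaw))
    return poses

def lift_demo_to_dock(demo, dock_xy, dock_yaw, steps=20):
    """Regenerate only the docking-dependent base motion; the contact-rich
    manipulation actions, expressed in the object frame, are reused verbatim."""
    start = np.asarray(demo["base_start_xy"], dtype=float)
    goal = np.asarray(dock_xy, dtype=float)
    # Straight-line base motion as a placeholder for a real navigation planner.
    base_motion = [(1.0 - t) * start + t * goal for t in np.linspace(0.0, 1.0, steps)]
    return {
        "base_motion": base_motion,
        "base_final_yaw": dock_yaw,
        "manip_actions": demo["manip_actions_object_frame"],  # invariant across docks
    }

# Lift one source demonstration to many docking configurations.
demo = {"base_start_xy": [0.0, 0.0], "manip_actions_object_frame": ["reach", "grasp", "pull"]}
augmented = [lift_demo_to_dock(demo, xy, yaw)
             for xy, yaw in sample_docking_poses(object_xy=[2.0, 1.0])]
```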

Why it matters

This paper tackles the view generalization problem in mobile manipulation, where shifts in docking points degrade the performance of visuomotor policies. By letting policies learn from diverse viewpoints without collecting extra demonstrations, DockAnywhere improves the robustness and real-world deployability of mobile manipulation robots.

Original Abstract

Mobile manipulation is a fundamental capability that enables robots to interact in expansive environments such as homes and factories. Most existing approaches follow a two-stage paradigm, where the robot first navigates to a docking point and then performs fixed-base manipulation using powerful visuomotor policies. However, real-world mobile manipulation often suffers from the view generalization problem due to shifts of docking points. To address this issue, we propose a novel low-cost demonstration generation framework named DockAnywhere, which improves viewpoint generalization under docking variability by lifting a single demonstration to diverse feasible docking configurations. Specifically, DockAnywhere lifts a trajectory to any feasible docking points by decoupling docking-dependent base motions from contact-rich manipulation skills that remain invariant across viewpoints. Feasible docking proposals are sampled under feasibility constraints, and corresponding trajectories are generated via structure-preserving augmentation. Visual observations are synthesized in 3D space by representing the robot and objects as point clouds and applying point-level spatial editing to ensure the consistency of observation and action across viewpoints. Extensive experiments on ManiSkill and real-world platforms demonstrate that DockAnywhere substantially improves policy success rates and easily generalizes to novel viewpoints from unseen docking points during training, significantly enhancing the generalization capability of mobile manipulation policy in real-world deployment.
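
To make the abstract's point-level spatial editing step concrete, here is a hedged sketch assuming planar (x, y, yaw) docking poses and a point cloud recorded in the original base frame; the frame conventions and the SE(2)-to-SE(3) helper are illustrative assumptions, not the authors' code.

```python
import numpy as np

def se2_to_se3(xy, yaw):
    """Homogeneous 4x4 transform for a planar base pose (x, y, yaw)."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[0, 0], T[0, 1] = c, -s
    T[1, 0], T[1, 1] = s, c
    T[0, 3], T[1, 3] = xy[0], xy[1]
    return T

def edit_observation(points_old_base, old_dock, new_dock):
    """Map a point cloud recorded in the original docking (base) frame into the
    new docking frame: old base -> world -> new base. Actions kept relative to
    the object are untouched, so observation and action remain consistent."""
    T_old = se2_to_se3(*old_dock)   # world <- old base
    T_new = se2_to_se3(*new_dock)   # world <- new base
    pts_h = np.hstack([points_old_base, np.ones((len(points_old_base), 1))])
    pts_world = (T_old @ pts_h.T).T
    pts_new = (np.linalg.inv(T_new) @ pts_world.T).T
    return pts_new[:, :3]

# A few scene points seen from the original dock, re-expressed for a new dock.
cloud = np.array([[1.0, 0.2, 0.5], [1.1, -0.1, 0.4]])
cloud_new = edit_observation(cloud, old_dock=((0.0, 0.0), 0.0), new_dock=((0.3, -0.2), 0.4))
```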
