ArXiv TLDR

Visual-Tactile Peg-in-Hole Assembly Learning from Peg-out-of-Hole Disassembly

arXiv: 2604.20712

Yongqiang Zhao, Xuyang Zhang, Zhuo Chen, Matteo Leonetti, Emmanouil Spyrakos-Papastavridis + 1 more

cs.RO

TLDR

A novel visual-tactile framework learns robotic peg-in-hole assembly by leveraging easier peg-out-of-hole disassembly data, improving success rates and reducing contact forces.

Key contributions

  • Learns peg-in-hole (PiH) assembly by temporally reversing and action-randomizing peg-out-of-hole (PooH) disassembly trajectories (sketched after this list).
  • Employs a unified visual-tactile policy: vision guides the peg-hole approach, while tactile sensing compensates for misalignment.
  • Achieves 87.5% success on seen and 77.1% on unseen objects, beating direct RL trained from scratch by 18.1%.
  • Cuts contact forces by 6.4% relative to single-modality policies.
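The first bullet is concrete enough to sketch. Assuming delta-pose actions (a common choice for PiH), playing a disassembly rollout backwards and negating each action yields an approximate insertion demonstration; the paper does not detail its randomization scheme, so the Gaussian noise and the (observations, actions) layout below are assumptions:

```python
import numpy as np

def reverse_and_randomize(obs_seq, act_seq, noise_scale=0.01, rng=None):
    """Turn one PooH (disassembly) rollout into approximate PiH expert data.

    Assumes `act_seq` holds delta-pose commands, so the action that undoes
    step t is roughly the negated action; the paper's exact scheme may differ.
    """
    rng = rng or np.random.default_rng()
    # Reverse time: the last disassembly observation (peg out) becomes the
    # first insertion observation, ending with the peg seated in the hole.
    rev_obs = obs_seq[::-1]
    # Negate each action and jitter it so the PiH learner sees varied
    # "expert" actions around the reversed path (action randomization).
    rev_act = [-a + rng.normal(0.0, noise_scale, size=np.shape(a))
               for a in act_seq[::-1]]
    return rev_obs, rev_act
```

Running the trained PooH policy many times and reversing each rollout builds an expert dataset without ever having to solve the harder insertion problem directly.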

Why it matters

Peg-in-hole assembly is a fundamental but difficult robotic manipulation task. By mining easier disassembly rollouts for expert data, this framework avoids the extensive exploration that direct RL needs, raising success rates while lowering contact forces and making learned manipulation more robust and practical for contact-rich assembly.

Original Abstract

Peg-in-hole (PiH) assembly is a fundamental yet challenging robotic manipulation task. While reinforcement learning (RL) has shown promise in tackling such tasks, it requires extensive exploration. In this paper, we propose a novel visual-tactile skill learning framework for the PiH task that leverages its inverse task, i.e., peg-out-of-hole (PooH) disassembly, to facilitate PiH learning. Compared to PiH, PooH is inherently easier as it only needs to overcome existing friction without precise alignment, making data collection more efficient. To this end, we formulate both PooH and PiH as Partially Observable Markov Decision Processes (POMDPs) in a unified environment with shared visual-tactile observation space. A visual-tactile PooH policy is first trained; its trajectories, containing kinematic, visual and tactile information, are temporally reversed and action-randomized to provide expert data for PiH. In the policy learning, visual sensing facilitates the peg-hole approach, while tactile measurements compensate for peg-hole misalignment. Experiments across diverse peg-hole geometries show that the visual-tactile policy attains 6.4% lower contact forces than its single-modality counterparts, and that our framework achieves average success rates of 87.5% on seen objects and 77.1% on unseen objects, outperforming direct RL methods that train PiH policies from scratch by 18.1% in success rate. Demos, code, and datasets are available at https://sites.google.com/view/pooh2pih.
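Because both tasks are posed as POMDPs over one shared visual-tactile observation space, a single policy can consume rollouts from either direction. Below is a minimal sketch of how that shared observation might be structured; the field names and shapes are illustrative assumptions, as the abstract only names kinematic, visual, and tactile channels:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class VisualTactileObs:
    """One shared observation for both PooH and PiH (hypothetical layout)."""
    ee_pose: np.ndarray    # kinematics: end-effector position + quaternion, shape (7,)
    wrist_rgb: np.ndarray  # vision: camera image driving the peg-hole approach
    tactile: np.ndarray    # touch: per-taxel readings that reveal misalignment
```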
