SI-Diff: A Framework for Learning Search and High-Precision Insertion with a Force-Domain Diffusion Policy

May 12, 2026

arXiv:2605.12247

Yibo Liu, Stanko Oparnica, Simon Shewchun-Jakaitis, Guoyi Fu, Jie Wang + 3 more

cs.RO

TLDR

SI-Diff uses a force-domain diffusion policy with mode-conditioning to learn both robotic search and high-precision insertion tasks in a single framework.

Key contributions

Learns robotic search and high-precision insertion within a single diffusion policy.
Introduces a mode-conditioning mechanism to handle distinct action patterns.
Extends tolerance to x-y misalignment by 2.5x (from 2 mm to 5 mm) over the state-of-the-art baseline, TacDiffusion.
Shows strong zero-shot transfer to unseen assembly shapes.

Why it matters

Existing approaches typically handle search and high-precision insertion separately, because the two tasks involve distinct action patterns, so a system has to switch models or weights between them. SI-Diff covers both with a single force-domain diffusion policy. In the reported experiments, this extends tolerance to x-y misalignment from 2 mm to 5 mm relative to TacDiffusion and transfers zero-shot to unseen shapes.

Original Abstract

Contact-rich assembly is fundamental in robotics but poses significant challenges due to uncertainties in relative poses, such as misalignments and small clearances in peg-in-hole tasks. Existing approaches typically address search and high-precision insertion separately, because these tasks involve distinct action patterns. However, supporting both tasks within a single model, without switching models or weights, is desirable for intelligent assembly systems. In this work, we propose SI-Diff, a framework that learns both search and high-precision insertion through a force-domain diffusion policy. To this end, we introduce a new mode-conditioning mechanism that enables the policy to capture distinct action behaviors under a single framework. Moreover, we develop a new search teacher policy that can generate diverse trajectories. By training on successful and efficient demonstrations provided by the teacher policy, the model learns the mapping from tactile and end-effector velocity observations to effective action behaviors. We conduct thorough experiments to show that SI-Diff extends the tolerance to x-y misalignments from 2 mm to 5 mm compared to the state-of-the-art baseline, TacDiffusion, while also demonstrating strong zero-shot transferability to unseen shapes.
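To make the idea of mode-conditioning concrete, here is a minimal sketch of how a single diffusion policy could be conditioned on a task mode alongside tactile and end-effector velocity observations. It is not the authors' implementation. The abstract does not say how the mode is injected, so the one-hot concatenation, the 6-D observation and action sizes, the names `MODES`, `build_condition`, `ToyDenoiser`, and `sample_action`, and the untrained linear denoiser are all assumptions made for illustration. The sampling loop is a standard DDPM-style reverse process.

```python
import numpy as np

MODES = ("search", "insert")  # hypothetical mode labels, not from the paper


def build_condition(tactile, ee_velocity, mode):
    """Concatenate observations with a one-hot mode vector (assumed scheme)."""
    one_hot = np.zeros(len(MODES))
    one_hot[MODES.index(mode)] = 1.0
    return np.concatenate([np.ravel(tactile), np.ravel(ee_velocity), one_hot])


class ToyDenoiser:
    """Untrained stand-in for a learned noise-prediction network."""

    def __init__(self, action_dim, cond_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = action_dim + cond_dim + 1  # noisy action + condition + timestep
        self.w = rng.normal(scale=0.1, size=(in_dim, action_dim))

    def __call__(self, noisy_action, step_fraction, cond):
        x = np.concatenate([noisy_action, cond, [step_fraction]])
        return np.tanh(x @ self.w)


def sample_action(denoiser, cond, action_dim, num_steps=20, seed=0):
    """DDPM-style reverse sampling of one action vector given a condition."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    action = rng.normal(size=action_dim)  # start from Gaussian noise
    for k in reversed(range(num_steps)):
        eps = denoiser(action, k / num_steps, cond)
        # Posterior mean of the previous step given the predicted noise.
        action = (action - betas[k] / np.sqrt(1.0 - alpha_bars[k]) * eps) / np.sqrt(alphas[k])
        if k > 0:
            action = action + np.sqrt(betas[k]) * rng.normal(size=action_dim)
    return action


# Same observation, two modes: only the mode flag in the condition changes.
tactile = np.array([0.0, 0.0, -2.0, 0.0, 0.0, 0.0])  # assumed 6-D force/torque reading
ee_velocity = np.zeros(6)                            # assumed 6-D end-effector twist
cond_search = build_condition(tactile, ee_velocity, "search")
cond_insert = build_condition(tactile, ee_velocity, "insert")

denoiser = ToyDenoiser(action_dim=6, cond_dim=cond_search.size)
action_search = sample_action(denoiser, cond_search, action_dim=6)
action_insert = sample_action(denoiser, cond_insert, action_dim=6)
```

With identical observations and an identical noise seed, the two sampled actions differ only because of the mode flag. That is the property a mode-conditioned policy relies on to produce distinct search and insertion behaviors without switching models or weights. In a trained system the denoiser would be a neural network fit to the teacher policy's demonstrations.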
