Disentangled Point Diffusion for Precise Object Placement
Lyuxing He, Eric Cai, Shobhit Aggarwal, Jianjun Wang, David Held
TLDR
TAX-DPD is a hierarchical disentangled point diffusion framework achieving state-of-the-art precision and generalization for robotic object placement.
Key contributions
- Introduces TAX-DPD, a hierarchical disentangled point diffusion framework for precise object placement.
- Uses a Dense GMM for global scene-level placement priors and disentangled point cloud diffusion for local configuration.
- Achieves state-of-the-art precision, multi-modal coverage, and generalization to novel object geometries.
- Outperforms SE(3)-diffusion in accuracy and extends to non-rigid objects, validated in real-world tasks.
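The two-stage pipeline above can be sketched in miniature. The snippet below is a hypothetical illustration, not the paper's implementation: the GMM parameters are hand-picked (the paper's Dense GMM is predicted feed-forward from the scene), and a simple interpolation toward a target stands in for the learned denoiser. What it does show is the disentangled structure, where the object geometry and the placement frame are denoised as separate variables that never mix channels.

```python
import numpy as np

def sample_global_placement(means, covs, weights, rng):
    """Stage 1 (toy): sample a coarse placement center from a
    scene-level Gaussian mixture prior over placements."""
    k = rng.choice(len(weights), p=weights)
    return rng.multivariate_normal(means[k], covs[k])

def disentangled_refine(obj_points, frame_t, target_points, target_t,
                        steps=50, rng=None):
    """Stage 2 (toy): reverse-diffusion-style loop that refines the
    object geometry and the placement frame as SEPARATE variables,
    mirroring the disentangled design. A linear pull toward the target
    stands in for a learned denoising network."""
    rng = rng or np.random.default_rng(0)
    pts, t = obj_points.copy(), frame_t.copy()
    for s in range(steps, 0, -1):
        noise_scale = s / steps  # noise shrinks as denoising proceeds
        # Geometry channel: update the point cloud only.
        pts += (target_points - pts) / s \
            + 0.01 * noise_scale * rng.standard_normal(pts.shape)
        # Frame channel: update the placement translation only.
        t += (target_t - t) / s \
            + 0.01 * noise_scale * rng.standard_normal(3)
    return pts, t

rng = np.random.default_rng(42)
# Hypothetical two-mode scene prior (e.g. two valid shelf slots).
means = np.array([[0.0, 0.0, 0.1], [0.5, 0.0, 0.1]])
covs = np.stack([0.01 * np.eye(3)] * 2)
weights = np.array([0.5, 0.5])

coarse_t = sample_global_placement(means, covs, weights, rng)
noisy_pts = rng.standard_normal((64, 3))          # noisy object geometry
target_pts = np.zeros((64, 3))                    # placeholder clean shape
pts, t = disentangled_refine(noisy_pts, coarse_t, target_pts, means[0],
                             rng=rng)
```

After the loop, both channels have converged near their targets; in the real framework each update would instead come from the learned point diffusion model conditioned on the scene.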
Why it matters
Precise placement and generalization to novel object geometries remain persistent challenges in robotic manipulation. TAX-DPD, a hierarchical disentangled point diffusion framework, achieves state-of-the-art placement accuracy and outperforms prior SE(3)-diffusion methods, which matters for high-precision industrial insertion and other demanding real-world tasks.
Original Abstract
Recent advances in robotic manipulation have highlighted the effectiveness of learning from demonstration. However, while end-to-end policies excel in expressivity and flexibility, they struggle both in generalizing to novel object geometries and in attaining a high degree of precision. An alternative, object-centric approach frames the task as predicting the placement pose of the target object, providing a modular decomposition of the problem. Building on this goal-prediction paradigm, we propose TAX-DPD, a hierarchical, disentangled point diffusion framework that achieves state-of-the-art performance in placement precision, multi-modal coverage, and generalization to variations in object geometries and scene configurations. We model global scene-level placements through a novel feed-forward Dense Gaussian Mixture Model (GMM) that yields a spatially dense prior over global placements; we then model the local object-level configuration through a novel disentangled point cloud diffusion module that separately diffuses the object geometry and the placement frame, enabling precise local geometric reasoning. Interestingly, we demonstrate that our point cloud diffusion achieves substantially higher accuracy than a prior approach based on SE(3)-diffusion, even in the context of rigid object placement. We validate our approach across a suite of challenging tasks in simulation and in the real-world on high-precision industrial insertion tasks. Furthermore, we present results on a cloth-hanging task in simulation, indicating that our framework can further relax assumptions on object rigidity.