ArXiv TLDR

R3D: Revisiting 3D Policy Learning

arXiv: 2604.15281

Zhengdong Hong, Shenrui Wu, Haozhe Cui, Boyi Zhao, Ran Ji + 6 more

cs.CV, cs.RO

TLDR

R3D introduces a stable 3D policy-learning architecture that pairs a transformer-based 3D encoder with a diffusion decoder, overcoming the training instability and severe overfitting that have held back prior 3D policies.

Key contributions

  • Diagnosed key issues in 3D policy learning: training instability and severe overfitting.
  • Identified root causes: missing 3D data augmentation and adverse Batch Normalization effects (see the sketch after this list).
  • Introduced R3D, a stable architecture with a transformer 3D encoder and diffusion decoder.
  • Significantly outperforms state-of-the-art 3D baselines on challenging manipulation benchmarks, enabling scalable 3D imitation learning.
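
The digest doesn't include the authors' code, but the two diagnosed causes suggest concrete fixes. Below is a minimal PyTorch sketch of what they could look like: a simple 3D augmentation (random z-axis rotation plus Gaussian jitter) and a per-point block that uses LayerNorm instead of BatchNorm. The names (augment_pointcloud, PointBlock), the exact augmentation recipe, and the LayerNorm substitution are assumptions for illustration, not the paper's implementation.

```python
import math
import torch
from torch import nn

def augment_pointcloud(points: torch.Tensor,
                       max_rot_deg: float = 15.0,
                       jitter_std: float = 0.005) -> torch.Tensor:
    """Randomly rotate an (N, 3) point cloud about the z-axis and add Gaussian jitter.

    Illustrative only; the augmentation recipe used in R3D may differ.
    """
    theta = math.radians(torch.empty(1).uniform_(-max_rot_deg, max_rot_deg).item())
    c, s = math.cos(theta), math.sin(theta)
    rot = points.new_tensor([[c, -s, 0.0],
                             [s,  c, 0.0],
                             [0.0, 0.0, 1.0]])
    return points @ rot.T + jitter_std * torch.randn_like(points)

class PointBlock(nn.Module):
    """Per-point MLP block normalized with LayerNorm rather than BatchNorm.

    BatchNorm ties normalization statistics to the (often small, highly correlated)
    robot-learning batch, which the paper identifies as a source of instability;
    LayerNorm normalizes each point's features independently of batch size.
    """
    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.proj = nn.Linear(dim_in, dim_out)
        self.norm = nn.LayerNorm(dim_out)  # batch-size-independent normalization

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, dim_in)
        return torch.relu(self.norm(self.proj(x)))
```
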

Why it matters

This paper provides a robust solution to long-standing issues in 3D policy learning, namely training instability and severe overfitting. By making 3D imitation learning stable and scalable, it opens the door to stronger 3D perception backbones, better generalization, and cross-embodiment transfer in real-world robotic manipulation.

Original Abstract

3D policy learning promises superior generalization and cross-embodiment transfer, but progress has been hindered by training instabilities and severe overfitting, precluding the adoption of powerful 3D perception models. In this work, we systematically diagnose these failures, identifying the omission of 3D data augmentation and the adverse effects of Batch Normalization as primary causes. We propose a new architecture coupling a scalable transformer-based 3D encoder with a diffusion decoder, engineered specifically for stability at scale and designed to leverage large-scale pre-training. Our approach significantly outperforms state-of-the-art 3D baselines on challenging manipulation benchmarks, establishing a new and robust foundation for scalable 3D imitation learning. Project Page: https://r3d-policy.github.io/
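
The abstract names the encoder/decoder coupling but not its internals. The sketch below, assuming PyTorch, shows one way a transformer 3D encoder could condition a diffusion-style action denoiser: point tokens are encoded, pooled into a scene embedding, and fed to a network that predicts the noise added to an action chunk. The class name (R3DStylePolicy), layer sizes, mean-pooled conditioning, and MLP denoiser are placeholders, not the authors' design.

```python
import torch
from torch import nn

class R3DStylePolicy(nn.Module):
    """Sketch of a transformer 3D encoder coupled with a diffusion action decoder."""
    def __init__(self, point_dim=3, dim=256, action_dim=7, horizon=16, layers=4):
        super().__init__()
        self.tokenize = nn.Linear(point_dim, dim)  # per-point tokens
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        # Diffusion decoder head: predicts the noise added to an action chunk,
        # conditioned on the pooled scene embedding and the diffusion timestep.
        self.denoiser = nn.Sequential(
            nn.Linear(action_dim * horizon + dim + 1, 512), nn.GELU(),
            nn.Linear(512, action_dim * horizon),
        )
        self.horizon, self.action_dim = horizon, action_dim

    def forward(self, points, noisy_actions, t):
        # points: (B, N, 3); noisy_actions: (B, horizon, action_dim); t: (B,)
        tokens = self.encoder(self.tokenize(points))  # (B, N, dim)
        scene = tokens.mean(dim=1)                    # pooled scene embedding
        x = torch.cat([noisy_actions.flatten(1), scene, t.unsqueeze(1).float()], dim=-1)
        return self.denoiser(x).view(-1, self.horizon, self.action_dim)  # predicted noise
```

At training time such a policy would be fit with a standard denoising objective (add noise to demonstrated action chunks, regress it back); at inference, actions are recovered by iterative denoising from Gaussian noise. See the project page for the actual method: https://r3d-policy.github.io/
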
