STARRY: Spatial-Temporal Action-Centric World Modeling for Robotic Manipulation
Yuxuan Tian, Yurun Jin, Bin Yu, Yukun Shi, Hao Wu + 3 more
TLDR
STARRY is a world-model-enhanced policy for robotic manipulation that aligns spatial-temporal prediction with action generation to improve task success.
Key contributions
- Proposes STARRY, a world-model-enhanced policy aligning spatial-temporal prediction with action generation.
- Jointly denoises future spatial-temporal latents and action sequences for robust action planning.
- Introduces Geometry-Aware Selective Attention Modulation for precise action-attention.
- Reports 93.82% / 93.30% average success on RoboTwin 2.0 (Clean / Randomized settings) and raises real-world average success from 42.5% to 70.8% over $\pi_{0.5}$.
Why it matters
Manipulation policies must reason about how a scene will evolve in space and time, and the authors argue that existing VLA and world-model-enhanced policies do not fully model the interaction structure that matters for actions. By tying spatial-temporal prediction directly to action generation, STARRY reports higher success rates in both simulation and real-world experiments, which suggests that action-centric world modeling helps on spatial-temporally demanding tasks.
Original Abstract
Robotic manipulation critically requires reasoning about future spatial-temporal interactions, yet existing VLA policies and world-model-enhanced policies do not fully model action-relevant spatial-temporal interaction structure. We propose STARRY, a world-model-enhanced action-generation policy that aligns spatial-temporal prediction with action generation. STARRY jointly denoises future spatial-temporal latents and action sequences, and introduces Geometry-Aware Selective Attention Modulation to convert predicted depth and end-effector geometry into token-aligned weights for selective action-attention modulation. On RoboTwin 2.0, STARRY achieves 93.82% / 93.30% average success under Clean and Randomized settings. Real-world experiments further improve average success from 42.5% to 70.8% over $\pi_{0.5}$, demonstrating the effectiveness of action-centric spatial-temporal world modeling for spatial-temporally demanding robotic action generation.
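The abstract does not spell out how Geometry-Aware Selective Attention Modulation computes its token-aligned weights, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes each visual token has a 3D position recovered from predicted depth, that a token's weight is a Gaussian of its distance to the end-effector, and that the weights enter attention as an additive log-bias before the softmax. The function names, the Gaussian form, and the `sigma` parameter are all hypothetical.

```python
import math

def geometry_weights(token_xyz, ee_xyz, sigma=0.1):
    """Per-token weight from distance to the end-effector.

    Assumed form (not from the paper): a Gaussian falloff with width sigma.
    """
    weights = []
    for point in token_xyz:
        dist_sq = sum((a - b) ** 2 for a, b in zip(point, ee_xyz))
        weights.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))
    return weights

def modulated_attention(logits, weights, eps=1e-8):
    """Softmax over attention logits biased by log-weights.

    Up to the small eps term, this matches multiplying the plain softmax
    probabilities by the weights and renormalizing.
    """
    biased = [l + math.log(w + eps) for l, w in zip(logits, weights)]
    max_biased = max(biased)  # subtract the max for numerical stability
    exps = [math.exp(b - max_biased) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]

# Three hypothetical tokens: at the gripper, far from it, and very close to it.
token_xyz = [(0.0, 0.0, 0.5), (0.3, 0.0, 0.5), (0.02, 0.01, 0.5)]
ee_xyz = (0.0, 0.0, 0.5)
logits = [0.0, 0.0, 0.0]  # uniform attention before modulation

attn = modulated_attention(logits, geometry_weights(token_xyz, ee_xyz))
```

With uniform logits, the modulated attention concentrates on the tokens nearest the end-effector and nearly ignores the distant one, which is the selective behavior the abstract describes at a high level.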