When to Trust Imagination: Adaptive Action Execution for World Action Models

May 7, 2026 · arXiv:2605.06222

Rui Wang, Yue Zhang, Jiehong Lin, Kuncheng Luo, Jianan Wang + 2 more

cs.RO, cs.AI

TLDR

This paper introduces an adaptive execution method for World Action Models (WAMs) that verifies future predictions against reality, improving robotic manipulation efficiency and robustness.

Key contributions

Formulates adaptive WAM execution as a future-reality verification problem.
Proposes Future Forward Dynamics Causal Attention (FFDC) to verify predicted actions and observations against reality.
Enables adaptive action chunk sizes through FFDC, balancing efficiency and responsiveness in robotic tasks.
Introduces Mixture-of-Horizon Training to enhance long-horizon trajectory coverage for adaptive execution.

Why it matters

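Current WAMs execute a fixed number of predicted actions after each inference, whether or not the predicted future still matches what the robot observes. By verifying predictions against real observations during execution, this method lets the robot run longer chunks when predictions hold and replan early when they do not. On RoboTwin, the authors report 69.10% fewer WAM forward passes and 34.02% less execution time, with a 2.54% higher success rate than the short-chunk baseline, which points toward faster and more reliable manipulation in complex real-world settings.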

Original Abstract

World Action Models (WAMs) have recently emerged as a promising paradigm for robotic manipulation by jointly predicting future visual observations and future actions. However, current WAMs typically execute a fixed number of predicted actions after each model inference, leaving the robot blind to whether the imagined future remains consistent with the actual physical rollout. In this work, we formulate adaptive WAM execution as a future-reality verification problem: the robot should execute longer when the WAM-predicted future remains reliable, and replan earlier when reality deviates from imagination. To this end, we propose Future Forward Dynamics Causal Attention (FFDC), a lightweight verifier that jointly reasons over predicted future actions, predicted visual dynamics, real observations, and language instructions to estimate whether the remaining action rollout can still be trusted. FFDC enables adaptive action chunk sizes as an emergent consequence of prediction-observation consistency, preserving the efficiency of long-horizon execution while restoring responsiveness in contact-rich or difficult phases. We further introduce Mixture-of-Horizon Training to improve long-horizon trajectory coverage for adaptive execution. Experiments on the RoboTwin benchmark and in the real world demonstrate that our method achieves a strong robustness-efficiency trade-off: on RoboTwin, it reduces WAM forward passes by 69.10% and execution time by 34.02%, while improving success rate by 2.54% over the short-chunk baseline; in real-world experiments, it improves success rate by 35%.
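
The execution policy described in the abstract (keep executing while the imagined future holds, replan once reality deviates) can be illustrated with a toy control loop. This is only a sketch under stated assumptions, not the paper's implementation: the learned FFDC verifier is replaced by a plain distance threshold between predicted and real observations, and `plan_fn`, `step_fn`, `mismatch`, `run_adaptive`, the 1-D toy dynamics, and all numeric values are hypothetical names and settings chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Obs = List[float]
# A planner returns (actions, predicted observation after each action).
Plan = Tuple[List[float], List[Obs]]


def mismatch(predicted: Sequence[float], real: Sequence[float]) -> float:
    """Largest per-dimension gap between predicted and real observation.

    Stand-in for a learned verifier such as FFDC (assumption, not the paper's).
    """
    return max(abs(p - r) for p, r in zip(predicted, real))


def run_adaptive(
    plan_fn: Callable[[Obs, int], Plan],
    step_fn: Callable[[Obs, float], Obs],
    obs: Obs,
    total_steps: int,
    threshold: float,
    max_chunk: int,
) -> Tuple[int, List[int]]:
    """Execute total_steps actions; return (planner calls, chunk lengths)."""
    planner_calls = 0
    chunk_lengths: List[int] = []
    executed = 0
    while executed < total_steps:
        actions, predicted = plan_fn(obs, max_chunk)  # one model inference
        planner_calls += 1
        used = 0
        for action, pred in zip(actions, predicted):
            obs = step_fn(obs, action)  # act in the real environment
            used += 1
            executed += 1
            # Stop the chunk early once reality departs from the prediction.
            if executed >= total_steps or mismatch(pred, obs) > threshold:
                break
        chunk_lengths.append(used)
    return planner_calls, chunk_lengths


def toy_plan(obs: Obs, horizon: int) -> Plan:
    """Hypothetical planner: assumes each unit action moves the state by 1."""
    actions = [1.0] * horizon
    predicted = [[obs[0] + k + 1.0] for k in range(horizon)]
    return actions, predicted


def toy_step(obs: Obs, action: float) -> Obs:
    """Hypothetical environment: actions are half as effective for 4 <= x < 6."""
    gain = 0.5 if 4.0 <= obs[0] < 6.0 else 1.0
    return [obs[0] + gain * action]


calls, chunks = run_adaptive(toy_plan, toy_step, [0.0], total_steps=10,
                             threshold=0.25, max_chunk=4)
print(calls, chunks)
```

In this toy run the chunk length falls to one step while the state is inside the hard region and grows again outside it, so the varying chunk size is a by-product of the consistency check and is never set directly. A fixed one-step chunk would call the planner once per action over the same ten steps.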

