Can Explicit Physical Feasibility Benefit VLA Learning? An Empirical Study

April 20, 2026
arXiv:2604.17896

cs.LG, cs.AI, cs.RO

TLDR

This paper shows that adding explicit physical-feasibility supervision to the training of a diffusion-based VLA robot policy improves physical reliability and overall task performance, and makes learning more efficient when demonstration data is limited.

Key contributions

Identifies a gap in VLA training: lack of explicit physical constraint supervision.
Proposes a simple, geometry-grounded feasibility objective for VLA policies.
Integrates this objective into diffusion-based VLA model training.
Demonstrates improved reliability, performance, and data efficiency in VLA policies.

Why it matters

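Prevailing VLA training leaves hard physical constraints, such as obstacle avoidance and kinematic feasibility, to be inferred implicitly from demonstrations. The results here suggest that supervising feasibility directly improves reliability and task performance, which makes explicit feasibility signals a promising complement to imitation learning for building more reliable robot policies.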

Original Abstract

Vision-Language-Action (VLA) models map multimodal inputs directly to robot actions and are typically trained through large-scale imitation learning. While this paradigm has shown strong performance, prevailing VLA training procedures do not explicitly supervise hard physical constraints such as obstacle avoidance or kinematic feasibility. As a result, the geometric structure underlying physically feasible behavior must be inferred only implicitly from demonstrations. In this paper, we study whether introducing explicit feasibility supervision can provide effective structured guidance for VLA policies. We formulate a simple geometry-grounded feasibility objective and integrate it into the training stage of a diffusion-based VLA policy. To evaluate this idea systematically, we use obstacle-aware manipulation as a controlled probe of geometry-dependent physical feasibility. Empirical results show that augmenting VLA training with feasibility supervision improves both physical reliability and overall task performance, while also enhancing learning efficiency in the low-data regime. These findings indicate that explicit feasibility signals can effectively complement imitation-based VLA learning, highlighting their potential for developing more reliable VLA policies.
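
The abstract does not give the form of the feasibility objective, so the following is only an illustrative sketch of how a geometry-grounded penalty could be added to a diffusion-style denoising loss. Everything in it is an assumption made for illustration: the spherical obstacle, the squared-hinge clearance penalty, the `margin` and `weight` parameters, and the function names `feasibility_penalty` and `combined_loss` are hypothetical and are not taken from the paper.

```python
import numpy as np


def feasibility_penalty(waypoints, obstacle_center, obstacle_radius, margin=0.05):
    """Squared-hinge penalty on clearance violations (hypothetical objective).

    waypoints: (T, 3) array of predicted end-effector positions.
    The obstacle is assumed to be a sphere, which is a simplification.
    """
    # Signed distance from each waypoint to the sphere surface (negative inside).
    dist = np.linalg.norm(waypoints - obstacle_center, axis=-1) - obstacle_radius
    # Only waypoints closer than `margin` to the surface contribute.
    violation = np.maximum(0.0, margin - dist)
    return float(np.mean(violation ** 2))


def combined_loss(pred_noise, true_noise, waypoints, obstacle_center,
                  obstacle_radius, weight=1.0, margin=0.05):
    """Denoising MSE plus a weighted feasibility term (assumed combination)."""
    # Standard noise-prediction MSE used in diffusion training.
    denoise = float(np.mean((pred_noise - true_noise) ** 2))
    feas = feasibility_penalty(waypoints, obstacle_center, obstacle_radius, margin)
    return denoise + weight * feas
```

In this sketch, a trajectory that keeps at least `margin` clearance from the obstacle contributes no feasibility loss, so the extra term only shapes predictions that approach or enter the obstacle. In an actual training loop the same computation would be written in an autodiff framework so that gradients flow back to the policy.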
