ArXiv TLDR

LiveVLN: Breaking the Stop-and-Go Loop in Vision-Language Navigation

2604.19536

Xiangchen Wang, Weiye Zhu, Teng Wang, TianTian Geng, Zekai Zhang + 3 more

cs.RO

TLDR

LiveVLN enables smoother, continuous vision-language navigation by overlapping action execution with observation processing, reducing stop-and-go behavior.

Key contributions

  • Introduces LiveVLN, a training-free framework for continuous embodied navigation.
  • Overlaps action execution with observation processing, reducing idle waiting in VLM navigators.
  • Maintains continuous action availability during motion for smoother online execution.
  • Reduces real-world waiting time by up to 77.7% and wall-clock episode time by 12.6% (StreamVLN) to 19.6% (NaVIDA).

Why it matters

Current navigation systems suffer from stop-and-go behavior due to blocking sense-inference-execution loops. LiveVLN addresses this by enabling continuous action flow, significantly improving real-world deployment efficiency and user experience for VLM navigators.

Original Abstract

Recent navigation systems achieve strong benchmark results, yet real-world deployment often remains visibly stop-and-go. This bottleneck arises because the sense-inference-execution loop is still blocking: after each new observation, the controller must wait for sensing, transmission, and inference before motion can continue. Reducing action-generation cost alone therefore does not remove redundant waiting. To address this issue, we present LiveVLN, a training-free framework for more continuous embodied navigation by augmenting pretrained VLM navigators with multi-step action continuation. Instead of pausing for each full sense-and-inference round, LiveVLN overlaps execution with the processing of newly arrived observations, allowing refreshed future actions to be handed off before the current executable prefix is exhausted. This design keeps actions continuously available during motion, reducing idle waiting and enabling smoother online execution. The framework operates at runtime and can be integrated with compatible pretrained VLM navigators. Across R2R and RxR, LiveVLN preserves benchmark performance while reducing waiting time and improving action availability. In real-world deployments, it cuts average episode waiting time by up to $77.7\%$ and shortens wall-clock episode time by $12.6\%$ on StreamVLN and $19.6\%$ on NaVIDA, yielding more coherent execution during deployment. Code is available at https://github.com/NIneeeeeem/LiveVLN.
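The overlap described in the abstract — executing the current action prefix while inference on the newest observation runs in the background — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: all names (`LiveNavigator`, `fake_vlm`, the `FORWARD` fallback) are hypothetical, and the `join()` at the end of each step is only there to keep the sketch deterministic; a real system would let inference run ahead of execution.

```python
import threading
from collections import deque

class LiveNavigator:
    """Hypothetical sketch: keep a multi-step action prefix executable
    while a background thread refreshes it from the latest observation."""

    def __init__(self, infer_fn, horizon=4):
        self.infer_fn = infer_fn   # stand-in for the pretrained VLM navigator
        self.horizon = horizon     # actions produced per inference call
        self.plan = deque()        # currently executable action prefix
        self.lock = threading.Lock()

    def _refresh(self, obs):
        # Runs concurrently with motion: when inference finishes, the
        # refreshed plan is handed off to replace the remaining prefix.
        new_plan = self.infer_fn(obs, self.horizon)
        with self.lock:
            self.plan = deque(new_plan)

    def step(self, obs, execute_fn):
        # Never block on the VLM: take the next available action, falling
        # back to a default when no plan has arrived yet.
        with self.lock:
            action = self.plan.popleft() if self.plan else "FORWARD"
        # Overlap: start inference on the fresh observation, then execute
        # the action while that inference is in flight.
        worker = threading.Thread(target=self._refresh, args=(obs,))
        worker.start()
        execute_fn(action)
        worker.join()  # deterministic for the sketch only
        return action

# Demo with a stand-in "VLM" that derives actions from the observation id.
def fake_vlm(obs, horizon):
    return [f"{obs}_a{i}" for i in range(horizon)]

nav = LiveNavigator(fake_vlm, horizon=3)
actions = [nav.step(f"obs{i}", lambda a: None) for i in range(3)]
```

Each step executes an action planned from an *earlier* observation while the current one is being processed, which is what removes the blocking sense-inference-execution wait.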
