Long-Horizon Manipulation via Trace-Conditioned VLA Planning

April 23, 2026 | arXiv:2604.21924

Isabella Liu, An-Chieh Cheng, Rui Yan, Geng Chen, Ri-Zhao Qiu + 5 more

cs.RO

TLDR

LoHo-Manip enables long-horizon robot manipulation by using a VLM to manage tasks and generate visual traces for a VLA executor, improving robustness.

Key contributions

Introduces LoHo-Manip, a modular framework for long-horizon robot manipulation using VLA policies.
A task-management VLM predicts progress-aware plans, including subtask sequences and visual traces.
Executor VLA conditions on visual traces, converting long-horizon tasks into repeated local control.
Receding-horizon planning enables automatic continuation and replanning without explicit recovery logic.
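
As a rough illustration of the plan format listed above, the sketch below models a progress-aware plan (done and remaining subtasks plus a 2D keypoint trace) and draws the trace onto an image so that an executor could condition on it. The names RemainingPlan and render_trace, the pixel-coordinate convention, and the grayscale list-of-rows image are assumptions made for this example; the paper's actual data format and rendering are not described in this summary.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates; a convention assumed here


@dataclass
class RemainingPlan:
    """Hypothetical container for one manager output."""
    done: List[str]        # subtasks already completed (language memory)
    remaining: List[str]   # subtasks still to execute, in order
    trace: List[Point]     # 2D keypoints for where to go next


def render_trace(image: List[List[int]], trace: List[Point],
                 value: int = 255) -> List[List[int]]:
    """Draw the trace as a polyline on a copy of a non-empty grayscale image."""
    out = [row[:] for row in image]          # leave the input untouched
    height, width = len(out), len(out[0])

    def put(x: int, y: int) -> None:
        if 0 <= x < width and 0 <= y < height:   # clip to image bounds
            out[y][x] = value

    if len(trace) == 1:                       # a single keypoint is one pixel
        put(*trace[0])
    for (x0, y0), (x1, y1) in zip(trace, trace[1:]):
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):            # linear interpolation per segment
            put(round(x0 + (x1 - x0) * i / steps),
                round(y0 + (y1 - y0) * i / steps))
    return out


plan = RemainingPlan(done=["open drawer"],
                     remaining=["pick cup", "place cup"],
                     trace=[(0, 0), (3, 0), (3, 3)])
blank = [[0] * 4 for _ in range(4)]
overlay = render_trace(blank, plan.trace)
```

In a real system the overlay would be drawn on the camera frame and passed to the executor as its visual prompt; the toy 4x4 image only shows the mechanics.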

Why it matters

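Long-horizon robot manipulation is difficult because real tasks span many steps, depend on progress so far, and suffer from compounding execution errors. LoHo-Manip addresses this by decoupling task management from execution: the manager re-predicts the remaining plan at each step, while the executor only has to follow a local visual trace. The authors report gains in long-horizon success, robustness, and out-of-distribution generalization, both in simulation and on a real Franka robot.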

Original Abstract

Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step, progress-dependent, and brittle to compounding execution errors. We present LoHo-Manip, a modular framework that scales short-horizon VLA execution to long-horizon instruction following via a dedicated task-management VLM. The manager is decoupled from the executor and is invoked in a receding-horizon manner: given the current observation, it predicts a progress-aware remaining plan that combines (i) a subtask sequence with an explicit done + remaining split as lightweight language memory, and (ii) a visual trace -- a compact 2D keypoint trajectory prompt specifying where to go and what to approach next. The executor VLA is adapted to condition on the rendered trace, thereby turning long-horizon decision-making into repeated local control by following the trace. Crucially, predicting the remaining plan at each step yields an implicit closed loop: failed steps persist in subsequent outputs, and traces update accordingly, enabling automatic continuation and replanning without hand-crafted recovery logic or brittle visual-history buffers. Extensive experiments spanning embodied planning, long-horizon reasoning, trajectory prediction, and end-to-end manipulation in simulation and on a real Franka robot demonstrate strong gains in long-horizon success, robustness, and out-of-distribution generalization. Project page: https://www.liuisabella.com/LoHoManip
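
The implicit closed loop described in the abstract can be sketched as a toy receding-horizon loop. This is a minimal illustration under stated assumptions, not the authors' implementation: manager and run_episode are hypothetical names, the "observation" is reduced to the set of subtasks that have actually succeeded, subtask names are assumed unique, and the visual trace is left out to keep the control flow visible.

```python
from typing import Callable, List, Set, Tuple


def manager(all_subtasks: List[str],
            completed: Set[str]) -> Tuple[List[str], List[str]]:
    """Stand-in for the task-management VLM: split the plan into done and
    remaining based on the current (toy) observation."""
    done = [s for s in all_subtasks if s in completed]
    remaining = [s for s in all_subtasks if s not in completed]
    return done, remaining


def run_episode(all_subtasks: List[str],
                try_subtask: Callable[[str], bool],
                max_steps: int = 20) -> Tuple[List[str], bool]:
    """Re-plan before every step; a failed subtask stays in `remaining`
    and is retried without any dedicated recovery branch."""
    completed: Set[str] = set()   # toy world state the manager observes
    attempted: List[str] = []
    for _ in range(max_steps):
        _, remaining = manager(all_subtasks, completed)
        if not remaining:
            break
        current = remaining[0]    # executor handles only the next local step
        attempted.append(current)
        if try_subtask(current):  # stand-in for the executor VLA
            completed.add(current)
    _, remaining = manager(all_subtasks, completed)
    return attempted, not remaining


# Hypothetical executor that fails its first attempt at "pick cup".
attempts = {}

def flaky_executor(subtask: str) -> bool:
    attempts[subtask] = attempts.get(subtask, 0) + 1
    return not (subtask == "pick cup" and attempts[subtask] == 1)


log, success = run_episode(["open drawer", "pick cup", "place cup"],
                           flaky_executor)
```

The retry of "pick cup" falls out of re-predicting the remaining plan each step, which is the behavior the abstract attributes to the receding-horizon design.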


