ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC
Yurui Du, Pinhao Song, Yutong Hu, Renaud Detry
TL;DR
ELVIS enables robust long-horizon visual MPC via Gaussian-mixture planning and an uncertainty-aware lambda-return, handling branching futures and model errors.
Key contributions
- Employs Gaussian-mixture MPPI for long-horizon planning, handling branching futures and multi-modal distributions.
- Stabilizes deep imagination with an uncertainty-aware lambda-return from an ensemble of latent critics.
- Adaptively balances bootstrapping and look-ahead to limit compounding model errors during planning.
- Achieves state-of-the-art results on DeepMind Control Suite and zero-shot real-world transfer.
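To make the first contribution concrete, here is a minimal sketch of a Gaussian-mixture MPPI planning step in numpy. It is not the paper's implementation: the number of modes, the per-component softmax update, the mode-ranking rule, and all coefficients are assumptions; `score_fn` stands in for evaluating imagined rollouts in the learned latent model. The key idea it illustrates is that each Gaussian component refines its own hypothesis, so distinct branches of the future are not averaged into one blurry plan.

```python
import numpy as np

def gmm_mppi_step(score_fn, horizon, act_dim, n_modes=3, n_samples=255,
                  n_iters=4, temperature=1.0, seed=0):
    """One planning step of a Gaussian-mixture MPPI sketch (assumed details).

    score_fn: maps a batch of action sequences (N, H, A) to returns (N,).
    Maintains n_modes independent Gaussian components over action
    sequences so multiple long-horizon hypotheses stay separate.
    """
    rng = np.random.default_rng(seed)
    means = rng.normal(0.0, 0.1, size=(n_modes, horizon, act_dim))
    stds = np.full((n_modes, horizon, act_dim), 0.5)

    for _ in range(n_iters):
        # Sample an equal share of candidate sequences from each component.
        per = n_samples // n_modes
        samples = means[:, None] + stds[:, None] * rng.normal(
            size=(n_modes, per, horizon, act_dim))
        returns = score_fn(samples.reshape(-1, horizon, act_dim))
        returns = returns.reshape(n_modes, per)

        # MPPI softmax weights computed per component, so each mode
        # sharpens its own hypothesis instead of collapsing to a mean.
        w = np.exp((returns - returns.max(axis=1, keepdims=True)) / temperature)
        w /= w.sum(axis=1, keepdims=True)
        means = np.einsum('kn,knha->kha', w, samples)
        stds = np.sqrt(np.einsum('kn,knha->kha', w,
                                 (samples - means[:, None]) ** 2) + 1e-6)
        mode_scores = returns.mean(axis=1)  # rank modes by average return

    best = int(np.argmax(mode_scores))
    return means[best, 0]  # execute the first action of the best mode
```

As a usage sketch, scoring candidates against a toy quadratic objective pulls the best mode's first action toward the optimum over a few iterations, while the other modes remain free to track different optima if the objective is multi-modal.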
Why it matters
Long-horizon visual control is challenging because futures branch and model errors compound over deep rollouts. ELVIS tackles both at once, with multi-modal planning for branching futures and an uncertainty-aware return for compounding error, making model-based RL more reliable for complex visual tasks. Its zero-shot real-world transfer shows this robustness holds beyond simulation.
Original Abstract
A central challenge of visual control with model-based reinforcement learning (RL) is reliable long-horizon planning: long rollouts with learned latent dynamics exhibit branching futures and multi-modal action-value distributions. In addition, compounding model errors amplified by visual occlusions make deep imagination brittle. We present ELVIS, a latent model predictive controller (MPC) designed to make long-horizon planning practical. ELVIS plans in a Dreamer-style recurrent state space model (RSSM) and replaces standard unimodal model predictive path integral (MPPI) with a Gaussian-mixture MPPI that maintains multiple coherent hypotheses over long horizons, avoiding mode averaging under branching rollouts. In parallel, ELVIS stabilizes deep imagination with a shared uncertainty-aware lambda-return: an ensemble of latent critics defines an upper-confidence-bound (UCB) score that gates a time-varying lambda, adaptively trading off bootstrapping versus look-ahead to limit compounding error during planning. The same return is used both to train an actor-critic prior from imagined rollouts and to score candidate trajectories inside GMM-MPPI, aligning RL objectives with the planner's long-horizon optimization. On fourteen DeepMind Control Suite visual tasks, ELVIS establishes state-of-the-art performance compared with TD-MPC2 and DreamerV3. Finally, ELVIS transfers zero-shot to a real-world sand-spraying task with severe occlusions, improving surface-quality metrics and demonstrating robustness beyond simulation.
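The abstract's uncertainty-aware lambda-return can be sketched as follows. This is a plausible reading, not the paper's formula: the UCB coefficient `beta`, the exponential gating schedule, and `kappa` are all assumptions. The sketch shows the mechanism the abstract describes: an ensemble of latent critics yields a per-state disagreement, which gates a time-varying lambda so that the return bootstraps earlier (small lambda) where the critics disagree, rather than trusting deep imagined rewards.

```python
import numpy as np

def uncertainty_lambda_return(rewards, ensemble_values, gamma=0.99,
                              lam_max=0.95, kappa=1.0, beta=0.5):
    """Lambda-return with ensemble-gated time-varying lambda (a sketch;
    the gating rule and the beta/kappa coefficients are assumptions).

    rewards: (H,) imagined rewards r_0..r_{H-1}.
    ensemble_values: (E, H+1) values from E latent critics at states
    s_0..s_H along the imagined rollout.
    Returns the return estimate G_0 at the rollout's first state.
    """
    mean_v = ensemble_values.mean(axis=0)      # (H+1,)
    std_v = ensemble_values.std(axis=0)        # (H+1,) critic disagreement
    ucb_v = mean_v + beta * std_v              # UCB score used to bootstrap
    # Assumed schedule: high disagreement -> small lambda -> bootstrap
    # earlier instead of looking further ahead through model error.
    lam = lam_max * np.exp(-kappa * std_v[1:])  # (H,)

    G = ucb_v[-1]  # bootstrap at the horizon
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * ((1.0 - lam[t]) * ucb_v[t + 1] + lam[t] * G)
    return float(G)
```

When the ensemble agrees everywhere (zero disagreement), `lam` stays at `lam_max` and this reduces to a standard TD(lambda) return; as disagreement grows along the rollout, the estimate leans increasingly on the critics' bootstrap values, which is the bootstrapping-versus-look-ahead trade-off the abstract describes.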