ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC
Yurui Du, Pinhao Song, Yutong Hu, Renaud Detry
TL;DR
ELVIS enables robust long-horizon visual MPC via Gaussian-mixture planning and an uncertainty-aware lambda-return, handling branching futures and model errors.
Key contributions
- Employs Gaussian-mixture MPPI for long-horizon planning, handling branching futures and multi-modal distributions.
- Stabilizes deep imagination with an uncertainty-aware lambda-return from an ensemble of latent critics.
- Adaptively balances bootstrapping and look-ahead to limit compounding model errors during planning.
- Achieves state-of-the-art results on DeepMind Control Suite and zero-shot real-world transfer.
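To make the first contribution concrete, here is a minimal sketch of a Gaussian-mixture MPPI planning step in numpy. It is not the paper's implementation: the number of modes, the per-component softmax update, the mode-ranking rule, and all coefficients are assumptions; `score_fn` stands in for evaluating imagined rollouts in the learned latent model. The key idea it illustrates is that each Gaussian component refines its own hypothesis, so distinct branches of the future are not averaged into one blurry plan.

```python
import numpy as np

def gmm_mppi_step(score_fn, horizon, act_dim, n_modes=3, n_samples=255,
                  n_iters=4, temperature=1.0, seed=0):
    """One planning step of a Gaussian-mixture MPPI sketch (assumed details).

    score_fn: maps a batch of action sequences (N, H, A) to returns (N,).
    Maintains n_modes independent Gaussian components over action
    sequences so multiple long-horizon hypotheses stay separate.
    """
    rng = np.random.default_rng(seed)
    means = rng.normal(0.0, 0.1, size=(n_modes, horizon, act_dim))
    stds = np.full((n_modes, horizon, act_dim), 0.5)

    for _ in range(n_iters):
        # Sample an equal share of candidate sequences from each component.
        per = n_samples // n_modes
        samples = means[:, None] + stds[:, None] * rng.normal(
            size=(n_modes, per, horizon, act_dim))
        returns = score_fn(samples.reshape(-1, horizon, act_dim))
        returns = returns.reshape(n_modes, per)

        # MPPI softmax weights computed per component, so each mode
        # sharpens its own hypothesis instead of collapsing to a mean.
        w = np.exp((returns - returns.max(axis=1, keepdims=True)) / temperature)
        w /= w.sum(axis=1, keepdims=True)
        means = np.einsum('kn,knha->kha', w, samples)
        stds = np.sqrt(np.einsum('kn,knha->kha', w,
                                 (samples - means[:, None]) ** 2) + 1e-6)
        mode_scores = returns.mean(axis=1)  # rank modes by average return

    best = int(np.argmax(mode_scores))
    return means[best, 0]  # execute the first action of the best mode
```

As a usage sketch, scoring candidates against a toy quadratic objective pulls the best mode's first action toward the optimum over a few iterations, while the other modes remain free to track different optima if the objective is multi-modal.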
Why it matters
Long-horizon visual control is challenging because futures branch and model errors compound over deep rollouts. ELVIS tackles both at once, with multi-modal planning for branching futures and an uncertainty-aware return for compounding error, making model-based RL more reliable for complex visual tasks. Its zero-shot real-world transfer shows this robustness holds beyond simulation.
Original Abstract
A central challenge of visual control with model-based reinforcement learning (RL) is reliable long-horizon planning: long rollouts with learned latent dynamics exhibit branching futures and multi-modal action-value distributions. In addition, compounding model errors amplified by visual occlusions make deep imagination brittle. We present ELVIS, a latent model predictive controller (MPC) designed to make long-horizon planning practical. ELVIS plans in a Dreamer-style recurrent state space model (RSSM) and replaces standard unimodal model predictive path integral (MPPI) with a Gaussian-mixture MPPI that maintains multiple coherent hypotheses over long horizons, avoiding mode averaging under branching rollouts. In parallel, ELVIS stabilizes deep imagination with a shared uncertainty-aware lambda-return: an ensemble of latent critics defines an upper-confidence-bound (UCB) score that gates a time-varying lambda, adaptively trading off bootstrapping versus look-ahead to limit compounding error during planning. The same return is used both to train an actor-critic prior from imagined rollouts and to score candidate trajectories inside GMM-MPPI, aligning RL objectives with the planner's long-horizon optimization. On fourteen DeepMind Control Suite visual tasks, ELVIS establishes state-of-the-art performance compared with TD-MPC2 and DreamerV3. Finally, ELVIS transfers zero-shot to a real-world sand-spraying task with severe occlusions, improving surface-quality metrics and demonstrating robustness beyond simulation.
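The abstract's uncertainty-aware lambda-return can be sketched as follows. This is a plausible reading, not the paper's formula: the UCB coefficient `beta`, the exponential gating schedule, and `kappa` are all assumptions. The sketch shows the mechanism the abstract describes: an ensemble of latent critics yields a per-state disagreement, which gates a time-varying lambda so that the return bootstraps earlier (small lambda) where the critics disagree, rather than trusting deep imagined rewards.

```python
import numpy as np

def uncertainty_lambda_return(rewards, ensemble_values, gamma=0.99,
                              lam_max=0.95, kappa=1.0, beta=0.5):
    """Lambda-return with ensemble-gated time-varying lambda (a sketch;
    the gating rule and the beta/kappa coefficients are assumptions).

    rewards: (H,) imagined rewards r_0..r_{H-1}.
    ensemble_values: (E, H+1) values from E latent critics at states
    s_0..s_H along the imagined rollout.
    Returns the return estimate G_0 at the rollout's first state.
    """
    mean_v = ensemble_values.mean(axis=0)      # (H+1,)
    std_v = ensemble_values.std(axis=0)        # (H+1,) critic disagreement
    ucb_v = mean_v + beta * std_v              # UCB score used to bootstrap
    # Assumed schedule: high disagreement -> small lambda -> bootstrap
    # earlier instead of looking further ahead through model error.
    lam = lam_max * np.exp(-kappa * std_v[1:])  # (H,)

    G = ucb_v[-1]  # bootstrap at the horizon
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * ((1.0 - lam[t]) * ucb_v[t + 1] + lam[t] * G)
    return float(G)
```

When the ensemble agrees everywhere (zero disagreement), `lam` stays at `lam_max` and this reduces to a standard TD(lambda) return; as disagreement grows along the rollout, the estimate leans increasingly on the critics' bootstrap values, which is the bootstrapping-versus-look-ahead trade-off the abstract describes.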