MultiWorld: Scalable Multi-Agent Multi-View Video World Models

arXiv:2604.18564

Haoyu Wu, Jiwen Yu, Yingtian Zou, Xihui Liu

cs.CV

TLDR

MultiWorld is a scalable framework for multi-agent, multi-view video world models that improves control over multiple agents and keeps observations consistent across views.

Key contributions

  • Extends video world models from single-agent settings to multi-agent, multi-view environments.
  • Introduces a Multi-Agent Condition Module for precise multi-agent controllability.
  • Uses a Global State Encoder to keep observations coherent across different views.
  • Synthesizes different views in parallel for efficiency and supports flexible scaling of agent and view counts.
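
To make the interface behind these contributions concrete, here is a toy sketch of an action-conditioned rollout with several agents and several views. It is not the paper's architecture: the function names (`pool_actions`, `global_state`, `step`, `rollout`), the sum and mean pooling, and the linear update rule are all assumptions made for illustration. It only shows how a shared state plus pooled actions lets each view be predicted independently, for any number of agents and views.

```python
from typing import List

Frame = List[float]  # toy "frame": a flat list of pixel values


def pool_actions(actions: List[List[float]]) -> List[float]:
    """Sum per-agent action vectors (hypothetical stand-in for multi-agent conditioning).

    Summation accepts any number of agents, which mirrors flexible agent scaling.
    """
    return [sum(component) for component in zip(*actions)]


def global_state(views: List[Frame]) -> Frame:
    """Average the latest frame of every view (hypothetical stand-in for a global state encoder)."""
    n = len(views)
    return [sum(pixels) / n for pixels in zip(*views)]


def step(views: List[Frame], actions: List[List[float]]) -> List[Frame]:
    """Predict the next frame of every view from the shared state and the pooled actions."""
    state = global_state(views)
    drive = sum(pool_actions(actions))
    # Each view needs only its own last frame plus the shared inputs,
    # so the per-view predictions are independent and could run in parallel.
    return [
        [0.5 * own + 0.5 * shared + drive for own, shared in zip(view, state)]
        for view in views
    ]


def rollout(views: List[Frame], action_seq: List[List[List[float]]]) -> List[List[Frame]]:
    """Autoregressively apply `step`, feeding each prediction back in as history."""
    history = [views]
    for actions in action_seq:
        history.append(step(history[-1], actions))
    return history
```

Calling `rollout(views, action_seq)` with two views and two agents, or three of each, uses the same code path. A learned model would replace the pooling and the linear update with neural modules.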

Why it matters

Most existing video world models handle only a single agent, while many real-world systems involve several entities interacting and being observed from several viewpoints. MultiWorld extends world modeling to that setting with a design that scales to different numbers of agents and views and synthesizes views in parallel. The reported experiments cover multi-player game environments and multi-robot manipulation tasks, so the approach is most directly relevant to game AI and robotics.

Original Abstract

Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-conditioned video generation models that take historical frames and current actions as input to predict future frames. Yet, most existing approaches are limited to single-agent scenarios and fail to capture the complex interactions inherent in real-world multi-agent systems. We present MultiWorld, a unified framework for multi-agent multi-view world modeling that enables accurate control of multiple agents while maintaining multi-view consistency. We introduce the Multi-Agent Condition Module to achieve precise multi-agent controllability, and the Global State Encoder to ensure coherent observations across different views. MultiWorld supports flexible scaling of agent and view counts, and synthesizes different views in parallel for high efficiency. Experiments on multi-player game environments and multi-robot manipulation tasks demonstrate that MultiWorld outperforms baselines in video fidelity, action-following ability, and multi-view consistency. Project page: https://multi-world.github.io/
