Context Unrolling in Omni Models
Ceyuan Yang, Zhijie Lin, Yang Zhao, Fei Xiao, Hao He + 14 more
TLDR
Omni is a unified multimodal model that uses 'Context Unrolling' to reason across diverse data types, improving both multimodal understanding and generation.
Key contributions
- Omni: A unified multimodal model trained on text, images, videos, 3D geometry, and hidden representations.
- Introduces 'Context Unrolling' for explicit reasoning across diverse modal representations.
- Aggregates complementary information across heterogeneous modalities, yielding a more faithful approximation of shared multimodal knowledge and improved downstream reasoning fidelity.
Why it matters
This paper introduces a novel approach to multimodal AI in which a single model explicitly reasons across diverse data types before answering. The 'Context Unrolling' mechanism allows information from heterogeneous modalities to be integrated more faithfully, pushing the boundaries of unified multimodal understanding and generation.
Original Abstract
We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representations before producing predictions. This process enables the model to aggregate complementary information across heterogeneous modalities, facilitating a more faithful approximation of the shared multimodal knowledge manifold and improving downstream reasoning fidelity. As a result, Omni achieves strong performance on both multimodal generation and understanding benchmarks, while demonstrating advanced multimodal reasoning capabilities, including in-context generation of text, image, video, and 3D geometry.
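The abstract describes Context Unrolling only at a high level. As a rough illustration of one possible reading (ours, not the paper's implementation), the sketch below treats it as a multimodal chain of thought: the model autoregressively emits intermediate tokens, which may belong to any modality, until a hypothetical end-of-reasoning marker, and only then commits to the final prediction. The function name, `model.step`, the special token ids, and the stub model are all illustrative placeholders rather than the authors' API.

```python
# Minimal sketch of a "Context Unrolling"-style inference loop, assuming it
# resembles a multimodal chain of thought. All names and token ids here are
# hypothetical placeholders, not taken from the paper.

def generate_with_context_unrolling(model, prompt_tokens, max_steps=1024,
                                    end_of_reasoning_id=1, end_of_answer_id=2):
    """Autoregressively unroll intermediate multimodal context, then answer."""
    sequence = list(prompt_tokens)
    unrolled, answer = [], []
    phase = "unroll"

    for _ in range(max_steps):
        next_token = model.step(sequence)    # hypothetical next-token prediction
        sequence.append(next_token)

        if phase == "unroll":
            if next_token == end_of_reasoning_id:
                phase = "answer"             # switch from reasoning to answering
            else:
                unrolled.append(next_token)  # intermediate text/image/video/3D tokens
        else:
            if next_token == end_of_answer_id:
                break
            answer.append(next_token)

    return unrolled, answer


if __name__ == "__main__":
    class _StubModel:
        """Dummy model: emits three 'reasoning' tokens, a marker, then two answer tokens."""
        _script = [10, 11, 12, 1, 20, 21, 2]

        def __init__(self):
            self._i = 0

        def step(self, sequence):
            token = self._script[self._i]
            self._i += 1
            return token

    reasoning, final = generate_with_context_unrolling(_StubModel(), [0])
    print("unrolled context:", reasoning, "answer:", final)
```

Under this reading, the key design choice is that the intermediate tokens attend to every ingested modality, so the final answer is conditioned on an explicitly aggregated context rather than on each modality in isolation.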