LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
Inclusion AI, Tiwei Bie, Haoxing Chen, Tieyuan Chen, Zhenglin Cheng + 13 more
TLDR
LLaDA2.0-Uni is a unified discrete diffusion LLM that matches specialized VLMs in multimodal understanding while delivering strong image generation and editing, pointing to a scalable paradigm for unified foundation models.
Key contributions
- Unified architecture with discrete tokenizer, MoE-dLLM backbone, and diffusion decoder.
- Uses SigLIP-VQ to discretize visual inputs, enabling block-level masked diffusion for both text and vision (see the decoding sketch after this list).
- Achieves high efficiency via prefix-aware optimizations and few-step decoder distillation.
- Matches specialized VLMs in understanding and performs strongly in image generation/editing.
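To make the block-level masked-diffusion decoding concrete, here is a minimal PyTorch sketch. Everything specific is an assumption for illustration: `model`, `MASK_ID`, the linear unmasking schedule, and the confidence-based selection rule are placeholders, not the released implementation.

```python
import torch

MASK_ID = 0  # hypothetical id of the [MASK] token in the shared vocabulary

@torch.no_grad()
def block_masked_diffusion_decode(model, prefix_ids, block_len=32, steps=8):
    """Decode one block of `block_len` tokens appended after `prefix_ids`.

    `model(ids)` is assumed to return (batch, seq, vocab) logits over the
    shared text+vision vocabulary; the schedule below is illustrative.
    """
    device = prefix_ids.device
    block = torch.full((1, block_len), MASK_ID, dtype=torch.long, device=device)
    ids = torch.cat([prefix_ids, block], dim=1)

    for step in range(steps):
        logits = model(ids)[:, -block_len:, :]    # predictions for the block
        probs = logits.softmax(dim=-1)
        conf, pred = probs.max(dim=-1)            # per-position confidence

        blk = ids[:, -block_len:]                 # view into the block
        still_masked = blk == MASK_ID
        # Linear schedule: after step s, (s+1)/steps of the block is revealed.
        target = (block_len * (step + 1)) // steps
        k = target - int((~still_masked).sum())
        if k > 0:
            conf = conf.masked_fill(~still_masked, -1.0)  # skip revealed slots
            top = conf.topk(k, dim=-1).indices
            blk.scatter_(1, top, pred.gather(1, top))     # commit best tokens
    return ids
```

The property this illustrates is parallel refinement: several tokens within a block are committed per forward pass, rather than one token per step as in autoregressive decoding, which is what the paper's prefix-aware optimizations then accelerate further.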
Why it matters
LLaDA2.0-Uni offers a scalable, unified framework for multimodal AI, natively supporting interleaved generation and reasoning. By matching specialized VLMs in understanding while delivering strong image generation and editing, it points to a promising paradigm for next-generation foundation models.
Original Abstract
We present LLaDA2.0-Uni, a unified discrete diffusion large language model (dLLM) that supports multimodal understanding and generation within a natively integrated framework. Its architecture combines a fully semantic discrete tokenizer, a MoE-based dLLM backbone, and a diffusion decoder. By discretizing continuous visual inputs via SigLIP-VQ, the model enables block-level masked diffusion for both text and vision inputs within the backbone, while the decoder reconstructs visual tokens into high-fidelity images. Inference efficiency is enhanced beyond parallel decoding through prefix-aware optimizations in the backbone and few-step distillation in the decoder. Supported by carefully curated large-scale data and a tailored multi-stage training pipeline, LLaDA2.0-Uni matches specialized VLMs in multimodal understanding while delivering strong performance in image generation and editing. Its native support for interleaved generation and reasoning establishes a promising and scalable paradigm for next-generation unified foundation models. Codes and models are available at https://github.com/inclusionAI/LLaDA2.0-Uni.
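For intuition on the SigLIP-VQ discretization step described in the abstract, the following hypothetical sketch shows the basic vector-quantization mechanism: continuous patch embeddings are snapped to the nearest entry of a learned codebook, yielding integer ids the dLLM backbone can treat like text tokens. The shapes, the cosine-similarity lookup, and all names are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def quantize_visual_features(features, codebook):
    """Map continuous vision-encoder features to discrete token ids.

    features: (num_patches, d) patch embeddings from a vision encoder.
    codebook: (vocab_size, d) learned code vectors.
    Returns:  (num_patches,) integer ids usable alongside text tokens.
    """
    # L2-normalize so the nearest-neighbor lookup is by cosine similarity
    # (a common choice for semantic codebooks; assumed here).
    f = F.normalize(features, dim=-1)
    c = F.normalize(codebook, dim=-1)
    return (f @ c.t()).argmax(dim=-1)

# Usage with made-up sizes: a 16x16 patch grid and a 16k-entry codebook.
feats = torch.randn(256, 1152)
codes = torch.randn(16384, 1152)
token_ids = quantize_visual_features(feats, codes)
```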