Continuous Latent Diffusion Language Model
Hongcan Guo, Qinyu Zhao, Yian Zhao, Shen Nie, Rui Zhu + 6 more
TLDR
Cola DLM is a hierarchical latent diffusion language model that generates text by modeling global semantics in a continuous latent space, offering a flexible non-autoregressive approach.
Key contributions
- Introduces Cola DLM, a hierarchical latent diffusion model for non-autoregressive text generation.
- Utilizes a Text VAE for stable text-to-latent mapping and a block-causal DiT for global semantic prior modeling (see the sketch after this list).
- Separates global semantic organization from local textual realization through latent prior transport.
- Demonstrates strong scaling behavior and generation quality, outperforming autoregressive baselines.
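Below is a minimal sketch of this three-stage pipeline, assuming a PyTorch-style interface. All names here (TextVAE, BlockCausalDiT, generate) are hypothetical placeholders rather than the authors' released code, and the DiT's block-causal attention is collapsed into a small per-block MLP purely to keep the example short.

```python
# Hypothetical sketch of Cola DLM-style hierarchical generation (not the authors' code).
import torch

class TextVAE(torch.nn.Module):
    """Stage 1 (sketch): map token blocks to continuous latents and back."""
    def __init__(self, vocab_size=8192, latent_dim=32, block_len=16):
        super().__init__()
        self.block_len = block_len
        self.embed = torch.nn.Embedding(vocab_size, latent_dim)
        self.decoder = torch.nn.Linear(latent_dim, block_len * vocab_size)

    def encode(self, tokens):                      # tokens: (B, T), T % block_len == 0
        b, t = tokens.shape
        z = self.embed(tokens).view(b, t // self.block_len, self.block_len, -1)
        return z.mean(dim=2)                       # one latent per block: (B, N, D)

    def decode(self, z):                           # z: (B, N, D) -> tokens: (B, N*block_len)
        b, n, _ = z.shape
        logits = self.decoder(z).view(b, n * self.block_len, -1)
        return logits.argmax(dim=-1)               # greedy decode, for illustration only

class BlockCausalDiT(torch.nn.Module):
    """Stage 2 (sketch): denoiser over block latents; block-causal attention is
    replaced here by a per-block MLP to keep the example compact."""
    def __init__(self, latent_dim=32, hidden=256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(latent_dim + 1, hidden), torch.nn.SiLU(),
            torch.nn.Linear(hidden, latent_dim))

    def forward(self, z_t, t):                     # z_t: (B, N, D), t: (B,)
        t_feat = t.view(-1, 1, 1).expand(z_t.shape[0], z_t.shape[1], 1)
        return self.net(torch.cat([z_t, t_feat], dim=-1))   # predicted clean latents

@torch.no_grad()
def generate(vae, dit, n_blocks=8, latent_dim=32, steps=50):
    """Stage 3 (sketch): transport noise to semantic latents, then decode tokens."""
    z = torch.randn(1, n_blocks, latent_dim)       # start from the latent prior's noise
    for i in reversed(range(1, steps + 1)):
        t = torch.full((1,), i / steps)
        z_hat = dit(z, t)                          # predict the clean block latents
        z = z + (z_hat - z) / i                    # crude interpolation-style update
    return vae.decode(z)                           # local textual realization

tokens = generate(TextVAE(), BlockCausalDiT())
print(tokens.shape)                                # e.g. torch.Size([1, 128])
```

The point the sketch illustrates is the separation of concerns: the diffusion loop operates only on block latents (prior transport), and tokens appear only in the final conditional decode.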
Why it matters
This paper introduces a novel non-autoregressive paradigm for language modeling, moving beyond fixed left-to-right generation. By leveraging continuous latent diffusion, it jointly addresses generation efficiency, scalable representation learning, and global semantic modeling. This approach paves the way for unified modeling across discrete text and continuous modalities.
Original Abstract
Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed left-to-right order. Existing alternatives still struggle to jointly achieve generation efficiency, scalable representation learning, and effective global semantic modeling. We propose Cola DLM, a hierarchical latent diffusion language model that frames text generation through hierarchical information decomposition. Cola DLM first learns a stable text-to-latent mapping with a Text VAE, then models a global semantic prior in continuous latent space with a block-causal DiT, and finally generates text through conditional decoding. From a unified Markov-path perspective, its diffusion process performs latent prior transport rather than token-level observation recovery, thereby separating global semantic organization from local textual realization. This design yields a more flexible non-autoregressive inductive bias, supports semantic compression and prior fitting in continuous space, and naturally extends to other continuous modalities. Through experiments spanning 4 research questions, 8 benchmarks, strictly matched ~2B-parameter autoregressive and LLaDA baselines, and scaling curves up to about 2000 EFLOPs, we identify an effective overall configuration of Cola DLM and verify its strong scaling behavior for text generation. Taken together, the results establish hierarchical continuous latent prior modeling as a principled alternative to strictly token-level language modeling, where generation quality and scaling behavior may better reflect model capability than likelihood, while also suggesting a concrete path toward unified modeling across discrete text and continuous modalities.
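Read schematically, the hierarchical decomposition the abstract describes corresponds to the standard latent-variable factorization below; the notation (p_dec, p_phi, z_t) is ours and is only a sketch of the idea, not necessarily the paper's exact formulation.

```latex
% Schematic factorization (our notation, not necessarily the paper's): the decoder
% handles local textual realization, while a reverse diffusion chain over the
% Text VAE latent z supplies the global semantic prior.
p_\theta(x) \;=\; \int p_{\mathrm{dec}}(x \mid z)\, p_{\mathrm{prior}}(z)\, \mathrm{d}z,
\qquad
p_{\mathrm{prior}}(z_0) \;=\; \int p(z_T) \prod_{t=1}^{T} p_\phi(z_{t-1} \mid z_t)\, \mathrm{d}z_{1:T}
```

Under this reading, "latent prior transport" means the diffusion chain acts only on the latent z (with z_0 the Text VAE latent) and never on tokens, which is what separates global semantic organization from local textual realization.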