ArXiv TLDR

Language Diffusion Models are Associative Memories Capable of Retrieving Unseen Data

arXiv:2604.26841

Bao Pham, Mohammed J. Zaki, Luca Ambrogioni, Dmitry Krotov, Matteo Negri

cs.LG · cs.AI · cs.CL

TLDR

Language diffusion models act as associative memories, exhibiting a memorization-to-generalization transition detectable by conditional entropy.

Key contributions

  • Uniform-based Discrete Diffusion Models (UDDMs) fundamentally behave as Associative Memories with creative capabilities.
  • Basins of attraction in UDDMs form via conditional likelihood maximization, without requiring an explicit energy function.
  • UDDMs exhibit a sharp memorization-to-generalization transition driven by training dataset size.
  • Conditional entropy of predicted tokens reliably detects this transition, distinguishing memorization from generalization.

Why it matters

This paper reframes language diffusion models as associative memories, offering a novel perspective on their learning dynamics. It introduces conditional entropy as a practical, quantitative probe for assessing when a model memorizes versus generalizes, a capability crucial for understanding and deploying robust generative AI.
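As a rough illustration of the entropy probe (not the paper's implementation), the sketch below computes the mean conditional entropy of per-token predictive distributions. The `token_probs` input is a hypothetical stand-in for a denoiser's softmax outputs; per the paper's finding, near-zero mean entropy would signal memorization, while finite entropy on most tokens would signal generalization.

```python
import math

def conditional_entropy(token_probs):
    """Mean conditional entropy (in nats) over predicted token positions.

    token_probs: one probability distribution per token position,
    e.g. softmax outputs from a diffusion denoiser (hypothetical input).
    """
    entropies = []
    for dist in token_probs:
        # H(p) = -sum p * log p, skipping zero-probability tokens
        h = -sum(p * math.log(p) for p in dist if p > 0.0)
        entropies.append(h)
    return sum(entropies) / len(entropies)

# A one-hot (deterministic) prediction has zero entropy; a uniform
# two-token prediction has entropy log(2).
print(conditional_entropy([[1.0, 0.0]]))  # memorization-like
print(conditional_entropy([[0.5, 0.5]]))  # generalization-like
```

In practice one would aggregate this statistic over many sequences from the deployed model and watch for the vanishing-entropy signature of memorization.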

Original Abstract

When do language diffusion models memorize their training data, and how to quantitatively assess their true generative regime? We address these questions by showing that Uniform-based Discrete Diffusion Models (UDDMs) fundamentally behave as Associative Memories (AMs) $\textit{with emergent creative capabilities}$. The core idea of an AM is to reliably recover stored data points as $\textit{memories}$ by establishing distinct basins of attraction around them. Historically, models like Hopfield networks use an explicit energy function to guarantee these stable attractors. We broaden this perspective by leveraging the observation that energy is not strictly necessary, as basins of attraction can also be formed via conditional likelihood maximization. By evaluating token recovery of $\textit{training}$ and $\textit{test}$ examples, we identify in UDDMs a sharp memorization-to-generalization transition governed by the size of the training dataset: as it increases, basins around training examples shrink and basins around unseen test examples expand, until both later converge to the same level. Crucially, we can detect this transition using only the conditional entropy of predicted token sequences: memorization is characterized by vanishing conditional entropy, while in the generalization regime the conditional entropy of most tokens remains finite. Thus, conditional entropy offers a practical probe for the memorization-to-generalization transition in deployed models.
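The retrieval dynamics the abstract describes (basins of attraction formed by conditional likelihood maximization rather than an explicit energy function) can be caricatured with a toy sketch. This is not the paper's UDDM: the "model" here is just token overlap against a small stored set, and all names are illustrative. A corrupted query is iteratively pulled toward the stored memory whose basin it falls in.

```python
# Toy associative retrieval: greedily maximize a crude conditional
# likelihood (token agreement with the closest stored memory).
memories = ["the cat sat", "a dog ran", "the sun set"]

def retrieve(query, memories, steps=5):
    """Iteratively replace the sequence with the stored memory that
    best explains it, until a fixed point (the basin's attractor)."""
    toks = query.split()
    for _ in range(steps):
        # score each memory by positional token overlap
        best = max(memories,
                   key=lambda m: sum(a == b for a, b in zip(toks, m.split())))
        new = best.split()
        if new == toks:  # converged to an attractor
            break
        toks = new
    return " ".join(toks)

print(retrieve("the cat ran", memories))  # falls into the "the cat sat" basin
```

In the paper's setting the conditional distributions come from a trained denoiser, and as the training set grows these basins shrink around training examples while expanding around unseen test examples.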

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.