Factual recall in linear associative memories: sharp asymptotics and mechanistic insights
Alessio Giorlandino, Sebastian Goldt, Antoine Maillard
TLDR
This paper gives a sharp statistical-physics characterization of the factual storage capacity of linear associative memories, with mechanistic insight into how optimal solutions improve over naive Hebbian learning.
Key contributions
- Introduces a decoupled model to precisely characterize factual storage capacity in linear associative memories.
- Demonstrates the decoupled model's equivalence to the original and shows it stores up to $p_c$ associations, where $p_c \log p_c / d^2 = 1/2$ (see the sketch after this list).
- Generalizes the computation of storage capacity to linear two-layer architectures.
- Provides mechanistic insight: rather than broadly boosting input-output alignments, optimal solutions raise the correct scores just above the extreme-value threshold set by the competing outputs.
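The capacity condition above defines $p_c$ only implicitly. As a rough illustration (this is not code from the paper), the snippet below solves $p_c \log p_c = d^2 / 2$ numerically; the function name `critical_capacity` and the choice of `scipy.optimize.brentq` as the root finder are our own.

```python
# Sketch: numerically solve the capacity condition p_c * log(p_c) / d^2 = 1/2.
# Not from the paper's code; function name and solver are illustrative choices.
import numpy as np
from scipy.optimize import brentq

def critical_capacity(d: int) -> float:
    """Return p_c solving p * log(p) = d**2 / 2."""
    target = d**2 / 2
    # p * log(p) is increasing for p >= 1, so the root is bracketed between
    # 1 (where p * log(p) = 0) and d**2 (where it exceeds the target).
    return brentq(lambda p: p * np.log(p) - target, 1.0 + 1e-9, float(d**2))

for d in (64, 256, 1024):
    print(f"d = {d:5d}  ->  p_c ~ {critical_capacity(d):,.0f} associations")
```

For $d = 1024$, for instance, this gives $p_c \approx 4.9 \times 10^4$, below $d^2$ by the logarithmic factor in the capacity condition.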
Why it matters
Understanding the fundamental limits of factual recall in neural networks is crucial for developing more robust and efficient large language models. This work provides a sharp, mechanistic characterization of memory capacity in a minimal setting. It offers a baseline for future research into more complex neural architectures.
Original Abstract
Large language models demonstrate remarkable ability in factual recall, yet the fundamental limits of storing and retrieving input–output associations with neural networks remain unclear. We study these limits in a minimal setting: a linear associative memory that maps $p$ input embeddings in $\mathbb{R}^d$ to their corresponding $d$-dimensional targets via a single layer, requiring each mapped input to be well separated from all other targets. Unlike in supervised classification, this strict separation induces $p$ constraints per association and produces strong correlations between constraints that make a direct characterisation of the storage capacity difficult. Here, we provide a precise characterisation of this capacity in the following way. We first introduce a decoupled model in which each input has its own independent set of competing outputs, and provide numerical and analytical evidence that this decoupled model is equivalent to the original model in terms of storage capacity, spectra of the learnt weights, and storage mechanism. Using tools from statistical physics, we show that the decoupled model can store up to $p_c \log p_c / d^2 = 1 / 2$ associations, and generalise the computation of $p_c$ to linear two-layer architectures. Our analysis also gives mechanistic insight into how the optimal solution improves over a naïve Hebbian learning rule: rather than boosting input-output alignments with broad fluctuations, the optimal solution raises the correct scores just above the extreme-value threshold set by the competing outputs. These findings give a sharp statistical-physics characterisation of factual storage in linear networks and provide a baseline for understanding the memory capacity of more realistic neural architectures.
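To make the setting concrete, here is a minimal numerical sketch, not taken from the paper: it stores $p$ associations with the naive Hebbian rule $W = \sum_i y_i x_i^\top$ and counts how many inputs still beat all competing targets under a winner-take-all retrieval check. The i.i.d. Gaussian embeddings and the normalisation are assumptions made here for illustration.

```python
# Sketch: naive Hebbian storage in a linear associative memory.
# Gaussian embeddings and the retrieval check are illustrative assumptions,
# not the paper's exact experimental protocol.
import numpy as np

rng = np.random.default_rng(0)
d, p = 128, 400

X = rng.standard_normal((p, d)) / np.sqrt(d)  # input embeddings x_i
Y = rng.standard_normal((p, d)) / np.sqrt(d)  # target embeddings y_i

W = Y.T @ X                   # Hebbian rule: W = sum_i y_i x_i^T
scores = Y @ W @ X.T          # scores[j, i] = y_j^T W x_i
recalled = (scores.argmax(axis=0) == np.arange(p)).mean()
print(f"fraction of associations correctly recalled: {recalled:.2f}")
```

Sweeping $p$ upward in such a simulation illustrates the comparison drawn in the abstract: the Hebbian rule starts losing associations well before an optimally trained $W$ would, since the optimal solution lifts the correct scores just above the extreme-value threshold rather than relying on broad fluctuations.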