Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics
Dominik Dahlem, Diego Maniloff, Mac Misiura
TLDR
This paper proves that symmetric spectral diagnostics of self-attention are orientation-blind and introduces a two-axis diagnostic (capacity and direction) for LLM hallucination.
Key contributions
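- Proves that every transpose-invariant spectral diagnostic of the symmetric attention component is orientation-blind: it cannot distinguish an operator from its transpose, so it cannot detect the direction of information flow.
- Establishes, through a quantitative converse, the asymmetry coefficient G as the unique control parameter for attention-flow direction.
- Develops a two-axis diagnostic (φ for transport capacity, G for direction) that predicts opposite polarity on bottleneck- and diffuse-dominated benchmarks.
- Shows that uniform causal attention keeps an n-independent capacity floor φ ≥ 1/5 (worst cut at t*/n ≈ 0.32), while window attention pierces it as O(w/n).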
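The orientation-blindness result can be checked numerically on a toy operator. The sketch below assumes a degree normalization M = D_r^{-1/2} A D_c^{-1/2} (row and column degrees), which the summary does not specify, and uses a hypothetical signed index, `forward_flow_index`, as a stand-in for a direction-sensitive quantity; it is not the paper's definition of G. Reversing the flow (transposing A) leaves the symmetric part of M, and therefore its entire spectrum, unchanged, while the signed index flips sign.

```python
import numpy as np

def degree_normalize(A):
    """Assumed normalization D_r^{-1/2} A D_c^{-1/2} using row and column degrees."""
    r = A.sum(axis=1)
    c = A.sum(axis=0)
    return A / np.sqrt(r)[:, None] / np.sqrt(c)[None, :]

def symmetric_spectrum(M):
    """Eigenvalues of the symmetric part (M + M^T)/2: a transpose-invariant diagnostic."""
    return np.linalg.eigvalsh((M + M.T) / 2)

def forward_flow_index(M):
    """Hypothetical signed index: +1 if all off-diagonal mass points to earlier
    positions (j < i), -1 if all of it points to later positions (j > i)."""
    lower = np.tril(M, -1).sum()
    upper = np.triu(M, 1).sum()
    return (lower - upper) / (lower + upper)

n = 6
A = np.tril(np.ones((n, n)))
A /= A.sum(axis=1, keepdims=True)   # uniform causal attention, row-stochastic

M = degree_normalize(A)
M_rev = degree_normalize(A.T)       # same operator with the flow direction reversed

print(np.allclose(M_rev, M.T))      # normalizing A^T gives M^T (up to rounding)
print(np.allclose(symmetric_spectrum(M), symmetric_spectrum(M_rev)))
print(forward_flow_index(M), forward_flow_index(M_rev))
```

Any unsigned asymmetry magnitude, such as the norm of M - M^T, is also unchanged by transposition, so a direction axis has to carry a sign, which is the role the paper assigns to G.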
Why it matters
Understanding how attention fails is a step toward mitigating LLM hallucinations. Symmetric spectral methods provably miss the direction of information flow; by pairing a capacity measure (φ) with a direction measure (G), this paper offers a more complete diagnostic framework, and its predicted polarity reversal between HaluEval and MedHallu holds on the tested models up to 8B parameters.
Original Abstract
Large language models hallucinate in predictable ways: attention routing fails by over-concentrating on a narrow set of positions, or by spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport capacity; we prove that every transpose-invariant spectral diagnostic of this operator is structurally orientation-blind (it cannot distinguish an operator from its transpose, and therefore cannot detect information-flow direction), with a quantitative converse establishing the asymmetry coefficient $G$ as the unique control parameter for direction. Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an $n$-independent floor $\varphi \ge 1/5$ with worst cut at $t^\ast/n \approx 0.32$, while window attention pierces the floor as $O(w/n)$; failure modes are shape-different, not just value-different. The resulting two-axis diagnostic ($\varphi$ for capacity, $G$ for direction) yields a falsifiable polarity prediction: bottleneck- and diffuse-dominated benchmarks should exhibit opposite polarity. Under length-controlled evaluation, transport features retain interpretable signal (LC-AUROC from 0.62 to 0.84) on tested models up to 8B parameters, with polarity reversing as predicted between HaluEval and MedHallu.
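The capacity axis can be explored with a small prefix-cut computation. The summary does not define the bipartite-Cheeger constant, so the sketch below assumes one: queries and keys form the two sides of a bipartite graph weighted by the attention matrix, node degrees are row and column sums, and only aligned prefix cuts S = {queries < t} ∪ {keys < t} are scanned, giving φ(t) = cut(S) / min(vol S, vol of the complement). It compares uniform causal attention with a sliding window of width w; the helper names are illustrative, not from the paper.

```python
import numpy as np

def uniform_causal(n):
    """Row i attends uniformly to positions 0..i."""
    A = np.tril(np.ones((n, n)))
    return A / A.sum(axis=1, keepdims=True)

def sliding_window(n, w):
    """Row i attends uniformly to positions max(0, i - w + 1)..i."""
    A = np.tril(np.ones((n, n))) - np.tril(np.ones((n, n)), -w)
    return A / A.sum(axis=1, keepdims=True)

def prefix_cut_conductance(A):
    """phi(t) for t = 1..n-1 under the assumed bipartite prefix-cut definition."""
    n = A.shape[0]
    q_deg = A.sum(axis=1)               # query (row) degrees
    k_deg = A.sum(axis=0)               # key (column) degrees
    total = q_deg.sum() + k_deg.sum()
    phi = np.empty(n - 1)
    for t in range(1, n):
        # Edges leaving S: late queries -> early keys, plus early queries -> late keys.
        cut = A[t:, :t].sum() + A[:t, t:].sum()
        vol = q_deg[:t].sum() + k_deg[:t].sum()
        phi[t - 1] = cut / min(vol, total - vol)
    return phi

n = 500
phi_causal = prefix_cut_conductance(uniform_causal(n))
t_star = int(np.argmin(phi_causal)) + 1
print(f"causal: min phi = {phi_causal.min():.3f} at t/n = {t_star / n:.2f}")

w = 8
window_min = {m: prefix_cut_conductance(sliding_window(m, w)).min() for m in (100, 200, 400)}
for m, v in window_min.items():
    print(f"window w={w}, n={m}: min phi = {v:.4f}")
```

Under this assumed definition the causal minimum lands near t/n ≈ 0.32, matching the worst-cut location quoted in the abstract, at a value of roughly 0.36, which sits above the 1/5 floor rather than on it; the window minima shrink roughly in proportion to 1/n at fixed w, in line with the O(w/n) scaling. The paper's exact constant may rest on a different normalization or a wider family of cuts.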