A Mechanistic Analysis of Looped Reasoning Language Models
Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, and 2 more
TLDR
This paper mechanistically analyzes looped reasoning LLMs, showing that their layers converge to distinct fixed points and that recurrent blocks repeat the inference stages seen in feedforward models.
Key contributions
- Looped LLM layers converge to distinct fixed points, forming a consistent cyclic latent trajectory.
- Attention-head behavior stabilizes as these fixed points are reached, becoming effectively constant across recurrences.
- Recurrent blocks learn inference stages that mirror and repeat those of feedforward models.
- Recurrent-block size, input injection, and normalization all shape the emergence and stability of these cyclic fixed points (see the architecture sketch after this list).
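For concreteness, here is a minimal PyTorch sketch of the kind of architecture under study: a small block of transformer layers applied repeatedly in depth, with the input embedding re-injected and the latent state normalized at each recurrence. All names, sizes, and design choices below are illustrative assumptions, not the paper's exact models.

```python
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    """Illustrative looped reasoning block (hypothetical, not the paper's
    architecture): a small stack of transformer layers applied repeatedly,
    with the input embedding re-injected at each recurrence."""

    def __init__(self, d_model=512, n_heads=8, block_size=4, n_loops=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True, norm_first=True)
        self.block = nn.TransformerEncoder(layer, num_layers=block_size)
        self.norm = nn.LayerNorm(d_model)  # normalization between recurrences
        self.n_loops = n_loops

    def forward(self, x_emb):
        h = torch.zeros_like(x_emb)  # initial latent state
        trajectory = []              # latent state after each loop iteration
        for _ in range(self.n_loops):
            # input injection: the embedded input is added back every loop
            h = self.block(self.norm(h + x_emb))
            trajectory.append(h)
        return h, trajectory
```

Keeping `block_size` small while increasing `n_loops` is what distinguishes this family from a standard feedforward stack of equal total depth.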
Why it matters
Looped reasoning LLMs improve reasoning performance by reusing layers in depth, yet their internal dynamics have received little scrutiny. This work provides mechanistic insight into how these models process information, offering practical guidance for designing more effective and efficient looped architectures.
Original Abstract
Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM's layers in the latent dimension, resulting in looped reasoning language models. Despite promising results, few works have investigated how their internal dynamics differ from those of standard feedforward models. In this paper, we conduct a mechanistic analysis of the latent states in looped language models, focusing in particular on how the stages of inference observed in feedforward models compare to those observed in looped ones. To this end, we analyze cyclic recurrence and show that for many of the studied models each layer in the cycle converges to a distinct fixed point; consequently, the recurrent block follows a consistent cyclic trajectory in the latent space. We provide evidence that as these fixed points are reached, attention-head behavior stabilizes, leading to constant behavior across recurrences. Empirically, we discover that recurrent blocks learn stages of inference that closely mirror those of feedforward models, repeating these stages in depth with each iteration. We study how recurrent block size, input injection, and normalization influence the emergence and stability of these cyclic fixed points. We believe these findings help translate mechanistic insights into practical guidance for architectural design.
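One way to make the fixed-point claim concrete is to measure how far the latent state moves between successive recurrences: if each position in the cycle settles to a distinct fixed point, these distances should decay toward zero. The helper below is a hypothetical diagnostic in that spirit, assuming a per-loop trajectory like the one returned by the sketch above; it is not the paper's actual analysis code.

```python
import torch

def cycle_convergence(trajectory):
    """Mean distance between latent states at successive recurrences.

    trajectory: list of [batch, seq, d_model] tensors, one per loop.
    If the recurrent block converges to a cyclic fixed point, the
    returned distances should shrink as loops progress.
    """
    return [
        (curr - prev).norm(dim=-1).mean().item()
        for prev, curr in zip(trajectory[:-1], trajectory[1:])
    ]

# Hypothetical usage with the LoopedBlock sketch above:
#   _, traj = LoopedBlock()(torch.randn(2, 16, 512))
#   print(cycle_convergence(traj))  # expect a decaying sequence
```

A similar comparison of attention maps across loop iterations would probe the paper's related claim that attention-head behavior stabilizes once the fixed points are reached.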