Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions
TLDR
This paper investigates where language models form plans for future tokens, finding that future-rhyme information is linearly decodable at the line boundary across model families, but that only Gemma-3-27B causally relies on that encoding during generation.
Key contributions
- Used rhyming-couplet completion to test planning site formation in Qwen3, Gemma-3, and Llama-3.
- Linear probing showed future-rhyme information is decodable at the line boundary, strengthening with scale.
- Activation patching showed only Gemma-3-27B causally relies on this encoding, with the causal driver handing off from the rhyme word to the line boundary around layer 30.
- Localized Gemma-3-27B's planning handoff to five attention heads via two-stage path patching, recovering ~90% of rhyme-routing capacity at the newline.
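The activation-patching logic behind these contributions can be sketched on a toy model. This is a minimal illustration, not the paper's implementation: the real experiments patch hidden states of Gemma-3-27B at the line-boundary token, whereas here a small `nn.Sequential` stack stands in for the transformer, and the inputs are random vectors standing in for "clean" and "corrupt" (rhyme-word-swapped) prompts.

```python
# Minimal sketch of activation patching on a toy model.
# (Illustrative only; the paper patches Gemma-3-27B hidden states.)
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "model": a stack of linear layers standing in for transformer blocks.
model = nn.Sequential(*[nn.Linear(8, 8) for _ in range(4)])

clean = torch.randn(1, 8)    # stand-in for the original prompt
corrupt = torch.randn(1, 8)  # stand-in for the rhyme-word-swapped prompt

# 1. Cache the clean activation at the layer of interest.
layer_idx = 2
cache = {}
def save_hook(module, inp, out):
    cache["act"] = out.detach()
h = model[layer_idx].register_forward_hook(save_hook)
clean_out = model(clean)
h.remove()

# 2. Re-run on the corrupt input, overwriting that layer's output
#    with the cached clean activation.
def patch_hook(module, inp, out):
    return cache["act"]
h = model[layer_idx].register_forward_hook(patch_hook)
patched_out = model(corrupt)
h.remove()

corrupt_out = model(corrupt)

# In this toy stack (no residual connections), everything downstream of
# the patched layer sees the clean state, so the patched run matches the
# clean run exactly; in a real transformer the patch shifts the output
# only to the extent that the patched site carries the causal signal.
print(torch.allclose(patched_out, clean_out))  # True
```

In the paper's setting, the patch is applied at a specific token position (the newline) rather than to a whole layer output, and the effect is measured on the model's preference for the constrained rhyme word.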
Why it matters
This paper provides mechanistic insight into how LMs handle forward-looking constraints. It exposes a sharp divergence in strategy: despite similar probe signal across families, only Gemma-3-27B forms and causally uses a "plan" at the line boundary, while the other models condition directly on the rhyme word throughout generation. Understanding these mechanisms matters for building more robust and controllable generative models.
Original Abstract
We study planning site formation in language models -- where internal representations of structurally-constrained future tokens form during the forward pass, and whether they causally drive generation. Using rhyming-couplet completion as a clean test of forward-looking constraint, we apply two lightweight methods (linear probing and activation patching) across Qwen3, Gemma-3, and Llama-3 at more than ten scales. Probing shows that future-rhyme information is linearly decodable at the line boundary, with signal that strengthens with scale in all three families. Activation patching reveals that only Gemma-3-27B causally relies on this encoding, exhibiting a handoff in which the causal driver migrates from the rhyme word to the line boundary around layer 30. Every other model we test conditions on the rhyme word throughout generation, with near-zero causal effect at the line boundary despite strong probe signal. Via two-stage path patching, we localize the Gemma-3-27B handoff to five attention heads that recover ~90% of the rhyme-routing capacity at the newline.
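The linear probing the abstract describes can likewise be sketched in a few lines. This is a hedged stand-in, not the paper's setup: real probes are trained on line-boundary hidden states to predict the upcoming rhyme class, whereas here synthetic "activations" are generated with the label signal injected along a random direction, so that a logistic-regression probe can recover it.

```python
# Minimal sketch of a linear probe on synthetic "hidden states".
# (Illustrative only; the paper probes real line-boundary activations.)
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for newline-token activations: two rhyme classes
# separated along one random direction, plus isotropic noise.
d, n = 64, 400
direction = rng.normal(size=d)
labels = rng.integers(0, 2, size=n)          # which rhyme sound is coming
acts = rng.normal(size=(n, d)) + np.outer(labels - 0.5, direction)

# Fit a linear probe on a train split, score on a held-out split.
probe = LogisticRegression(max_iter=1000).fit(acts[:300], labels[:300])
acc = probe.score(acts[300:], labels[300:])
print(f"probe accuracy: {acc:.2f}")  # well above the 0.5 chance baseline
```

A key caveat the paper's patching results underline: high probe accuracy shows the information is present, not that the model causally uses it, which is exactly the gap between the probing and patching findings above.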