ArXiv TLDR

Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering

arXiv:2604.18567

Manan Gupta, Dhruv Kumar

cs.LG · cs.AI · cs.CL

TLDR

Latent Phase-Shift Rollback (LPSR) corrects LLM reasoning errors during inference by monitoring the residual stream for error signatures and, when one is detected, rolling back the KV-cache and injecting a steering vector.

Key contributions

  • LPSR corrects LLM reasoning errors mid-generation by monitoring residual stream "phase shifts."
  • It detects errors using a cosine-similarity + entropy dual gate and responds by injecting a pre-computed steering vector (a minimal sketch of the gate follows this list).
  • Achieves 44.0% on MATH-500 with an 8B model, outperforming standard autoregressive decoding (28.8%) and prompted self-correction (19.8%).
  • Reveals that optimal layers for error detection and correction are distinct ("detection-correction dissociation").
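
The cosine-similarity + entropy dual gate is the core detection step. Below is a minimal PyTorch sketch of that idea; the threshold values, the function name, and the exact way the two signals are combined are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def phase_shift_gate(h_prev, h_curr, logits, cos_thresh=0.0, ent_thresh=2.5):
    """Hypothetical dual gate: flag a 'phase shift' when the residual-stream
    direction reverses sharply AND next-token entropy is high.

    h_prev, h_curr: residual-stream vectors at the monitored layer for the
                    previous and current generation step, shape (d_model,).
    logits:         next-token logits for the current step, shape (vocab,).
    cos_thresh, ent_thresh: illustrative thresholds, not taken from the paper.
    """
    cos = F.cosine_similarity(h_prev, h_curr, dim=-1)           # directional agreement
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)   # next-token uncertainty
    # Fire only when both signals agree: abrupt directional reversal + high entropy.
    return bool(cos < cos_thresh) and bool(entropy > ent_thresh)

# Toy usage with random tensors standing in for real hidden states / logits.
d_model, vocab = 4096, 32000
h_prev, h_curr = torch.randn(d_model), torch.randn(d_model)
logits = torch.randn(vocab)
print(phase_shift_gate(h_prev, h_curr, logits))
```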

Why it matters

Large language models frequently make unrecoverable reasoning errors. This paper introduces an efficient inference-time correction method that substantially improves accuracy on challenging benchmarks such as MATH-500, surpassing strong baselines and even a much larger 70B model. It makes LLMs more reliable without fine-tuning.

Original Abstract

Large language models frequently commit unrecoverable reasoning errors mid-generation: once a wrong step is taken, subsequent tokens compound the mistake rather than correct it. We introduce $\textbf{Latent Phase-Shift Rollback}$ (LPSR): at each generation step, we monitor the residual stream at a critical layer $l_{\mathrm{crit}}$, detect abrupt directional reversals (phase shifts) via a cosine-similarity $+$ entropy dual gate, and respond by rolling back the KV-cache and injecting a pre-computed steering vector. No fine-tuning, gradient computation, or additional forward passes are required. LPSR achieves $\mathbf{44.0\%}$ on MATH-500 with an 8B model versus $28.8\%$ for standard AR ($+15.2$ pp; McNemar $\chi^2 = 66.96$, $p < 10^{-15}$). Critically, prompted self-correction, the most natural inference-time baseline, scores only $19.8\%$, below standard AR; LPSR exceeds it by $+24.2$ pp ($\chi^2 = 89.4$, $p \approx 0$). LPSR also outperforms Best-of-16 ($+7.8$ pp) at $5.4\times$ lower token cost, and surpasses a standard 70B model ($35.2\%$) with $8.75\times$ fewer parameters at ${\sim}3\times$ the token budget. A 32-layer sweep reveals a novel $\textbf{detection-correction dissociation}$: error-detection AUC peaks at layer 14 ($0.718$) but task accuracy peaks at layer 16 ($44.0\%$ vs. $29.2\%$), demonstrating that optimal monitoring depth differs for detection and correction.
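
For concreteness, here is a minimal sketch of the two response actions the abstract describes, rolling back the KV-cache and injecting a pre-computed steering vector, using plain PyTorch tensors. The tuple-of-(key, value) cache layout, the rollback length, and the steering coefficient are assumptions chosen for illustration; a real integration would hook into the specific model's cache format and decoder-layer outputs.

```python
import torch

def rollback_kv_cache(past_key_values, keep_len):
    """Truncate a tuple-style KV cache back to `keep_len` positions.
    past_key_values: tuple over layers of (key, value), each of shape
                     (batch, n_heads, seq_len, head_dim)."""
    return tuple(
        (k[:, :, :keep_len, :], v[:, :, :keep_len, :])
        for k, v in past_key_values
    )

def inject_steering(hidden_state, steering_vec, alpha=1.0):
    """Add a pre-computed steering direction to the residual stream at the
    correction layer. `steering_vec` and `alpha` are illustrative stand-ins."""
    return hidden_state + alpha * steering_vec

# Toy demonstration with random tensors in place of a real 8B model's cache.
n_layers, batch, heads, seq, head_dim, d_model = 2, 1, 8, 12, 64, 512
cache = tuple(
    (torch.randn(batch, heads, seq, head_dim),
     torch.randn(batch, heads, seq, head_dim))
    for _ in range(n_layers)
)
cache = rollback_kv_cache(cache, keep_len=8)   # discard the last 4 cached tokens
print(cache[0][0].shape)                       # torch.Size([1, 8, 8, 64])

h = torch.randn(batch, 1, d_model)             # residual stream at the correction layer
steer = torch.randn(d_model)                   # pre-computed steering direction (random here)
h = inject_steering(h, steer, alpha=0.5)
```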
