ArXiv TLDR

Where Reasoning Breaks: Logic-Aware Path Selection by Controlling Logical Connectives in LLMs Reasoning Chains

arXiv: 2604.20564

Seunghyun Park, Yuanyuan Lei

cs.CL

TLDR

This paper introduces a framework to improve LLM multi-step reasoning by intervening at logical connectives, identified as key fragility points.

Key contributions

  • Identifies logical connectives as key fragility points in LLM reasoning, acting as high-entropy "forking points."
  • Proposes a multi-layered framework including Gradient-based Logical Steering and Localized Branching for intervention.
  • Uses Targeted Transition Preference Optimization to surgically optimize single-token preferences at logical pivots for efficiency.
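The first contribution, spotting connectives as high-entropy "forking points," can be sketched as a simple entropy check over the next-token distribution at connective positions. This is a minimal illustration, not the paper's implementation; the connective list, threshold, and toy logits are all assumptions.

```python
import numpy as np

# Hypothetical connective vocabulary; the paper's actual set may differ.
CONNECTIVES = {"therefore", "because", "however", "thus", "so", "hence"}

def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax of a logit vector."""
    z = logits - logits.max()          # stabilize before exponentiating
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def forking_points(tokens, logits_per_step, threshold=1.0):
    """Flag connective positions where the model's next-token entropy is high.

    tokens: decoded tokens of a reasoning chain.
    logits_per_step: the logits the model produced before emitting each token.
    """
    flagged = []
    for i, (tok, logits) in enumerate(zip(tokens, logits_per_step)):
        if tok.lower().strip(".,") in CONNECTIVES:
            h = token_entropy(logits)
            if h >= threshold:
                flagged.append((i, tok, h))
    return flagged

# Toy example: confident predictions everywhere except at "therefore",
# where the distribution is near-uniform (high entropy).
tokens = ["x", "is", "even", "therefore", "y"]
peaked = np.array([10.0, 0.0, 0.0, 0.0])
flat = np.array([1.0, 1.0, 1.0, 1.0])
logits = [peaked, peaked, peaked, flat, peaked]
print(forking_points(tokens, logits))  # only the connective is flagged
```

On the toy inputs, only position 3 ("therefore") exceeds the threshold, matching the paper's observation that connectives are where the model is most uncertain.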

Why it matters

LLMs often fail in complex logical reasoning due to error propagation. This work offers a targeted approach to improve their reliability by intervening precisely where logical decisions are made. By focusing on critical connective points, it enhances reasoning accuracy without the high computational cost of global methods.
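The Localized Branching component described above can be illustrated as a small look-ahead search run only at a flagged connective: try each candidate connective, roll the chain forward a few tokens, and keep the best-scoring branch. The rollout and scoring functions below are hypothetical stand-ins for a model and a verifier.

```python
def localized_branching(prefix, candidates, continue_fn, score_fn, lookahead=8):
    """At an ambiguous connective, branch over candidate connectives,
    roll each branch forward a few tokens, and keep the best-scoring one."""
    best = None
    for conn in candidates:
        rollout = continue_fn(prefix + [conn], lookahead)  # short look-ahead generation
        score = score_fn(rollout)                          # e.g. a verifier or log-prob score
        if best is None or score > best[0]:
            best = (score, conn, rollout)
    return best

# Toy stand-ins (hypothetical): a "model" whose continuation depends on the
# chosen connective, and a scorer that rewards the arithmetically correct branch.
def toy_continue(branch, n):
    return branch + (["x", "is", "even"] if branch[-1] == "therefore" else ["x", "is", "odd"])

def toy_score(rollout):
    return 1.0 if "even" in rollout else 0.0

best = localized_branching(["x", "=", "4,"], ["therefore", "however"], toy_continue, toy_score)
print(best[1])  # the winning connective
```

Because the search runs only at flagged connectives rather than at every decoding step, its cost stays far below global beam search, which is the efficiency argument the summary makes.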

Original Abstract

While LLMs demonstrate impressive reasoning capabilities, they remain fragile in multi-step logical deduction, where a single transition error can propagate through the entire reasoning chain, leading to unstable performance. In this work, we identify logical connectives as primary points of this structural fragility. Through empirical analysis, we show that connective tokens function as high-entropy forking points, at which models frequently struggle to determine the correct logical direction. Motivated by this observation, we hypothesize that intervening in logical connective selection can guide LLMs toward a more correct logical direction, thereby improving the overall reasoning chain. To validate this hypothesis, we propose a multi-layered framework that intervenes specifically at these logic-critical junctions in the reasoning process. Our framework includes (1) Gradient-based Logical Steering to guide an LLM's internal representations toward valid reasoning subspaces, (2) Localized Branching to resolve ambiguity via targeted look-ahead search, and (3) Targeted Transition Preference Optimization, a surgical reinforcement learning objective that selectively optimizes single-token preferences at logical pivots. Crucially, by concentrating intervention solely on logic-critical transitions, our framework achieves a favorable accuracy-efficiency trade-off compared to global inference-time scaling methods like beam search and self-consistency.
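The abstract's Targeted Transition Preference Optimization describes a preference objective applied only at the logical pivot token. A minimal sketch of that idea, assuming a DPO-style loss (the paper's exact objective may differ) evaluated at the single pivot position rather than summed over the whole sequence:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ttpo_loss(logp_pol_win, logp_ref_win, logp_pol_lose, logp_ref_lose, beta=0.1):
    """DPO-style preference loss evaluated at a single connective token.

    logp_pol_win / logp_pol_lose: policy log-probs of the preferred and
    dispreferred connective at the pivot position.
    logp_ref_win / logp_ref_lose: the same under a frozen reference model.
    """
    margin = beta * ((logp_pol_win - logp_ref_win) - (logp_pol_lose - logp_ref_lose))
    return float(-np.log(sigmoid(margin)))

# No preference learned yet: loss is exactly log(2).
print(ttpo_loss(-2.0, -2.0, -2.0, -2.0))
# Policy has shifted toward the preferred connective: loss drops below log(2).
print(ttpo_loss(-1.0, -2.0, -3.0, -2.0))
```

Restricting the gradient to one token per chain is what makes the update "surgical": the rest of the sequence is untouched, which keeps training cheap relative to full-sequence preference optimization.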
