ArXiv TLDR

Correct Prediction, Wrong Steps? Consensus Reasoning Knowledge Graph for Robust Chain-of-Thought Synthesis

2604.14121

Zipeng Ling, Shuliang Liu, Shenghong Fu, Yuehao Tang, Seonil Son + 2 more

cs.CL

TLDR

CRAFT mitigates LLM reasoning flaws by building a Reasoning Knowledge Graph from consensus traces, improving accuracy and trace quality.

Key contributions

  • Identifies two types of LLM reasoning flaws: Step Internal Flaws (logical errors, hallucinations) and Step-wise Flaws (overthinking, underthinking).
  • Introduces CRAFT, a framework that builds a Reasoning Knowledge Graph (RKG) from consensus candidate traces.
  • Synthesizes high-quality reasoning traces through topological generation from the RKG.
  • Improves label-prediction accuracy by over 10% on average and consistently outperforms baselines on logical and mathematical reasoning benchmarks.
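The pipeline in the contributions above can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: the consensus rule (a simple vote threshold), the step representation (strings), and the edge-construction heuristic are all assumptions made for the example.

```python
# Hypothetical sketch of a CRAFT-style pipeline: keep consensus steps,
# build a dependency graph from their ordering, then emit a trace in
# topological order. All details here are assumptions for illustration.
from collections import Counter
from graphlib import TopologicalSorter

def consensus_steps(traces, min_votes=2):
    """Keep steps appearing in at least `min_votes` candidate traces."""
    counts = Counter(step for trace in traces for step in set(trace))
    return {step for step, n in counts.items() if n >= min_votes}

def build_rkg(traces, keep):
    """Map each kept step to its predecessors, following trace order."""
    graph = {step: set() for step in keep}
    for trace in traces:
        kept = [s for s in trace if s in keep]
        for prev, nxt in zip(kept, kept[1:]):
            graph[nxt].add(prev)  # nxt comes after prev in this trace
    return graph

def synthesize(traces, min_votes=2):
    keep = consensus_steps(traces, min_votes)
    return list(TopologicalSorter(build_rkg(traces, keep)).static_order())

traces = [
    ["parse question", "recall formula", "compute", "check units", "answer"],
    ["parse question", "recall formula", "compute", "answer"],
    ["parse question", "guess", "compute", "answer"],
]
print(synthesize(traces))
# idiosyncratic steps ("check units", "guess") fall below the vote
# threshold and are dropped; the consensus chain survives
```

Note how the majority-supported steps form the synthesized trace while single-trace detours are filtered out, loosely mirroring the paper's idea of retaining only consensus parts of candidate traces.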

Why it matters

LLM reasoning traces often contain complex flaws, making model outputs unreliable. CRAFT provides a novel, unified approach to synthesizing robust reasoning traces, significantly boosting both accuracy and trace quality. This is crucial for deploying trustworthy AI systems.

Original Abstract

LLM reasoning traces suffer from complex flaws -- *Step Internal Flaws* (logical errors, hallucinations, etc.) and *Step-wise Flaws* (overthinking, underthinking), which vary by sample. A natural approach would be to provide ground-truth labels to guide LLMs' reasoning. Contrary to intuition, we show that this yields no improvement in reasoning ability. We then propose CRAFT, a unified framework that mitigates both types of Step flaws, which builds a Reasoning Knowledge Graph (RKG) based on the consensus parts of multiple candidate traces, and synthesizes a high-quality trace through topological generation. Our approach improves label-prediction accuracy by 10+% on average, and consistently outperforms all baselines across both logical and mathematical reasoning benchmarks. Further, detailed benchmark evaluation proves that our method also improves the quality of LLMs' reasoning traces in multiple dimensions.

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.