Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, et al.
TLDR
Self-consistency is a new decoding strategy that improves chain-of-thought reasoning in language models by sampling diverse reasoning paths and selecting the most consistent answer.
Key contributions
- Introduces self-consistency decoding to replace greedy decoding in chain-of-thought prompting.
- Samples multiple reasoning paths and marginalizes over them to find the most consistent answer.
- Demonstrates significant performance improvements across various arithmetic and commonsense reasoning benchmarks.
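The sampling-and-marginalization procedure above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `sample_fn` is a hypothetical stand-in for one temperature-sampled chain-of-thought completion, and marginalizing out the reasoning paths reduces to a majority vote over the final answers.

```python
from collections import Counter

def self_consistency(prompt, sample_fn, n_samples=10):
    """Sample multiple reasoning paths and return the most consistent answer.

    sample_fn(prompt) is a hypothetical stand-in for one temperature-sampled
    chain-of-thought completion; it returns (reasoning_text, final_answer).
    """
    answers = []
    for _ in range(n_samples):
        _reasoning, answer = sample_fn(prompt)
        answers.append(answer)
    # Marginalizing out the sampled reasoning paths reduces to a
    # majority vote over the extracted final answers.
    best_answer, _count = Counter(answers).most_common(1)[0]
    return best_answer

# Toy stand-in sampler: three of four sampled paths agree on "18".
_fake_paths = iter([("path A", "18"), ("path B", "18"),
                    ("path C", "26"), ("path D", "18")])
print(self_consistency("Q: ...", lambda p: next(_fake_paths), n_samples=4))
# prints 18
```

In contrast, greedy decoding commits to the single highest-probability path, so one early misstep fixes the final answer; voting across diverse paths lets occasional wrong paths be outvoted.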
Why it matters
By leveraging multiple reasoning trajectories rather than relying on a single greedy path, this work addresses a key limitation of chain-of-thought prompting. The result is more robust and accurate reasoning in large language models, substantially advancing their ability to solve complex tasks that require multi-step logical inference.
Original Abstract
Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).