ArXiv TLDR

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

arXiv:2201.11903

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter + 4 more

cs.CL, cs.AI

TLDR

Chain of thought prompting, which includes a few exemplars with intermediate reasoning steps in the prompt, significantly improves large language models' performance on complex reasoning tasks.

Key contributions

  • Introduces chain of thought prompting to elicit step-by-step reasoning in large language models.
  • Demonstrates substantial performance improvements on arithmetic, commonsense, and symbolic reasoning benchmarks.
  • Achieves state-of-the-art accuracy on the GSM8K benchmark of math word problems using a 540B-parameter model with just eight chain of thought exemplars, surpassing finetuned GPT-3 with a verifier.
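
To make the technique concrete, here is a minimal sketch of how a few-shot prompt with and without a chain of thought might be assembled. The `Exemplar` class, the helper names, the "Q:/A:" layout, and the exemplar text are all assumptions made for illustration; they are not taken from the paper's prompts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exemplar:
    """One demonstration (hypothetical structure for this sketch)."""
    question: str
    reasoning: str  # intermediate steps written out in natural language
    answer: str


def format_exemplar(ex: Exemplar, with_chain_of_thought: bool) -> str:
    # A chain of thought exemplar shows the reasoning before the final answer;
    # a standard exemplar shows only the final answer.
    if with_chain_of_thought:
        body = f"{ex.reasoning} The answer is {ex.answer}."
    else:
        body = f"The answer is {ex.answer}."
    return f"Q: {ex.question}\nA: {body}"


def build_prompt(exemplars, question: str, with_chain_of_thought: bool = True) -> str:
    # Exemplars come first; the new question ends with an open "A:" so the
    # model continues from there.
    blocks = [format_exemplar(ex, with_chain_of_thought) for ex in exemplars]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Illustrative exemplar written for this sketch, not one from the paper.
EXEMPLARS = [
    Exemplar(
        question=(
            "A shelf holds 4 boxes with 6 pencils each. "
            "5 pencils are removed. How many pencils remain?"
        ),
        reasoning=(
            "4 boxes of 6 pencils is 4 * 6 = 24 pencils. "
            "Removing 5 leaves 24 - 5 = 19."
        ),
        answer="19",
    ),
]

NEW_QUESTION = (
    "A train has 3 cars with 12 seats each. "
    "7 seats are taken. How many seats are free?"
)

cot_prompt = build_prompt(EXEMPLARS, NEW_QUESTION, with_chain_of_thought=True)
standard_prompt = build_prompt(EXEMPLARS, NEW_QUESTION, with_chain_of_thought=False)

print(cot_prompt)
```

The only difference between the two prompts is whether each exemplar's answer includes the reasoning text; the model, its weights, and the new question are unchanged.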

Why it matters

Chain of thought prompting is a simple technique that elicits multi-step reasoning from large language models without additional training. The paper reports that this ability emerges in sufficiently large models and improves accuracy on arithmetic, commonsense, and symbolic reasoning tasks.

Original Abstract

We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.
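
Measuring accuracy on a benchmark such as GSM8K requires pulling a final answer out of each generated chain of thought. The sketch below shows one way this could be done, assuming completions end with a sentence of the form "The answer is N."; that format, the regular expression, and the function names are assumptions of this example, not the paper's evaluation code.

```python
import re

# Assumed answer format: "The answer is <number>", optionally with a leading
# "$", thousands separators, or a decimal part.
ANSWER_PATTERN = re.compile(r"The answer is\s*\$?(-?[\d,]*\.?\d+)")


def extract_answer(completion: str):
    """Return the last stated numeric answer as a string, or None if absent."""
    matches = ANSWER_PATTERN.findall(completion)
    if not matches:
        return None
    # Use the last match in case the reasoning restates earlier answers.
    return matches[-1].replace(",", "")


def accuracy(completions, gold_answers) -> float:
    """Fraction of completions whose extracted answer equals the gold string.

    Comparison is exact string equality, so "19.0" and "19" do not match.
    """
    if len(completions) != len(gold_answers):
        raise ValueError("completions and gold_answers must have equal length")
    if not completions:
        return 0.0
    correct = sum(
        extract_answer(c) == g for c, g in zip(completions, gold_answers)
    )
    return correct / len(gold_answers)
```

A completion with no matching sentence counts as incorrect here, which is a design choice for this sketch rather than a rule stated in the source.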
