Large Language Models Can Self-Improve
Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang + 2 more
TLDR
This paper shows that large language models can improve their reasoning abilities by fine-tuning on their own high-confidence, self-generated answers without any labeled data.
Key contributions
- Introduces a method for LLMs to self-improve from unlabeled data alone: Chain-of-Thought prompting with self-consistency generates rationale-augmented answers, and the model is fine-tuned on the high-confidence ones (sketched in code after this list).
- Demonstrates significant performance gains on multiple reasoning benchmarks (GSM8K, DROP, OpenBookQA, ANLI-A3) without any ground truth labels.
- Provides ablation studies highlighting the importance of fine-tuning on reasoning tasks for effective self-improvement.
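The selection loop is simple enough to sketch. Below is a minimal Python illustration, not the authors' code: `sample_fn` is a hypothetical stand-in for a temperature-sampled, CoT-prompted LLM call returning a (rationale, final_answer) pair, and the sample count and confidence threshold are assumed values, not ones taken from the paper.

```python
from collections import Counter

def select_high_confidence(question, sample_fn, n_samples=32, threshold=0.5):
    """Self-consistency filter: sample several chain-of-thought reasoning
    paths for one unlabeled question and keep it for self-training only if
    a clear majority of the paths agree on the final answer.

    `sample_fn(question)` is a hypothetical stand-in for a temperature-
    sampled, CoT-prompted LLM call returning (rationale, final_answer).
    """
    paths = [sample_fn(question) for _ in range(n_samples)]
    votes = Counter(answer for _, answer in paths)
    majority_answer, count = votes.most_common(1)[0]
    confidence = count / n_samples
    if confidence < threshold:
        return None  # low agreement: likely unreliable, drop the question
    # Keep only the rationales that led to the majority answer; these become
    # the self-generated "pseudo-labels" used for fine-tuning.
    kept = [(r, a) for r, a in paths if a == majority_answer]
    return majority_answer, confidence, kept
```

Majority agreement across independently sampled reasoning paths is the paper's confidence proxy: an answer the model reaches by many different routes is more likely correct, which is what makes self-training without ground-truth labels viable.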
Why it matters
This work matters because it reduces dependence on costly labeled datasets for improving large language models: a model can autonomously strengthen its own reasoning. That opens an avenue for scalable, self-supervised model refinement that could accelerate progress in AI reasoning.
Original Abstract
Large Language Models (LLMs) have achieved excellent performances in various tasks. However, fine-tuning an LLM requires extensive supervision. Human, on the other hand, may improve their reasoning abilities by self-thinking without external inputs. In this work, we demonstrate that an LLM is also capable of self-improving with only unlabeled datasets. We use a pre-trained LLM to generate "high-confidence" rationale-augmented answers for unlabeled questions using Chain-of-Thought prompting and self-consistency, and fine-tune the LLM using those self-generated solutions as target outputs. We show that our approach improves the general reasoning ability of a 540B-parameter LLM (74.4%→82.1% on GSM8K, 78.2%→83.0% on DROP, 90.0%→94.4% on OpenBookQA, and 63.4%→67.9% on ANLI-A3) and achieves state-of-the-art-level performance, without any ground truth label. We conduct ablation studies and show that fine-tuning on reasoning is critical for self-improvement.
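The fine-tuning half of the loop can be made concrete too. The sketch below shows one way the majority-voted rationales could be packaged into (prompt, target) pairs; the templates are illustrative assumptions, not the paper's verbatim formats, though the paper similarly mixes rationale-augmented and direct-answer formats.

```python
def to_training_examples(question, kept_paths):
    """Turn majority-vote (rationale, answer) pairs into fine-tuning data.

    Two mixed formats, in the spirit of the paper: a CoT-style target that
    includes the reasoning chain, and a direct-answer target. The exact
    prompt templates below are illustrative assumptions.
    """
    examples = []
    for rationale, answer in kept_paths:
        # CoT format: train the model to reproduce its own reasoning chain.
        examples.append(
            (f"Q: {question}\nA: Let's think step by step.",
             f"{rationale} The answer is {answer}.")
        )
        # Direct format: same question, answer-only target.
        examples.append((f"Q: {question}\nA:", f"The answer is {answer}."))
    return examples
```

This connects to the ablation noted above: fine-tuning on the reasoning chains themselves, not just the final answers, is what the paper finds critical for self-improvement.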