Explanations from Large Language Models Make Small Reasoners Better
Shiyang Li, Jianshu Chen, Yelong Shen, Zhiyu Chen, Xinlu Zhang + 7 more
TLDR
This paper shows how free-text explanations generated by large language models can be used to train smaller, more efficient models that achieve higher reasoning accuracy than standard finetuning while also generating high-quality explanations.
Key contributions
- Proposes leveraging LLM-generated free-text explanations to improve training of small reasoning models via multi-task learning.
- Demonstrates that small models trained with this method outperform traditional finetuning baselines and even surpass a 60x larger GPT-3 (175B) model by up to 9.5% in accuracy.
- Provides human evaluation confirming the small models produce high-quality, interpretable explanations alongside predictions.
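The multi-task setup above can be illustrated with a small sketch: each training instance yields one example for answer prediction and one for explanation generation, so a single small model learns both objectives. The task prefixes and the sample question below are illustrative assumptions, not the paper's exact templates.

```python
# Hypothetical sketch of building multi-task training pairs from
# LLM-generated explanations. The "[predict]"/"[explain]" prefixes
# are assumed markers, not the paper's actual prompt format.

def make_multitask_examples(question: str, answer: str, explanation: str):
    """Return (input, target) pairs for the two training tasks:
    answer prediction and explanation generation."""
    return [
        # Task 1: the small model learns to predict the answer directly.
        (f"[predict] {question}", answer),
        # Task 2: the small model learns to reproduce the LLM's explanation.
        (f"[explain] {question}", explanation),
    ]

# Illustrative example (not from the paper's datasets).
examples = make_multitask_examples(
    "If a pen costs $2, how much do 3 pens cost?",
    "$6",
    "Each pen costs $2, so 3 pens cost 3 x $2 = $6.",
)
for inp, target in examples:
    print(inp, "->", target)
```

In practice both pairs would be tokenized and mixed into one finetuning stream, so the explanation-generation task acts as an auxiliary signal that strengthens the prediction task.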
Why it matters
This work is important because it enables the creation of compact, cost-effective reasoning models that retain strong performance and explainability, addressing practical deployment challenges while advancing explainable AI. By transferring reasoning capabilities from large models through explanations, it bridges the gap between powerful but expensive LLMs and efficient models suitable for real-world applications.
Original Abstract
Integrating free-text explanations into in-context learning of large language models (LLMs) is shown to elicit strong reasoning capabilities along with reasonable explanations. In this paper, we consider the problem of leveraging the explanations generated by LLMs to improve the training of small reasoners, which are more favorable in real-production deployment due to their low cost. We systematically explore three explanation generation approaches from LLMs and utilize a multi-task learning framework to facilitate small models to acquire strong reasoning power together with explanation generation capabilities. Experiments on multiple reasoning tasks show that our method can consistently and significantly outperform finetuning baselines across different settings, and even perform better than finetuning/prompting a 60x larger GPT-3 (175B) model by up to 9.5% in accuracy. As a side benefit, human evaluation further shows that our method can generate high-quality explanations to justify its predictions, moving towards the goal of explainable AI.