
Orca: Progressive Learning from Complex Explanation Traces of GPT-4

arXiv:2306.02707

Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, et al.

cs.CL, cs.LG

TLDR

Orca is a 13B-parameter model that improves small-model reasoning by progressively learning from GPT-4's explanation traces and step-by-step thought processes; it surpasses prior instruction-tuned models on challenging zero-shot reasoning benchmarks and reaches parity with ChatGPT on Big-Bench Hard.

Key contributions

  • Introduces Orca, a 13B model trained to imitate the reasoning process of large foundation models using rich explanation traces from GPT-4 (a sketch of the record format follows this list).
  • Taps into large-scale, diverse imitation data with judicious sampling and selection, using ChatGPT as an intermediate teacher to enable progressive learning.
  • Demonstrates significant performance gains over prior instruction-tuned models on complex reasoning benchmarks and professional exams in zero-shot settings.
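
To make the first two contributions concrete: the paper pairs FLAN-v2 queries with system messages that prompt the teacher to show its reasoning, and stores the teacher's full response as the training target. Below is a minimal sketch of that record format; the helper names and the system-message wording are illustrative assumptions, not the paper's code.

    # Illustrative sketch of Orca-style "explanation tuning" records.
    # The system messages paraphrase the kind the paper uses to elicit
    # explanation traces; the wording and helper names are assumptions.

    SYSTEM_MESSAGES = [
        "You are a helpful assistant. Always think step by step and justify your answer.",
        "Explain your answer as if you were teaching a five-year-old.",
    ]

    def build_record(system_msg: str, query: str, teacher_reply: str) -> dict:
        """Pack one (system, user, teacher) triple into an imitation-tuning record."""
        return {
            "system": system_msg,        # elicits the reasoning trace, not just the answer
            "user": query,               # instruction sampled from a FLAN-v2 task
            "assistant": teacher_reply,  # GPT-4 (or ChatGPT) response with the explanation
        }

    # The student is fine-tuned to reproduce the trace, not only the final answer.
    record = build_record(
        SYSTEM_MESSAGES[0],
        "If a train travels 60 km in 45 minutes, what is its average speed in km/h?",
        "45 minutes is 0.75 hours, so the average speed is 60 / 0.75 = 80 km/h.",
    )
    print(record["assistant"])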

Why it matters

This paper addresses key limitations in current small-model training by shifting the focus from shallow output imitation to learning detailed reasoning steps from advanced models such as GPT-4. By leveraging complex explanation traces and a progressive learning curriculum, Orca significantly narrows the performance gap with larger models, pointing to a scalable way to strengthen reasoning in smaller, more accessible language models.
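
Concretely, the progressive learning the paper describes is a two-stage curriculum: the student first trains on roughly 5M instructions answered by ChatGPT, then continues from that checkpoint on roughly 1M answered by GPT-4. A minimal sketch of that schedule, where Model and train_epoch are hypothetical stand-ins for a real training loop:

    # Minimal sketch of Orca's two-stage progressive curriculum.
    # `Model` and `train_epoch` are hypothetical stand-ins; the 4+4
    # epoch counts follow the schedule reported in the paper.

    from typing import Iterable

    class Model:
        """Hypothetical wrapper around the 13B student model."""
        def update(self, batch) -> None:
            pass  # one gradient step on a batch of (system, user, teacher) records

    def train_epoch(model: Model, dataset: Iterable) -> None:
        for batch in dataset:
            model.update(batch)

    def progressive_training(model: Model, flan_5m_chatgpt, flan_1m_gpt4) -> Model:
        # Stage 1: broad, easier-to-imitate signal from the intermediate teacher (ChatGPT).
        for _ in range(4):
            train_epoch(model, flan_5m_chatgpt)
        # Stage 2: continue from the stage-1 checkpoint on the richer GPT-4 traces.
        for _ in range(4):
            train_epoch(model, flan_1m_gpt4)
        return model

    # Usage: student = progressive_training(Model(), flan_5m, flan_1m)

Starting with the ChatGPT data acts as a curriculum: the shorter, easier-to-imitate intermediate-teacher traces prepare the student before it sees the longer, harder GPT-4 explanations.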

Original Abstract

Recent research has focused on enhancing the capability of smaller models through imitation learning, drawing on the outputs generated by large foundation models (LFMs). A number of issues impact the quality of these models, ranging from limited imitation signals from shallow LFM outputs; small scale homogeneous training data; and most notably a lack of rigorous evaluation resulting in overestimating the small model's capability as they tend to learn to imitate the style, but not the reasoning process of LFMs. To address these challenges, we develop Orca (We are working with our legal team to publicly release a diff of the model weights in accordance with LLaMA's release policy to be published at https://aka.ms/orca-lm), a 13-billion parameter model that learns to imitate the reasoning process of LFMs. Orca learns from rich signals from GPT-4 including explanation traces; step-by-step thought processes; and other complex instructions, guided by teacher assistance from ChatGPT. To promote this progressive learning, we tap into large-scale and diverse imitation data with judicious sampling and selection. Orca surpasses conventional state-of-the-art instruction-tuned models such as Vicuna-13B by more than 100% in complex zero-shot reasoning benchmarks like Big-Bench Hard (BBH) and 42% on AGIEval. Moreover, Orca reaches parity with ChatGPT on the BBH benchmark and shows competitive performance (4 pts gap with optimized system message) in professional and academic examinations like the SAT, LSAT, GRE, and GMAT, both in zero-shot settings without CoT; while trailing behind GPT-4. Our research indicates that learning from step-by-step explanations, whether these are generated by humans or more advanced AI models, is a promising direction to improve model capabilities and skills.
