Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, et al.
TLDR
Toolformer enables language models to autonomously learn to use external tools via APIs, significantly enhancing their performance on diverse tasks without extra supervision.
Key contributions
- Introduces a self-supervised method for LMs to decide when and how to call external tool APIs.
- Demonstrates integration with multiple tools like calculators, search engines, Q&A systems, translators, and calendars.
- Achieves strong zero-shot task performance competitive with larger models while maintaining core language modeling capabilities.
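The core mechanism described above can be illustrated with a small sketch. The paper represents tool use as inline API-call markers embedded in text (roughly of the form `[Tool(args) → result]`); the model learns when to emit such calls, and at inference time the call is executed and its result is spliced back into the text. The snippet below is a minimal, illustrative simulation of that execution step only. The `Calculator` tool, the exact bracket syntax, and the regex are assumptions for the sketch, not the paper's released implementation.

```python
import re

def calculator(expr: str) -> str:
    """A toy arithmetic tool (safe subset: digits, whitespace, + - * / and parens)."""
    if not re.fullmatch(r"[\d\s+\-*/().]+", expr):
        raise ValueError(f"unsupported expression: {expr!r}")
    return f"{eval(expr):.2f}"  # restricted by the regex check above

# Registry mapping tool names to callables; the paper also uses QA, search,
# translation, and calendar tools (not sketched here).
TOOLS = {"Calculator": calculator}

# Matches inline call markers like "[Calculator(400/1400)]".
CALL_PATTERN = re.compile(r"\[(\w+)\(([^)]*)\)\]")

def execute_calls(text: str) -> str:
    """Replace each [Tool(args)] marker with [Tool(args) -> result]."""
    def run(match: re.Match) -> str:
        tool, args = match.group(1), match.group(2)
        result = TOOLS[tool](args)
        return f"[{tool}({args}) -> {result}]"
    return CALL_PATTERN.sub(run, text)

print(execute_calls("Out of 1400 participants, 400 [Calculator(400/1400)] passed."))
# -> Out of 1400 participants, 400 [Calculator(400/1400) -> 0.29] passed.
```

During training, the paper's self-supervised filter keeps only those sampled call annotations whose results reduce the language modeling loss on the following tokens; the execution step sketched here is what happens once those calls have been learned.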
Why it matters
This paper addresses a key limitation of large language models—their struggle with basic functionality such as arithmetic and factual lookup—by enabling them to augment their capabilities through autonomously learned tool use. This not only improves task performance but also offers a scalable way to combine the flexibility of LMs with the precision of specialized tools, paving the way for more versatile and efficient AI systems.
Original Abstract
Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.