Gorilla: Large Language Model Connected with Massive APIs
Shishir G. Patil, Tianjun Zhang, Xin Wang, Joseph E. Gonzalez
TLDR
Gorilla is a finetuned LLaMA-based model that outperforms GPT-4 in generating accurate API calls by integrating document retrieval to reduce hallucinations and adapt to evolving API documentation.
Key contributions
- Introduces Gorilla, a model surpassing GPT-4 in writing precise API calls.
- Combines LLM with a document retriever to handle dynamic API documentation and reduce hallucinations.
- Presents APIBench, a new benchmark dataset covering diverse APIs from HuggingFace, TorchHub, and TensorHub.
Why it matters
This paper addresses a critical limitation of current large language models: their unreliable and often incorrect use of APIs. It demonstrates that fine-tuning combined with retrieval-augmented generation can significantly improve both accuracy and adaptability. This enables LLMs to better leverage external tools, stay current as APIs change, and produce more trustworthy outputs, which is essential for practical deployment in real-world applications that depend on precise API interactions.
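The core idea, retrieving the relevant API documentation at inference time and grounding the model's output in it, can be sketched as follows. This is an illustrative toy, not the paper's code: the lexical retriever, the `build_prompt` helper, and the sample docs are all hypothetical stand-ins for Gorilla's actual retriever and prompt format.

```python
def build_prompt(user_query: str, retrieved_doc: str) -> str:
    """Compose a prompt that grounds the model in current API docs,
    so the generated call reflects the documentation rather than
    stale training data (reducing hallucinated API usage)."""
    return (
        f"Use this API documentation for reference: {retrieved_doc}\n"
        f"Task: {user_query}\n"
        "Respond with a single, correct API call."
    )

def simple_retrieve(query: str, api_docs: list[str]) -> str:
    """Toy lexical retriever: return the doc sharing the most
    words with the query. Real systems use dense or BM25 retrieval."""
    q_words = set(query.lower().split())
    return max(api_docs, key=lambda d: len(q_words & set(d.lower().split())))

# Hypothetical documentation snippets in the style of APIBench sources.
docs = [
    "torch.hub.load('pytorch/vision', 'resnet50', pretrained=True) "
    "loads a pretrained image classification network.",
    "transformers.pipeline('translation_en_to_fr') builds a translation model.",
]

query = "Load a pretrained image classification model"
prompt = build_prompt(query, simple_retrieve(query, docs))
```

Because the documentation is fetched at test time, swapping in updated docs (for example, after an API version change) changes the model's grounding without any retraining, which is how the paper's retriever-aware setup adapts to evolving APIs.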
Original Abstract
Large Language Models (LLMs) have seen an impressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis. However, their potential to effectively use tools via API calls remains unfulfilled. This is a challenging task even for today's state-of-the-art LLMs such as GPT-4, largely due to their inability to generate accurate input arguments and their tendency to hallucinate the wrong usage of an API call. We release Gorilla, a finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls. When combined with a document retriever, Gorilla demonstrates a strong capability to adapt to test-time document changes, enabling flexible user updates or version changes. It also substantially mitigates the issue of hallucination, commonly encountered when prompting LLMs directly. To evaluate the model's ability, we introduce APIBench, a comprehensive dataset consisting of HuggingFace, TorchHub, and TensorHub APIs. The successful integration of the retrieval system with Gorilla demonstrates the potential for LLMs to use tools more accurately, keep up with frequently updated documentation, and consequently increase the reliability and applicability of their outputs. Gorilla's code, model, data, and demo are available at https://gorilla.cs.berkeley.edu