Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi
TLDR
Self-RAG is a novel framework that enables large language models to adaptively retrieve relevant information and self-reflect on their outputs, significantly improving factual accuracy and response quality.
Key contributions
- Introduces Self-RAG, which trains a single LM to retrieve passages on-demand and generate reflection tokens for self-critique.
- Enables controllable inference by allowing the LM to decide when and what to retrieve, enhancing versatility and reducing irrelevant information.
- Demonstrates superior performance over state-of-the-art LLMs like ChatGPT and retrieval-augmented Llama2-chat across QA, reasoning, fact verification, and long-form generation tasks.
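The mechanism described above can be sketched as a simple inference loop. This is a hypothetical illustration, not the authors' implementation: in the real system, a single trained LM emits special reflection tokens ([Retrieve], ISREL, ISSUP, ISUSE) and their probabilities are used for scoring; here mock functions and hand-picked weights stand in for the model and retriever.

```python
# Hypothetical sketch of Self-RAG-style inference (not the authors' code).
# A real implementation reads critique scores from the trained LM's logits
# over reflection tokens; mock scorers stand in for the model here.

def needs_retrieval(query):
    # Stand-in for the LM emitting a [Retrieve] token: retrieve for
    # knowledge-seeking queries, answer directly otherwise.
    return "who" in query.lower() or "what" in query.lower()

def retrieve(query, k=2):
    # Stand-in retriever returning top-k passages from a toy corpus.
    corpus = [
        "Self-RAG trains one LM to retrieve, generate, and critique.",
        "RAG augments generation with a fixed number of retrieved passages.",
    ]
    return corpus[:k]

def generate_and_critique(query, passage):
    # The real LM generates a segment plus reflection tokens; here we
    # return a canned segment and mock critique scores in [0, 1]:
    #   is_rel - passage relevant to the query (ISREL)
    #   is_sup - segment supported by the passage (ISSUP)
    #   is_use - segment useful as a response (ISUSE)
    segment = f"Answer grounded in: {passage}"
    is_rel = 1.0 if "Self-RAG" in passage else 0.5
    return segment, {"is_rel": is_rel, "is_sup": 0.9, "is_use": 0.8}

def self_rag_answer(query):
    if not needs_retrieval(query):
        return f"Direct answer to: {query}"  # no-retrieval branch
    candidates = []
    for passage in retrieve(query):
        segment, s = generate_and_critique(query, passage)
        # Segment score: weighted combination of critique scores
        # (weights are illustrative, not from the paper).
        total = 0.5 * s["is_rel"] + 0.3 * s["is_sup"] + 0.2 * s["is_use"]
        candidates.append((total, segment))
    return max(candidates)[1]  # keep the best-critiqued segment

print(self_rag_answer("What is Self-RAG?"))
```

Generating one candidate per retrieved passage and ranking them by critique scores is what makes the output controllable: adjusting the weights at inference time trades off relevance, support, and usefulness per task.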
Why it matters
This paper addresses a critical limitation of large language models—their tendency to produce factually inaccurate outputs due to reliance on fixed parametric knowledge—by integrating adaptive retrieval with self-reflection. This approach not only improves factuality and citation accuracy but also enhances the model's ability to tailor its behavior dynamically, making it a significant advancement for building more reliable and versatile AI systems.
Original Abstract
Despite their remarkable capabilities, large language models (LLMs) often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that augments LMs with retrieval of relevant knowledge, decreases such issues. However, indiscriminately retrieving and incorporating a fixed number of retrieved passages, regardless of whether retrieval is necessary, or passages are relevant, diminishes LM versatility or can lead to unhelpful response generation. We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM's quality and factuality through retrieval and self-reflection. Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. Experiments show that Self-RAG (7B and 13B parameters) significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks. Specifically, Self-RAG outperforms ChatGPT and retrieval-augmented Llama2-chat on Open-domain QA, reasoning and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models.