Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning

May 2, 2026 · arXiv 2605.01399

Sangkwon Park, Donghun Kang, Jisoo Mok, Sungroh Yoon

cs.CL, cs.AI, cs.IR

TLDR

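Verbal-R3 is an agentic RAG framework that pairs a Generator with a Verbal Reranker. The reranker produces 'Verbal Annotations' that explain how retrieved passages relate to the query. This improves LLM reasoning over retrieved evidence and achieves state-of-the-art results on complex QA benchmarks.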

Key contributions

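Proposes Verbal Annotations: analytic narratives that explicitly state the logical connection between a search query and the retrieved contexts, helping the LLM integrate retrieved information.
Introduces Verbal-R3, an agentic RAG framework that pairs a Generator, which performs iterative retrieval and reasoning, with a Verbal Reranker.
The Verbal Reranker returns relevance scores and Verbal Annotations that steer the Generator's reasoning and answering.
Refines inference with relevance-guided test-time scaling, which allocates test-time compute to trajectory expansion.
Achieves state-of-the-art performance on complex question answering benchmarks.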
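To make the Verbal Annotation idea concrete, here is a minimal Python sketch of how reranker output might be packaged for the Generator instead of raw passages. The source does not specify a data format or prompt template, so `RerankedDoc`, `build_context`, the [0, 1] score scale, and the text layout are all hypothetical. The reranker itself (an LLM in the paper) is not modeled; its outputs are supplied by hand.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RerankedDoc:
    """Hypothetical output of a verbal reranker for one retrieved passage."""
    text: str
    score: float      # relevance score; a [0, 1] scale is assumed here
    annotation: str   # natural-language note linking the query to the passage


def build_context(query: str, docs: list[RerankedDoc], top_k: int = 3) -> str:
    """Format the top-k passages by score, each paired with its annotation.

    Rather than concatenating raw passages, every kept passage is followed
    by the reranker's explanation of how it relates to the query.
    """
    ranked = sorted(docs, key=lambda d: d.score, reverse=True)[:top_k]
    blocks = [f"Query: {query}"]
    for i, doc in enumerate(ranked, start=1):
        blocks.append(
            f"[{i}] (relevance={doc.score:.2f})\n"
            f"Passage: {doc.text}\n"
            f"Annotation: {doc.annotation}"
        )
    return "\n\n".join(blocks)


# Usage with hand-written (hypothetical) reranker outputs.
docs = [
    RerankedDoc("France borders Spain.", 0.15,
                "Mentions France but says nothing about its capital."),
    RerankedDoc("Paris is the capital of France.", 0.92,
                "States the capital directly, answering the query."),
]
print(build_context("What is the capital of France?", docs, top_k=1))
```

Sorting by score and truncating to `top_k` mirrors the usual reranker role; the annotation line is what distinguishes this from plain reranking.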

Why it matters

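Conventional RAG injects raw retrieved text into the LLM's context, which often leaves the retrieved information poorly integrated into the model's reasoning. This paper introduces 'Verbal Annotations' that explicitly state how each retrieved context relates to the query, guiding the LLM toward more accurate, contextually grounded responses. The state-of-the-art results on complex QA benchmarks suggest this is an effective way for RAG systems to leverage external knowledge.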

Original Abstract

The conventional Retrieval-Augmented Generation (RAG) paradigm of injecting raw retrieved texts into the Large Language Model (LLM)'s context often results in suboptimal integration of retrieved information. This paper proposes to bridge retrieval results and the LLM's reasoning ability through Verbal Annotations, analytic narratives that explicitly articulate the logical connection between a search query and retrieved contexts. Our empirical investigation reveals the potential of Verbal Annotations to substantially enhance the LLM's ability to generate accurate, contextually-grounded responses. Motivated by this finding, we introduce Verbal-R3, a novel agentic RAG framework that consists of a Generator and a Verbal Reranker. The Generator performs iterative retrieval and reasoning, while the Verbal Reranker returns relevance scores and Verbal Annotations to guide the reasoning and answering process of the Generator. The inference process of Verbal-R3 is further refined through relevance-guided test-time scaling, which efficiently allocates test-time compute for effective trajectory expansion. Verbal-R3 achieves state-of-the-art performance on complex Question Answering benchmarks, validating the effectiveness of the proposed framework.
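The abstract does not spell out how relevance-guided test-time scaling allocates compute. As one hypothetical reading, the sketch below assumes each candidate trajectory carries a non-negative relevance score (for instance, derived from the Verbal Reranker's scores) and splits a fixed budget of expansions in proportion to those scores, using largest-remainder rounding so the counts sum exactly to the budget. `allocate_expansions` and the proportional rule are assumptions, not the paper's algorithm.

```python
from __future__ import annotations

import math
from fractions import Fraction


def allocate_expansions(scores: list[float], budget: int) -> list[int]:
    """Split `budget` expansions across trajectories in proportion to scores.

    Hypothetical rule: proportional allocation with largest-remainder
    rounding. If every score is 0, the budget is spread evenly instead.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    if not scores:
        return []
    # Exact rational arithmetic so float rounding cannot skew the totals.
    weights = [Fraction(s) for s in scores]
    total = sum(weights)
    if total == 0:
        weights = [Fraction(1)] * len(scores)
        total = Fraction(len(scores))
    exact = [budget * w / total for w in weights]
    counts = [math.floor(x) for x in exact]
    leftover = budget - sum(counts)
    # Give the remaining expansions to the largest fractional parts
    # (ties go to the earlier trajectory, since sorted() is stable).
    order = sorted(range(len(scores)),
                   key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

For example, `allocate_expansions([0.6, 0.3, 0.1], 10)` returns `[6, 3, 1]`. A real system might instead prune low-scoring trajectories outright or sample expansions stochastically; proportional allocation is simply the most direct way to show compute following relevance.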
