AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases

May 7, 2026 | arXiv:2605.05538

Susheel Suresh, Hazel Mak, Shangpo Chou, Fred Kroon, Sahil Bhatnagar

cs.AI, cs.IR

TLDR

AgenticRAG layers an LLM agent with search, find, open, and summarize tools on top of existing enterprise search, so the model can retrieve and analyze evidence iteratively. It reports 49.6% recall@1 on BRIGHT and 0.96 factuality on WixQA.

Key contributions

Introduces AgenticRAG, an LLM agent harness for iterative retrieval and analysis over enterprise knowledge bases.
Achieves 49.6% recall@1 on BRIGHT (+21.8 pp) and 0.96 factuality on WixQA (+13% relative).
Reaches 92% answer correctness on FinanceBench, within 2 pp of oracle access to the true evidence.
Shows in ablations that the shift from single-shot retrieval to agentic tool use is the largest factor (5.9× improvement).
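
The figures above mix two kinds of gain: percentage points (pp), which are absolute differences, and relative improvement, which is a ratio. The short sketch below backs out the baselines these numbers imply. The baseline values are derived here by arithmetic and are not stated in the abstract; the function names are hypothetical, and the WixQA calculation assumes the +13% is measured relative to the baseline score.

```python
def baseline_from_pp(score: float, gain_pp: float) -> float:
    """Baseline implied by an absolute gain in percentage points."""
    return score - gain_pp


def baseline_from_relative(score: float, relative_gain: float) -> float:
    """Baseline implied by a relative gain (0.13 means +13%)."""
    return score / (1.0 + relative_gain)


# BRIGHT: 49.6% recall@1 with a +21.8 pp gain over the best embedding baseline.
bright_baseline = round(baseline_from_pp(49.6, 21.8), 1)

# WixQA: 0.96 factuality with a +13% relative gain (assumed relative to baseline).
wixqa_baseline = round(baseline_from_relative(0.96, 0.13), 2)

print(f"Implied BRIGHT baseline recall@1: {bright_baseline}%")
print(f"Implied WixQA baseline factuality: {wixqa_baseline}")
```

Read this way, the BRIGHT gain is a difference of scores, while the WixQA gain is a ratio of scores, so the two numbers are not directly comparable.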

Why it matters

Standard RAG pipelines constrain the language model to a fixed candidate set chosen by the search stack. This paper relaxes that constraint by letting the model retrieve iteratively and navigate within documents. The reported gains span three benchmarks (BRIGHT, WixQA, FinanceBench), covering recall, factuality, and answer correctness. Because the harness sits on top of existing enterprise search infrastructure, it offers a practical way to improve a deployed search stack rather than replace it.

Original Abstract

We present AgenticRAG, a practical agentic harness for retrieval and analysis over enterprise knowledge bases. Standard RAG pipelines place significant burden of grounding on the search stack, constraining the language model to a fixed candidate set chosen deep in the retrieval process. Our approach reduces this overdependence by layering a lightweight harness on top of existing enterprise search infrastructure, equipping a reasoning LLM with search, find, open, and summarize tools enabling the model to iteratively retrieve information, navigate within documents, and analyze evidence autonomously. On three open benchmarks we observe substantial gains: $49.6\%$ recall@1 on BRIGHT (+21.8 pp over the best embedding baseline), 0.96 factuality on WixQA ($+13\%$ relative improvement), and $92\%$ answer correctness on FinanceBench--within 2 pp of oracle access to true evidence. Ablation studies show that the most significant factor is the shift from single-shot retrieval to agentic tool use ($5.9\times$ improvement), while multi-query search and in-document navigation contribute to both quality and efficiency. We present various design choices in our agentic harness that were informed by pre-production deployments. Our results demonstrate its suitability for real-world enterprise production environments.
