AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases
Susheel Suresh, Hazel Mak, Shangpo Chou, Fred Kroon, Sahil Bhatnagar
TLDR
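AgenticRAG layers an LLM agent with search, find, open, and summarize tools on top of existing enterprise search infrastructure, so the model can retrieve iteratively instead of relying on a single fixed candidate set. It reaches 49.6% recall@1 on BRIGHT and 0.96 factuality on WixQA.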
Key contributions
- Introduces AgenticRAG, an LLM agent harness for iterative retrieval and analysis over enterprise knowledge bases.
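- Achieves 49.6% recall@1 on BRIGHT (+21.8 pp over the best embedding baseline) and 0.96 factuality on WixQA (+13% relative improvement).
- Demonstrates 92% answer correctness on FinanceBench, within 2 pp of oracle access to the true evidence.
- Shows in ablations that moving from single-shot retrieval to agentic tool use is the largest single factor (5.9x improvement), with multi-query search and in-document navigation contributing to both quality and efficiency.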
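The reported gains mix two units: percentage points (pp) for BRIGHT and a relative percentage for WixQA. The sketch below shows how each figure maps back to an implied baseline, plus one common definition of recall@k. The function names are illustrative, the implied baselines are derived here by arithmetic rather than quoted from the paper, and the paper's exact metric definition may differ.

```python
def recall_at_k(ranked_ids, relevant_ids, k=1):
    """Fraction of relevant documents found in the top-k results (one common definition)."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def baseline_from_pp_gain(score, gain_pp):
    """A gain in percentage points is an absolute difference between two percentages."""
    return score - gain_pp


def baseline_from_relative_gain(score, relative_gain):
    """A relative gain r means score = baseline * (1 + r)."""
    return score / (1.0 + relative_gain)


# BRIGHT: 49.6% recall@1 with +21.8 pp implies a baseline near 27.8%.
bright_baseline = baseline_from_pp_gain(49.6, 21.8)

# WixQA: 0.96 factuality with +13% relative implies a baseline near 0.85,
# assuming the gain is measured against a single baseline score.
wixqa_baseline = baseline_from_relative_gain(0.96, 0.13)
```

Read this way, the BRIGHT result is a large absolute jump from a low starting point, while the WixQA result is a smaller move on a score that was already high.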
Why it matters
Standard RAG pipelines constrain the language model to a fixed candidate set chosen by the search stack, so retrieval errors carry straight through to the answer. This paper gives the LLM tools to retrieve iteratively, navigate within documents, and analyze evidence on its own. The harness sits on top of existing enterprise search infrastructure rather than replacing it, and the reported gains in recall, factuality, and answer correctness across three open benchmarks make it relevant for real-world enterprise applications.
Original Abstract
We present AgenticRAG, a practical agentic harness for retrieval and analysis over enterprise knowledge bases. Standard RAG pipelines place significant burden of grounding on the search stack, constraining the language model to a fixed candidate set chosen deep in the retrieval process. Our approach reduces this overdependence by layering a lightweight harness on top of existing enterprise search infrastructure, equipping a reasoning LLM with search, find, open, and summarize tools enabling the model to iteratively retrieve information, navigate within documents, and analyze evidence autonomously. On three open benchmarks we observe substantial gains: $49.6\%$ recall@1 on BRIGHT (+21.8 pp over the best embedding baseline), 0.96 factuality on WixQA ($+13\%$ relative improvement), and $92\%$ answer correctness on FinanceBench--within 2 pp of oracle access to true evidence. Ablation studies show that the most significant factor is the shift from single-shot retrieval to agentic tool use ($5.9\times$ improvement), while multi-query search and in-document navigation contribute to both quality and efficiency. We present various design choices in our agentic harness that were informed by pre-production deployments. Our results demonstrate its suitability for real-world enterprise production environments.
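The iterative loop the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' harness: the in-memory `CORPUS`, the keyword-overlap `search`, the tool signatures, and `scripted_policy` (a fixed script standing in for the reasoning LLM) are all assumptions made for the example. Only the four tool names come from the abstract.

```python
# Toy corpus standing in for an enterprise search index (hypothetical data).
CORPUS = {
    "doc-1": "Refund policy. Refunds are issued within 14 days of purchase.",
    "doc-2": "Shipping policy. Orders ship within 2 business days.",
}


def search(queries):
    """Multi-query search: ids of documents sharing a word with any query."""
    hits = []
    for query in queries:
        terms = set(query.lower().split())
        for doc_id, text in CORPUS.items():
            words = set(text.lower().replace(".", " ").split())
            if terms & words and doc_id not in hits:
                hits.append(doc_id)
    return hits


def open_doc(doc_id):
    """Return the full text of a document."""
    return CORPUS[doc_id]


def find(doc_id, pattern):
    """In-document navigation: sentences that contain the pattern."""
    sentences = [s.strip() for s in CORPUS[doc_id].split(".") if s.strip()]
    return [s for s in sentences if pattern.lower() in s.lower()]


def summarize(passages):
    """Placeholder summarizer: concatenate the collected evidence."""
    return " ".join(passages)


TOOLS = {"search": search, "open": open_doc, "find": find, "summarize": summarize}


def scripted_policy(question, history):
    """Stand-in for the LLM: choose the next tool call from what has been seen."""
    step = len(history)
    if step == 0:
        return ("search", (["refund policy", "refunds days"],))
    if step == 1:
        top_doc = history[0][1][0]  # first hit from the search step
        return ("find", (top_doc, "refunds"))
    if step == 2:
        return ("summarize", (history[1][1],))
    return None  # stop: enough evidence gathered


def run_agent(question, policy, max_steps=8):
    """Call tools until the policy stops or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        action = policy(question, history)
        if action is None:
            break
        name, args = action
        history.append((name, TOOLS[name](*args)))
    return history[-1][1] if history else None


answer = run_agent("How long do refunds take?", scripted_policy)
```

The contrast with single-shot retrieval is in `run_agent`: the policy sees each tool result before choosing the next call, so a weak first search can be followed by a narrower query or an in-document lookup rather than being passed directly to generation.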