S2G-RAG: Structured Sufficiency and Gap Judging for Iterative Retrieval-Augmented QA

April 26, 2026 · arXiv:2604.23783

Minghan Li, Junjie Zou, Xinxuan Lv, Chao Zhang, Guodong Zhou

cs.IR · cs.AI

TLDR

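S2G-RAG improves multi-hop QA by using an explicit judge to decide whether the retrieved evidence is sufficient and, if not, to identify what is missing and steer the next retrieval. Sentence-level evidence filtering keeps noise low and improves robustness.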

Key contributions

S2G-Judge predicts whether the current evidence is sufficient and, if not, generates structured "gap items" describing the missing information.
Maps gap items into the next retrieval query, yielding stable multi-turn retrieval trajectories.
Reduces noise accumulation by keeping a compact, sentence-level Evidence Context extracted from retrieved documents.
Improves QA performance and robustness under multi-turn retrieval on TriviaQA, HotpotQA, and 2WikiMultiHopQA.
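The summary does not specify the exact format of the judge's output or the rule that turns gaps into queries. As a minimal sketch, assuming a gap item is a simple (entity, missing attribute) pair, a judge verdict and its conversion into the next retrieval query might look like this in Python. The names `GapItem`, `Judgment`, and `gaps_to_query`, the field names, and the "; "-joined query template are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GapItem:
    """One piece of missing information (hypothetical schema, not the paper's)."""
    entity: str   # what the missing fact is about
    missing: str  # which attribute or relation is still unknown


@dataclass
class Judgment:
    """Assumed shape of a judge verdict: a sufficiency flag plus gap items."""
    sufficient: bool
    gaps: List[GapItem] = field(default_factory=list)


def gaps_to_query(question: str, judgment: Judgment) -> Optional[str]:
    """Turn gap items into the next retrieval query; None means stop retrieving."""
    if judgment.sufficient:
        return None
    if not judgment.gaps:
        # Assumed fallback: insufficient but no explicit gap, so re-issue the question.
        return question
    # One clause per gap keeps the query focused on what is actually missing.
    return "; ".join(f"{g.entity} {g.missing}" for g in judgment.gaps)


question = "Were Scott Derrickson and Ed Wood of the same nationality?"
verdict = Judgment(sufficient=False, gaps=[GapItem("Ed Wood", "nationality")])
print(gaps_to_query(question, verdict))  # Ed Wood nationality
```

Returning `None` for a sufficient verdict gives the outer retrieval loop one explicit stopping signal, which matches the role the paper assigns to the sufficiency prediction.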

Why it matters

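RAG systems often struggle with complex multi-hop questions. They may answer from incomplete evidence chains, or accumulate distractor text that derails later retrieval and reasoning. S2G-RAG targets both failure modes. An explicit judge decides at each turn whether the evidence is sufficient and, if not, what is missing, and only relevant sentences are kept as evidence. Because it plugs into existing RAG pipelines without modifying the search engine or retraining the generator, it offers a lightweight route to more robust multi-hop QA.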

Original Abstract

Retrieval-Augmented Generation (RAG) grounds language models in external evidence, but multi-hop question answering remains difficult because iterative pipelines must control what to retrieve next and when the available evidence is adequate. In practice, systems may answer from incomplete evidence chains, or they may accumulate redundant or distractor-heavy text that interferes with later retrieval and reasoning. We propose S2G-RAG (Structured Sufficiency and Gap-judging RAG), an iterative framework with an explicit controller, S2G-Judge. At each turn, S2G-Judge predicts whether the current evidence memory supports answering and, if not, outputs structured gap items that describe the missing information. These gap items are then mapped into the next retrieval query, producing stable multi-turn retrieval trajectories. To reduce noise accumulation, S2G-RAG maintains a sentence-level Evidence Context by extracting a compact set of relevant sentences from retrieved documents. Experiments on TriviaQA, HotpotQA, and 2WikiMultiHopQA show that S2G-RAG improves multi-hop QA performance and robustness under multi-turn retrieval. Furthermore, S2G-RAG can be integrated into existing RAG pipelines as a lightweight component, without modifying the search engine or retraining the generator.
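To make the control flow described in the abstract concrete, here is a toy end-to-end loop in Python. It is a sketch under heavy assumptions, not the authors' implementation. The lexical `retrieve`, the overlap-based `extract_sentences`, and the rule-based `toy_judge` are hypothetical stand-ins for a real search engine, the paper's sentence extractor, and the LLM-based S2G-Judge. Only the loop structure follows the abstract: retrieve, add a few relevant sentences to the Evidence Context, ask the judge, and either stop or re-query on the gaps.

```python
import re
from typing import Callable, List, Optional, Set, Tuple

# A judge maps (question, evidence sentences) to (sufficient?, next_query).
# In S2G-RAG this role is played by S2G-Judge; here it is any callable (assumption).
Judge = Callable[[str, List[str]], Tuple[bool, Optional[str]]]


def tokens(text: str) -> Set[str]:
    """Lowercased word tokens, used for toy lexical matching."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, corpus: List[str], k: int = 1) -> List[str]:
    """Toy retriever: rank documents by token overlap with the query (ties by index)."""
    q = tokens(query)
    order = sorted(range(len(corpus)), key=lambda i: (-len(q & tokens(corpus[i])), i))
    return [corpus[i] for i in order[:k]]


def extract_sentences(query: str, docs: List[str], memory: List[str], max_new: int = 2) -> List[str]:
    """Keep a few new sentences that share words with the query (hypothetical heuristic)."""
    q = tokens(query)
    candidates: List[str] = []
    for doc in docs:
        for sent in re.split(r"(?<=[.!?])\s+", doc.strip()):
            if sent and sent not in memory and sent not in candidates and q & tokens(sent):
                candidates.append(sent)
    # Python's sort is stable, so equally scored sentences keep document order.
    candidates.sort(key=lambda s: len(q & tokens(s)), reverse=True)
    return candidates[:max_new]


def iterative_rag(question: str, corpus: List[str], judge: Judge, max_turns: int = 3) -> Tuple[List[str], int]:
    """Retrieve, compact into sentence-level memory, judge, and re-query on gaps."""
    memory: List[str] = []  # the compact, sentence-level Evidence Context
    query = question
    for turn in range(1, max_turns + 1):
        memory.extend(extract_sentences(query, retrieve(query, corpus), memory))
        sufficient, next_query = judge(question, memory)
        if sufficient or next_query is None:
            return memory, turn
        query = next_query
    return memory, max_turns


def toy_judge(question: str, memory: List[str]) -> Tuple[bool, Optional[str]]:
    """Rule-based stand-in for S2G-Judge: each entity without evidence becomes a gap."""
    seen: Set[str] = set()
    for sent in memory:
        seen |= tokens(sent)
    gaps = [e for e in ("Derrickson", "Wood") if e.lower() not in seen]
    if not gaps:
        return True, None
    return False, " ".join(f"{e} nationality" for e in gaps)


corpus = [
    "Scott Derrickson is an American filmmaker. He directed Doctor Strange.",
    "Ed Wood was an American filmmaker. He made low-budget films.",
    "Paris is a city in France.",
]
question = "Were Scott Derrickson and Ed Wood of the same nationality?"
memory, turns = iterative_rag(question, corpus, toy_judge)
print(turns)   # 2
print(memory)
```

In this example the first turn only finds evidence about Derrickson, so the judge's gap becomes the query `Wood nationality`, and the second turn retrieves the missing sentence before stopping. Because memory holds individual sentences rather than whole documents, the off-topic sentences ("He directed Doctor Strange.", "He made low-budget films.") never enter the Evidence Context.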
