ArbGraph: Conflict-Aware Evidence Arbitration for Reliable Long-Form Retrieval-Augmented Generation
Qingying Niu, Yuhao Wang, Ruiyang Ren, Bohui Fang, Wayne Xin Zhao
TLDR
ArbGraph improves the reliability of long-form RAG through pre-generation evidence arbitration, resolving factual conflicts before text generation.
Key contributions
- ArbGraph explicitly resolves factual conflicts in long-form RAG *before* text generation.
- Decomposes retrieved documents into atomic claims and builds a conflict-aware evidence graph.
- Uses an intensity-driven iterative arbitration mechanism to propagate credibility signals and suppress unreliable claims.
- Significantly improves factual recall, reduces hallucinations, and lowers sensitivity to retrieval noise.
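The arbitration step described above can be sketched as an iterative score-propagation loop over the claim graph. This is a minimal illustration, not the paper's exact formulation: the edge "intensity" weights, the damped update rule, and the suppression threshold here are all illustrative assumptions.

```python
# Hypothetical sketch of intensity-driven credibility arbitration.
# The update rule, damping, and threshold are illustrative assumptions,
# not ArbGraph's actual method.

def arbitrate(claims, edges, iters=10, damping=0.5, threshold=0.3):
    """claims: {claim_id: prior credibility in [0, 1]}.
    edges: list of (src, dst, relation, intensity), where relation is
    "support" or "contradict" and intensity is an edge weight in [0, 1]."""
    cred = dict(claims)
    for _ in range(iters):
        new = {}
        for c, prior in claims.items():
            incoming = [(src, rel, w) for src, dst, rel, w in edges if dst == c]
            if not incoming:
                # No interacting evidence: keep the prior credibility.
                new[c] = prior
                continue
            # Supporting neighbors raise the score; contradicting ones lower it,
            # each scaled by edge intensity and the neighbor's current credibility.
            signal = sum((1.0 if rel == "support" else -1.0) * w * cred[src]
                         for src, rel, w in incoming)
            # Blend prior with propagated signal, clamped to [0, 1].
            new[c] = min(1.0, max(0.0, (1 - damping) * prior + damping * signal))
        cred = new
    # Suppress claims whose arbitrated credibility falls below the threshold,
    # so only a consistent evidence set reaches the generator.
    return {c: s for c, s in cred.items() if s >= threshold}
```

For example, a claim supported by a high-credibility source and contradicted by a low-credibility one would keep a moderate score and survive, while the weak contradicting claim itself is suppressed.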
Why it matters
Long-form RAG often struggles with factual consistency because retrieved evidence can conflict. ArbGraph offers a pre-generation solution that explicitly resolves these conflicts, yielding more reliable outputs with fewer hallucinations. This is crucial for building trustworthy AI systems.
Original Abstract
Retrieval-augmented generation (RAG) remains unreliable in long-form settings, where retrieved evidence is noisy or contradictory, making it difficult for RAG pipelines to maintain factual consistency. Existing approaches focus on retrieval expansion or verification during generation, leaving conflict resolution entangled with generation. To address this limitation, we propose ArbGraph, a framework for pre-generation evidence arbitration in long-form RAG that explicitly resolves factual conflicts. ArbGraph decomposes retrieved documents into atomic claims and organizes them into a conflict-aware evidence graph with explicit support and contradiction relations. On top of this graph, we introduce an intensity-driven iterative arbitration mechanism that propagates credibility signals through evidence interactions, enabling the system to suppress unreliable and inconsistent claims before final generation. In this way, ArbGraph separates evidence validation from text generation and provides a coherent evidence foundation for downstream long-form generation. We evaluate ArbGraph on two widely used long-form RAG benchmarks, LongFact and RAGChecker, using multiple large language model backbones. Experimental results show that ArbGraph consistently improves factual recall and information density while reducing hallucinations and sensitivity to retrieval noise. Additional analyses show that these gains are evident under conflicting or ambiguous evidence, highlighting the effectiveness of evidence-level conflict resolution for improving the reliability of long-form RAG. The implementation is publicly available at https://github.com/1212Judy/ArbGraph.