Conformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibration

May 8, 2026 | arXiv:2605.08077

Shuhang Lin, Chuhao Zhou, Xiao Lin, Zihan Dong, Kuan Lu + 3 more

cs.CL

TLDR

Conformal Path Reasoning (CPR) makes Knowledge Graph Question Answering more trustworthy by returning compact answer sets that come with reliable coverage guarantees.

Key contributions

CPR applies query-level conformal calibration to path-level scores, which preserves exchangeability while producing path prediction sets.
It introduces the Residual Conformal Value Network (RCVNet), a lightweight module that learns discriminative path-level nonconformity scores.
RCVNet is trained via PUCT-guided exploration.
On benchmarks, CPR improves the Empirical Coverage Rate by 34% and reduces average prediction set size by 40% relative to conformal baselines.
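
To make the first contribution concrete, here is a minimal sketch of split conformal calibration done at the query level over path-level scores. It is an illustration under stated assumptions, not the paper's procedure: the function names are hypothetical, each path is assumed to carry a nonconformity score where lower means more plausible, and each calibration query is assumed to contribute one score, namely the smallest nonconformity among its correct paths.

```python
import math

def query_score(path_scores, correct_paths):
    """One calibration score per query (assumption): the smallest
    nonconformity among the query's correct paths."""
    return min(path_scores[p] for p in correct_paths)

def conformal_threshold(calib_scores, alpha):
    """Finite-sample-corrected quantile used by split conformal prediction."""
    n = len(calib_scores)
    # Rank of the order statistic: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration queries for this alpha: keep every path.
        return math.inf
    return sorted(calib_scores)[k - 1]

def prediction_set(path_scores, threshold):
    """Keep every candidate path whose nonconformity is within the threshold."""
    return {p for p, s in path_scores.items() if s <= threshold}

# Hypothetical calibration scores, one per calibration query.
calib = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
threshold = conformal_threshold(calib, alpha=0.2)

# Hypothetical candidate paths for a new test query.
test_paths = {"p1": 0.3, "p2": 0.85, "p3": 0.8}
selected = prediction_set(test_paths, threshold)
```

Because each query contributes exactly one calibration score, the calibration scores stay exchangeable with the test query's score, which is the condition the standard conformal guarantee relies on. Pooling all paths from all queries into one calibration set would not have this property, since paths from the same query are not independent.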

Why it matters

Knowledge Graph Question Answering (KGQA) needs reliable answers with statistical guarantees. Existing Conformal Prediction methods often fail to provide valid coverage or yield excessively large answer sets. CPR addresses these issues, making KGQA more trustworthy and practical for real-world applications.
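
The two failure modes named above, invalid coverage and oversized answer sets, correspond to two quantities that can be measured on a held-out test set. The sketch below shows one plausible way to compute them. The function names are hypothetical, and the coverage criterion (a query counts as covered when its prediction set contains at least one gold item) is an assumption; the paper may define coverage differently.

```python
def empirical_coverage_rate(pred_sets, gold_sets):
    """Fraction of queries whose prediction set overlaps the gold set.

    Assumption: a query is covered if at least one gold item is predicted.
    """
    hits = sum(1 for pred, gold in zip(pred_sets, gold_sets) if pred & gold)
    return hits / len(pred_sets)

def average_set_size(pred_sets):
    """Mean number of items returned per query."""
    return sum(len(s) for s in pred_sets) / len(pred_sets)

# Hypothetical predictions and gold answers for four test queries.
preds = [{"a", "b"}, {"c"}, set(), {"d", "e", "f"}]
golds = [{"a"}, {"x"}, {"y"}, {"f"}]

ecr = empirical_coverage_rate(preds, golds)
size = average_set_size(preds)
```

A method meets its guarantee at level alpha when the empirical coverage rate is at least 1 - alpha up to sampling noise; among methods that do, the one with the smaller average set size is the more useful.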

Original Abstract

Knowledge Graph Question Answering (KGQA) has shown promise for grounded and interpretable reasoning, yet existing approaches often fail to provide reliable coverage guarantees over retrieved answers. While Conformal Prediction (CP) offers a principled framework for producing prediction sets with statistical guarantees, prior methods suffer from critical limitations in both calibration validity and score discriminability, resulting in violated coverage guarantees and excessively large prediction sets. To address these pitfalls, we propose Conformal Path Reasoning (CPR), a trustworthy KGQA framework with two key innovations. First, we perform query-level conformal calibration over path-level scores, preserving the exchangeability while generating path prediction sets. Second, we introduce the Residual Conformal Value Network (RCVNet), a lightweight module trained via PUCT-guided exploration to learn discriminative path-level nonconformity scores. Experiments on benchmarks show that CPR significantly improves the Empirical Coverage Rate by 34% while reducing average prediction set size by 40% compared to conformal baselines. These results validate the efficacy of CPR in satisfying coverage guarantees with substantially more compact answer sets.
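
The abstract does not spell out how PUCT guides exploration. For orientation, the sketch below shows the PUCT selection rule in its common AlphaZero-style form: a value estimate plus an exploration bonus that grows with the prior and shrinks with the visit count. How CPR applies it to knowledge graph paths, and how RCVNet supplies the values, is not stated here, so the relation names, the dictionary layout, and the default `c_puct` are all illustrative assumptions.

```python
import math

def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.0):
    """AlphaZero-style PUCT: Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + exploration

def select_edge(edges, c_puct=1.0):
    """Pick the outgoing edge with the highest PUCT score.

    `edges` maps an edge name to a dict with keys "q", "prior", "visits"
    (a hypothetical layout chosen for this example).
    """
    parent_visits = sum(e["visits"] for e in edges.values())
    return max(
        edges,
        key=lambda name: puct_score(
            edges[name]["q"],
            edges[name]["prior"],
            parent_visits,
            edges[name]["visits"],
            c_puct,
        ),
    )

# Hypothetical relations leaving the current entity in the graph.
edges = {
    "born_in": {"q": 0.5, "prior": 0.2, "visits": 8},
    "works_at": {"q": 0.3, "prior": 0.7, "visits": 1},
}
chosen = select_edge(edges)
```

In this example the rarely visited, high-prior edge wins even though its current value estimate is lower, which is the behavior that lets a search collect training signal for paths a greedy policy would ignore.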

