ArXiv TLDR

Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision

arXiv: 2604.09537

Soroosh Tayebi Arasteh, Mehdi Joodaki, Mahshad Lotfinia, Sven Nebelung, Daniel Truhn

cs.CL, cs.AI, cs.IR, cs.LG

TLDR

Introduces Case-Grounded Evidence Verification, a framework plus a supervision construction procedure for training verifiers whose decisions genuinely depend on whether the provided evidence supports the claim.

Key contributions

  • Introduces the Case-Grounded Evidence Verification framework: a model receives a local case context, external evidence, and a structured claim, and must decide whether the evidence supports the claim for that case.
  • A supervision construction procedure generates explicit support examples alongside semantically controlled non-support examples (counterfactual wrong-state and topic-related negatives) without manual evidence annotation (see the sketch after this list).
  • Instantiated in radiology, the learned verifier substantially outperforms case-only and evidence-only baselines, collapses when evidence is removed or swapped, and transfers to unseen evidence articles and an external case distribution.
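
The second contribution is essentially a data-construction recipe, so a small sketch may help make it concrete. The Python below is a minimal illustration, assuming per-case (finding, state) labels and one evidence passage per finding are already available; all names, fields, and the claim format are assumptions for illustration, not the paper's actual code or data schema.

```python
import random
from dataclasses import dataclass

@dataclass
class Example:
    case_context: str  # local case text, e.g. a radiology report
    evidence: str      # external evidence passage
    claim: str         # structured claim, e.g. "finding=pneumothorax; state=present"
    label: int         # 1 = evidence supports the claim for this case, 0 = it does not

def build_examples(cases, evidence_by_finding, seed=0):
    """Hypothetical construction of support and controlled non-support examples.

    `cases` is assumed to be a list of dicts with a "report" string and a
    "findings" dict mapping finding -> state; `evidence_by_finding` maps each
    finding to an evidence passage about it. No manual evidence labels are used.
    """
    rng = random.Random(seed)
    examples = []
    for case in cases:
        for finding, state in case["findings"].items():
            claim = f"finding={finding}; state={state}"
            passage = evidence_by_finding[finding]

            # Support example: on-topic evidence, claim matches the case state.
            examples.append(Example(case["report"], passage, claim, label=1))

            # Counterfactual wrong-state negative: same case and evidence,
            # but the claim asserts the opposite state, so it is not supported.
            wrong_state = "absent" if state == "present" else "present"
            wrong_claim = f"finding={finding}; state={wrong_state}"
            examples.append(Example(case["report"], passage, wrong_claim, label=0))

            # Topic-related negative: evidence about a different finding, so it is
            # domain-relevant yet does not support this particular claim.
            other = rng.choice([f for f in evidence_by_finding if f != finding])
            examples.append(Example(case["report"], evidence_by_finding[other], claim, label=0))
    return examples
```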

Why it matters

Evidence-grounded reasoning often fails in practice because supervision is weak, evidence is only loosely tied to the claim, and models do not genuinely depend on the evidence they are given. This paper addresses that gap with a framework and a supervision construction procedure that encode the causal role of evidence. The results point to supervision quality, not just model capacity, as a major bottleneck, which matters for the reliability of evidence grounding in critical applications such as radiology.

Original Abstract

Evidence-grounded reasoning requires more than attaching retrieved text to a prediction: a model should make decisions that depend on whether the provided evidence supports the target claim. In practice, this often fails because supervision is weak, evidence is only loosely tied to the claim, and evaluation does not test evidence dependence directly. We introduce case-grounded evidence verification, a general framework in which a model receives a local case context, external evidence, and a structured claim, and must decide whether the evidence supports the claim for that case. Our key contribution is a supervision construction procedure that generates explicit support examples together with semantically controlled non-support examples, including counterfactual wrong-state and topic-related negatives, without manual evidence annotation. We instantiate the framework in radiology and train a standard verifier on the resulting support task. The learned verifier substantially outperforms both case-only and evidence-only baselines, remains strong under correct evidence, and collapses when evidence is removed or swapped, indicating genuine evidence dependence. This behavior transfers across unseen evidence articles and an external case distribution, though performance degrades under evidence-source shift and remains sensitive to backbone choice. Overall, the results suggest that a major bottleneck in evidence grounding is not only model capacity, but the lack of supervision that encodes the causal role of evidence.
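
The abstract's evidence-dependence check (accuracy should hold under correct evidence and collapse when it is removed or swapped) can be pictured as a simple ablation loop. The sketch below assumes a trained `verifier(case_context, evidence, claim)` callable returning a support probability and examples stored as (case_context, evidence, claim, label) tuples; both are illustrative assumptions, not the authors' evaluation code.

```python
import random

def evidence_ablation_check(verifier, examples, seed=0):
    """Score a verifier with intact, removed, and swapped evidence.

    A verifier that genuinely depends on evidence should stay accurate under
    "intact" and drop sharply under "removed" and "swapped".
    """
    rng = random.Random(seed)
    pool = [evidence for _, evidence, _, _ in examples]

    def accuracy(perturb):
        correct = 0
        for case, evidence, claim, label in examples:
            pred = verifier(case, perturb(evidence), claim) >= 0.5
            correct += int(pred == bool(label))
        return correct / len(examples)

    return {
        "intact":  accuracy(lambda ev: ev),
        "removed": accuracy(lambda ev: ""),                # evidence withheld entirely
        "swapped": accuracy(lambda ev: rng.choice(pool)),  # random, likely unrelated passage
    }
```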
