Geometry-Calibrated Conformal Abstention for Language Models
Rui Xu, Yi Chen, Sihong Xie, Hui Xiong
TLDR
A post-hoc framework, Geometry-Calibrated Conformal Abstention, lets language models selectively abstain from answering when uncertain, with finite-sample guarantees on the correctness of the answers they do give.
Key contributions
- Proposes Conformal Abstention (CA), a post-hoc framework for LMs to selectively abstain from answering.
- CA offers finite-sample guarantees on both participation and response correctness probabilities.
- Bases abstention decisions on prediction confidence rather than the non-conformity scores used in conformal prediction, which are intractable for open-ended generation.
- Introduces geometry-based calibration to align prediction confidence with model ignorance.
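The core mechanism — answer only when a calibrated confidence threshold is cleared — can be sketched in a few lines. This is an illustrative plug-in version with hypothetical function names, not the paper's exact rule: CA uses a conformal quantile adjustment to get finite-sample guarantees, whereas the sketch below simply picks the most permissive threshold whose answered calibration subset meets the target correctness rate.

```python
def calibrate_threshold(conf, correct, target=0.75):
    """Illustrative sketch (not the paper's finite-sample rule):
    choose the smallest confidence threshold such that, among
    held-out calibration queries the model would answer
    (confidence >= threshold), empirical correctness >= target."""
    pairs = sorted(zip(conf, correct), reverse=True)  # most confident first
    hits, best = 0, None
    for answered, (c, ok) in enumerate(pairs, start=1):
        hits += ok
        if hits / answered >= target:
            best = c  # answering down to confidence c still meets the target
    return best  # None => abstain on every query

def answer_or_abstain(confidence, threshold):
    # Participate only when confidence clears the calibrated threshold.
    return threshold is not None and confidence >= threshold
```

For example, with calibration confidences `[0.9, 0.8, 0.7, 0.6, 0.5]` and correctness labels `[1, 1, 1, 0, 0]`, the threshold settles at 0.6, since answering the top four queries yields exactly 75% correctness. The geometry-based calibration in the paper would first adjust the raw confidences so they better track the model's actual knowledge before this thresholding step.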
Why it matters
This paper tackles the critical problem of language models hallucinating instead of admitting ignorance. Its post-hoc abstention framework, with theoretical guarantees and geometric calibration, significantly boosts LM reliability, making them more trustworthy for sensitive applications.
Original Abstract
When language models lack relevant knowledge for a given query, they frequently generate plausible responses that can be hallucinations, rather than admitting being agnostic about the answer. Retraining models to reward admitting ignorance can lead to overly conservative behaviors and poor generalization due to scarce evaluation benchmarks. We propose a post hoc framework, Conformal Abstention (CA), adapted from conformal prediction (CP) to determine whether to abstain from answering a query. CA provides finite-sample guarantees on both the probability of participation (i.e., not abstaining) and the probability that the generated response is correct. Importantly, the abstention decision relies on prediction confidence rather than the non-conformity scores used in CP, which are intractable for open-ended generation. To better align prediction confidence with the model's ignorance, we introduce a calibration strategy using representation geometry within the model to measure knowledge involvement in shaping the response. Experiments demonstrate that we improve selective answering significantly with 75 percent conditional correctness.