Beyond Surface Statistics: Robust Conformal Prediction for LLMs via Internal Representations
Yanli Wang, Peng Kuang, Xiaoyu Han, Kaidi Xu, Haohan Wang
TLDR
Conformal prediction for LLM question answering becomes more robust when internal Layer-Wise Information (LI) scores replace output-level uncertainty signals as nonconformity scores, with the largest gains under distribution shift.
Key contributions
- Introduces Layer-Wise Information (LI) scores as novel nonconformity scores for LLM conformal prediction.
- LI scores leverage internal representations, measuring how predictive entropy changes across model depth (see the sketch after this list).
- Achieves superior validity-efficiency trade-off compared to text-level baselines, especially under cross-domain shifts.
- Enhances LLM reliability and uncertainty quantification by using internal states instead of brittle surface statistics.
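To make the entropy-across-depth idea concrete, here is a minimal sketch assuming a Hugging Face-style causal LM that exposes hidden states and an `lm_head` unembedding, read out logit-lens style; the paper's actual LI score may define and aggregate these quantities differently.

```python
import torch
import torch.nn.functional as F

def layerwise_entropy(model, tokenizer, prompt, answer):
    """Illustrative sketch: predictive entropy of a logit-lens readout at each layer.

    Assumes a Hugging Face-style causal LM with output_hidden_states support and
    an lm_head unembedding; the final layer norm is skipped for brevity. This is
    not the paper's exact LI score, only an entropy-across-depth profile.
    """
    inputs = tokenizer(prompt + answer, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    entropies = []
    for h in out.hidden_states:           # one tensor per layer: (1, seq_len, dim)
        logits = model.lm_head(h[:, -1])  # project last position through the unembedding
        p = F.softmax(logits, dim=-1)
        entropies.append(-(p * p.clamp_min(1e-12).log()).sum().item())
    return entropies  # entropy profile across model depth
```

A score built from this profile (e.g., how much conditioning on the input flattens or sharpens entropy as depth increases) can then serve as the nonconformity score in the conformal pipeline below.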
Why it matters
Ensuring LLM reliability is crucial for sensitive applications. This paper tackles the brittleness of traditional uncertainty signals under distribution shift by proposing a novel approach using internal model representations. This significantly improves the robustness and validity of uncertainty estimates.
Original Abstract
Large language models are increasingly deployed in settings where reliability matters, yet output-level uncertainty signals such as token probabilities, entropy, and self-consistency can become brittle under calibration–deployment mismatch. Conformal prediction provides finite-sample validity under exchangeability, but its practical usefulness depends on the quality of the nonconformity score. We propose a conformal framework for LLM question answering that uses internal representations rather than output-facing statistics: specifically, we introduce Layer-Wise Information (LI) scores, which measure how conditioning on the input reshapes predictive entropy across model depth, and use them as nonconformity scores within a standard split conformal pipeline. Across closed-ended and open-domain QA benchmarks, with the clearest gains under cross-domain shift, our method achieves a better validity–efficiency trade-off than strong text-level baselines while maintaining competitive in-domain reliability at the same nominal risk level. These results suggest that internal representations can provide more informative conformal scores when surface-level uncertainty is unstable under distribution shift.
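For context, the split conformal recipe the abstract refers to is standard: calibrate a score threshold on held-out (question, answer) pairs, then include a candidate answer in the prediction set whenever its nonconformity score is within that threshold. A minimal sketch, with a generic `score` function standing in for the LI-based score:

```python
import numpy as np

def split_conformal_threshold(cal_scores, alpha=0.1):
    """Finite-sample-valid quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n        # conformal quantile level
    return np.quantile(cal_scores, min(q, 1.0), method="higher")

def prediction_set(candidates, score, threshold):
    """Keep every candidate whose nonconformity score is within the threshold."""
    return [c for c in candidates if score(c) <= threshold]

# Usage sketch: cal_scores are nonconformity scores of held-out calibration
# examples; `score` would be the LI-based score applied to each candidate answer.
# thr = split_conformal_threshold(cal_scores, alpha=0.1)
# answers = prediction_set(candidate_answers, li_score, thr)
```

Under exchangeability, the resulting sets contain the true answer with probability at least 1 − α regardless of the score; the paper's contribution is making the score itself (LI) informative enough that the sets also stay small, particularly under cross-domain shift.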