Beyond Surface Statistics: Robust Conformal Prediction for LLMs via Internal Representations
Yanli Wang, Peng Kuang, Xiaoyu Han, Kaidi Xu, Haohan Wang
TLDR
Conformal prediction for LLM question answering becomes more robust when internal Layer-Wise Information (LI) scores replace output-level uncertainty signals as nonconformity scores, with the largest gains under distribution shift.
Key contributions
- Introduces Layer-Wise Information (LI) scores as novel nonconformity scores for LLM conformal prediction.
- LI scores leverage internal representations, measuring how predictive entropy changes across model depth (see the sketch after this list).
- Achieves superior validity-efficiency trade-off compared to text-level baselines, especially under cross-domain shifts.
- Enhances LLM reliability and uncertainty quantification by using internal states instead of brittle surface statistics.
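To make the entropy-across-depth idea concrete, here is a minimal sketch assuming a Hugging Face-style causal LM that exposes hidden states and an `lm_head` unembedding, read out logit-lens style; the paper's actual LI score may define and aggregate these quantities differently.

```python
import torch
import torch.nn.functional as F

def layerwise_entropy(model, tokenizer, prompt, answer):
    """Illustrative sketch: predictive entropy of a logit-lens readout at each layer.

    Assumes a Hugging Face-style causal LM with output_hidden_states support and
    an lm_head unembedding; the final layer norm is skipped for brevity. This is
    not the paper's exact LI score, only an entropy-across-depth profile.
    """
    inputs = tokenizer(prompt + answer, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    entropies = []
    for h in out.hidden_states:           # one tensor per layer: (1, seq_len, dim)
        logits = model.lm_head(h[:, -1])  # project last position through the unembedding
        p = F.softmax(logits, dim=-1)
        entropies.append(-(p * p.clamp_min(1e-12).log()).sum().item())
    return entropies  # entropy profile across model depth
```

A score built from this profile (e.g., how much conditioning on the input flattens or sharpens entropy as depth increases) can then serve as the nonconformity score in the conformal pipeline below.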
Why it matters
Ensuring LLM reliability is crucial for sensitive applications. This paper tackles the brittleness of traditional uncertainty signals under distribution shift by proposing a novel approach using internal model representations. This significantly improves the robustness and validity of uncertainty estimates.
Original Abstract
Large language models are increasingly deployed in settings where reliability matters, yet output-level uncertainty signals such as token probabilities, entropy, and self-consistency can become brittle under calibration–deployment mismatch. Conformal prediction provides finite-sample validity under exchangeability, but its practical usefulness depends on the quality of the nonconformity score. We propose a conformal framework for LLM question answering that uses internal representations rather than output-facing statistics: specifically, we introduce Layer-Wise Information (LI) scores, which measure how conditioning on the input reshapes predictive entropy across model depth, and use them as nonconformity scores within a standard split conformal pipeline. Across closed-ended and open-domain QA benchmarks, with the clearest gains under cross-domain shift, our method achieves a better validity–efficiency trade-off than strong text-level baselines while maintaining competitive in-domain reliability at the same nominal risk level. These results suggest that internal representations can provide more informative conformal scores when surface-level uncertainty is unstable under distribution shift.
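For context, the split conformal recipe the abstract refers to is standard: calibrate a score threshold on held-out (question, answer) pairs, then include a candidate answer in the prediction set whenever its nonconformity score is within that threshold. A minimal sketch, with a generic `score` function standing in for the LI-based score:

```python
import numpy as np

def split_conformal_threshold(cal_scores, alpha=0.1):
    """Finite-sample-valid quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n        # conformal quantile level
    return np.quantile(cal_scores, min(q, 1.0), method="higher")

def prediction_set(candidates, score, threshold):
    """Keep every candidate whose nonconformity score is within the threshold."""
    return [c for c in candidates if score(c) <= threshold]

# Usage sketch: cal_scores are nonconformity scores of held-out calibration
# examples; `score` would be the LI-based score applied to each candidate answer.
# thr = split_conformal_threshold(cal_scores, alpha=0.1)
# answers = prediction_set(candidate_answers, li_score, thr)
```

Under exchangeability, the resulting sets contain the true answer with probability at least 1 − α regardless of the score; the paper's contribution is making the score itself (LI) informative enough that the sets also stay small, particularly under cross-domain shift.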