The First Token Knows: Single-Decode Confidence for Hallucination Detection
TLDR
This paper introduces "phi_first," a hallucination detection method that needs only a single greedy decode: it scores the confidence of the first answer token and outperforms multi-sample self-consistency.
Key contributions
- Proposes "phi_first," a low-cost hallucination detection method based on first-token confidence.
- Achieves AUROC of 0.820, outperforming multi-sample self-consistency (0.791) and semantic self-consistency (0.793).
- Uses a single greedy decode, avoiding costly multiple sampling and external NLI overhead.
- Demonstrates that initial token distribution captures much of the uncertainty information.
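The score described above can be sketched in a few lines: take the top-K logits at the first content-bearing answer token, renormalize them with a softmax, and convert the normalized Shannon entropy into a confidence in [0, 1]. This is a minimal illustration, not the authors' code; the choice of K = 10 and renormalizing over only the top-K slice are assumptions based on the abstract's description.

```python
import numpy as np

def phi_first(logits: np.ndarray, k: int = 10) -> float:
    """Confidence from the top-k logits at the first answer token.

    Assumption: 'normalized entropy of the top-K logits' means a
    softmax over the k largest logits, with entropy divided by log(k).
    Returns 1 for a fully peaked distribution, 0 for a uniform one.
    """
    top = np.sort(logits)[-k:]               # k largest logits
    p = np.exp(top - top.max())               # stable softmax
    p /= p.sum()                               # renormalize over top-k
    entropy = -(p * np.log(p)).sum()           # Shannon entropy (nats)
    return float(1.0 - entropy / np.log(k))    # normalized confidence

# A peaked first-token distribution scores high; a flat one scores ~0.
peaked = np.array([10.0, 1.0, 0.5, 0.2, 0.1, 0.0, -1.0, -2.0, -3.0, -4.0])
flat = np.zeros(10)
```

A high phi_first (peaked distribution) would be read as a confident, likely factual answer; a low score flags a possible hallucination without any extra sampling.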
Why it matters
Current hallucination detection methods are costly and complex, requiring multiple decodes and sometimes an external NLI model. This paper offers a significantly simpler and cheaper approach that is at least as effective. It suggests that models reveal their uncertainty very early in generation, providing a new default baseline for future research.
Original Abstract
Self-consistency detects hallucinations by generating multiple sampled answers to a question and measuring agreement, but this requires repeated decoding and can be sensitive to lexical variation. Semantic self-consistency improves this by clustering sampled answers by meaning using natural language inference, but it adds both sampling cost and external inference overhead. We show that first-token confidence, phi_first, computed from the normalized entropy of the top-K logits at the first content-bearing answer token of a single greedy decode, matches or modestly exceeds semantic self-consistency on closed-book short-answer factual question answering. Across three 7-8B instruction-tuned models and two benchmarks, phi_first achieves a mean AUROC of 0.820, compared with 0.793 for semantic agreement and 0.791 for standard surface-form self-consistency. A subsumption test shows that phi_first is moderately to strongly correlated with semantic agreement, and combining the two signals yields only a small AUROC improvement over phi_first alone. These results suggest that much of the uncertainty information captured by multi-sample agreement is already available in the model's initial token distribution. We argue that phi_first should be reported as a default low-cost baseline before invoking sampling-based uncertainty estimation.