Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction
TLDR
A new low-cost black-box method detects LLM hallucinations by modeling response embeddings as a dynamical system with Koopman operator theory, achieving state-of-the-art detection at lower resource cost than prior methods.
Key contributions
- Models LLM responses as a dynamical system using embedding sequences.
- Applies Koopman operator theory to fit factual and hallucinated state transitions.
- Detects hallucinations via a differential residual score from prediction errors.
- Offers low-cost, single-sample detection without external knowledge or extra sampling.
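The pipeline above can be sketched in a few lines: fit a linear (Koopman-style) transition operator to each regime's embedding sequences via least squares, then score a new response by the difference of its one-step prediction residuals under the two operators. This is a minimal DMD-style illustration, not the paper's implementation; the paper's choice of observables, embedding model, and score sign convention are not specified in this digest.

```python
import numpy as np

def fit_koopman(X):
    """Least-squares fit of a linear transition operator K with x_{t+1} ~ K x_t.

    X: (T, d) array of embedding vectors for a response's sentence sequence.
    Returns K of shape (d, d). Solves K X0^T ~ X1^T, i.e. K = X1^T pinv(X0^T).
    """
    X0, X1 = X[:-1], X[1:]                # consecutive state pairs
    return X1.T @ np.linalg.pinv(X0.T)

def residual(K, X):
    """Mean one-step prediction error of operator K on sequence X."""
    pred = X[:-1] @ K.T                   # predicted next states
    return float(np.mean(np.linalg.norm(X[1:] - pred, axis=1)))

def differential_score(K_fact, K_hall, X):
    """Differential residual score: larger values mean the sequence is better
    explained by the hallucinated-regime operator (assumed sign convention)."""
    return residual(K_fact, X) - residual(K_hall, X)
```

A response would then be flagged as hallucinated when its differential score exceeds a calibrated threshold; the whole check needs only one forward pass plus cheap linear algebra, which is the source of the method's low cost.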
Why it matters
This paper addresses the high computational expense of current hallucination-detection techniques. By treating the LLM as a black-box dynamical system, it avoids costly repeated sampling and external knowledge retrieval, making reliable hallucination detection practical for real-world, latency- and budget-constrained applications.
Original Abstract
Large Language Models (LLMs) frequently generate plausible but non-factual content, a phenomenon known as hallucination. While existing detection methods typically rely on computationally expensive sampling-based consistency checks or external knowledge retrieval, we propose a new method that treats the LLM as a black-box dynamical system. By projecting LLM responses into a high-dimensional manifold via an embedding model, we characterize the resulting vector sequences as observable realizations of the model's latent state-space dynamics. Leveraging Koopman operator theory, we fit the transition operators for both factual and hallucinated regimes and define a differential residual score based on their respective prediction errors. To accommodate varying user requirements and domain-specific sensitivities, we introduce a preference-aware calibration mechanism that optimizes the classification threshold based on a small set of demonstrations. This approach enables low-cost hallucination detection in a single-sample pass, avoiding the need for secondary sampling or external grounding. Extensive testing across three data benchmarks demonstrates that our method achieves state-of-the-art performance with reduced resource overhead.
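The preference-aware calibration described in the abstract can be illustrated as a simple weighted threshold search over a handful of labeled demonstrations. The cost weights and search procedure below are an assumed form for illustration; the paper's actual calibration objective is not given in this digest.

```python
def calibrate_threshold(scores, labels, fn_weight=1.0, fp_weight=1.0):
    """Pick the threshold tau minimizing weighted misclassification cost on a
    small demonstration set.

    scores: differential residual scores; labels: 1 = hallucinated, 0 = factual.
    Predict "hallucinated" when score > tau. fn_weight / fp_weight encode the
    user's preference for missing hallucinations vs. false alarms (assumed form).
    """
    candidates = sorted(set(scores))
    # Candidate thresholds: midpoints between consecutive scores, plus extremes.
    taus = ([candidates[0] - 1.0]
            + [(a + b) / 2 for a, b in zip(candidates, candidates[1:])]
            + [candidates[-1] + 1.0])

    def cost(tau):
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= tau)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > tau)
        return fn_weight * fn + fp_weight * fp

    return min(taus, key=cost)
```

Raising `fn_weight` pushes the threshold down (flagging more aggressively), which suits safety-sensitive domains; raising `fp_weight` does the opposite for settings where false alarms are costly.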