ArXiv TLDR

Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments

arXiv: 2605.03971

Hao Mi, Qiang Sheng, Shaofei Wang, Beizhe Hu, Yifan Sun + 5 more

cs.CL

TLDR

LaaB improves LLM hallucination detection by logically bridging neural features and symbolic self-judgments through a novel meta-judgment process.

Key contributions

  • Proposes LaaB, a framework bridging neural features and symbolic judgments for hallucination detection.
  • Introduces a "meta-judgment" process to map symbolic labels back into the feature space.
  • Leverages logical consistency between response and meta-judgment labels for dual-view signal integration.
  • Outperforms 8 baselines in hallucination detection on 4 public datasets across 4 LLMs.

Why it matters

LLM hallucinations are a major reliability concern, yet current detection methods are fragmented: they rely on either implicit neural uncertainty or explicit verbalized self-judgment, but not both. By integrating these two views through a logical consistency constraint, this paper yields more robust and accurate hallucination detection, enhancing LLM trustworthiness in real-world applications.

Original Abstract

Large Language Models (LLMs) are prone to factual hallucinations, risking their reliability in real-world applications. Existing hallucination detectors mainly extract micro-level intrinsic patterns for uncertainty quantification or elicit macro-level self-judgments through verbalized prompts. However, these methods address only a single facet of the hallucination, focusing either on implicit neural uncertainty or explicit symbolic reasoning, thereby treating these inherently coupled behaviors in isolation and failing to exploit their interdependence for a holistic view. In this paper, we propose LaaB (Logical Consistency-as-a-Bridge), a framework that bridges neural features and symbolic judgments for hallucination detection. LaaB introduces a "meta-judgment" process to map symbolic labels back into the feature space. By leveraging the inherent logical bridge where response and meta-judgment labels are either the same or opposite based on the self-judgment's semantics, LaaB aligns and integrates dual-view signals via mutual learning and enhances the hallucination detection. Extensive experiments on 4 public datasets, across 4 LLMs, against 8 baselines demonstrate the superiority of LaaB.
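The "logical bridge" in the abstract — that response and meta-judgment labels are either the same or opposite depending on what the self-judgment asserts — can be sketched concretely. The snippet below is an illustrative reconstruction, not the paper's released code; the function name and label encoding (1 = hallucinated, 0 = factual) are assumptions for the example.

```python
def implied_meta_label(response_label: int, judgment_says_correct: bool) -> int:
    """Meta-judgment label implied by LaaB's logical consistency constraint.

    response_label:        1 if the response is hallucinated, 0 if factual.
    judgment_says_correct: True if the model's self-judgment claims the
                           response is correct, False if it claims it is wrong.
    """
    if judgment_says_correct:
        # "My answer is correct": this claim is false exactly when the
        # response is hallucinated, so the two labels coincide.
        return response_label
    # "My answer is incorrect": this claim is false exactly when the
    # response is factual, so the labels are opposite.
    return 1 - response_label


# A hallucinated response paired with a "correct" self-judgment means the
# judgment itself is also wrong (labels agree):
assert implied_meta_label(1, True) == 1
# A factual response paired with an "incorrect" self-judgment means the
# judgment is wrong (labels flip):
assert implied_meta_label(0, False) == 1
```

In training, a constraint of this form lets the two prediction views supervise each other: detector outputs on the response and on the meta-judgment that violate the implied relation can be penalized, which is one plausible reading of the paper's "mutual learning" between dual-view signals.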
