ArXiv TLDR

IUQ: Interrogative Uncertainty Quantification for Long-Form Large Language Model Generation

arXiv:2604.15109

Haozhi Fan, Jinhao Duan, Kaidi Xu

cs.CL, cs.AI, cs.LG

TLDR

IUQ is a new framework that quantifies uncertainty in long-form LLM generations by assessing inter-sample consistency and intra-sample faithfulness.

Key contributions

  • Addresses the challenge of uncertainty quantification in long-form, free-form LLM text generation.
  • Introduces IUQ, a novel framework leveraging inter-sample consistency and intra-sample faithfulness.
  • Utilizes an 'interrogate-then-respond' paradigm to provide reliable claim-level uncertainty measures.
  • Demonstrates superior performance across diverse LLM families and sizes on long-form datasets.
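The digest does not specify how IUQ scores each claim, but the inter-sample consistency idea can be sketched in a simple, hypothetical form: pose an interrogative question about a claim, collect answers across several sampled responses, and treat disagreement with the majority answer as uncertainty. The function below is an illustrative majority-vote sketch, not the paper's actual method.

```python
from collections import Counter

def claim_uncertainty(sampled_answers):
    """Toy claim-level uncertainty: 1 minus the fraction of sampled
    answers that agree with the most common (majority) answer.
    0.0 means all samples agree; values near 1.0 mean high disagreement."""
    counts = Counter(sampled_answers)
    majority_count = counts.most_common(1)[0][1]
    return 1.0 - majority_count / len(sampled_answers)

# Example: 4 of 5 sampled responses answer "Paris" to the claim's question.
score = claim_uncertainty(["Paris", "Paris", "Paris", "Paris", "Lyon"])
```

In this sketch, `sampled_answers` stands in for the model's answers to one interrogative question across independent generations; the real framework would also weigh intra-sample faithfulness, which this toy score ignores.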

Why it matters

LLMs often generate coherent but factually inaccurate long-form text, posing a major challenge for real-world applications. IUQ provides a crucial step towards making these outputs more trustworthy by quantifying uncertainty at a claim level. This enhances the reliability of LLMs in critical long-form generation tasks.

Original Abstract

Despite the rapid advancement of Large Language Models (LLMs), uncertainty quantification in LLM generation is a persistent challenge. Although recent approaches have achieved strong performance by restricting LLMs to produce short or constrained answer sets, many real-world applications require long-form and free-form text generation. A key difficulty in this setting is that LLMs often produce responses that are semantically coherent yet factually inaccurate, while the underlying semantics are multifaceted and the linguistic structure is complex. To tackle this challenge, this paper introduces Interrogative Uncertainty Quantification (IUQ), a novel framework that leverages inter-sample consistency and intra-sample faithfulness to quantify the uncertainty in long-form LLM outputs. By utilizing an interrogate-then-respond paradigm, our method provides reliable measures of claim-level uncertainty and the model's faithfulness. Experimental results across diverse model families and model sizes demonstrate the superior performance of IUQ over two widely used long-form generation datasets. The code is available at https://github.com/louisfanhz/IUQ.
