ArXiv TLDR

Hallucination Inspector: A Fact-Checking Judge for API Migration

arXiv:2604.20202

Marcos Tileria, Santanu Kumar Dash, Profir-Petru Pârţachi, Earl T. Barr

cs.SE

TLDR

Hallucination Inspector is a static analysis tool that detects "Scaffolding Hallucination" in LLM-generated API migration code. In a preliminary evaluation, it yields fewer false positives than standard metrics and probabilistic judges.

Key contributions

  • Identifies "Scaffolding Hallucination", a failure mode in which LLMs invent non-existent "Phantom Symbols" (such as imports, constructors, and constants) in the glue code of an API migration.
  • Proposes Hallucination Inspector, a static analysis tool for detecting these LLM-generated errors.
  • Verifies symbols extracted from the code's abstract syntax tree (AST) against a knowledge base derived from the API's documentation.
  • In a preliminary evaluation on Android API migrations, significantly reduces false positives compared to standard metrics and probabilistic judges.
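The core check described above (pull symbols out of generated code's AST, then look each one up in a documentation-derived knowledge base) can be illustrated with a minimal sketch. It rests on several assumptions: it uses Python's standard `ast` module on Python source rather than the paper's Android API setting, the knowledge base is a hypothetical hand-written set instead of one derived from real documentation, and `extract_symbols`, `find_phantom_symbols`, and `dotted_name` are illustrative names, not Hallucination Inspector's API. It only checks names rooted in imports: module imports, from-imports, and dotted attribute chains such as `os.path.join`.

```python
from __future__ import annotations

import ast

# HYPOTHETICAL knowledge base: fully qualified names we pretend were extracted
# from API documentation. A real one would be built from the docs themselves.
KNOWLEDGE_BASE = {
    "os", "os.path", "os.path.join", "os.path.exists",
    "json", "json.loads", "json.dumps",
}


def dotted_name(node: ast.AST) -> str | None:
    """Turn a Name/Attribute chain such as os.path.join into 'os.path.join'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # chain is rooted in a call, subscript, etc.


def extract_symbols(source: str) -> set[str]:
    """Collect import-rooted, fully qualified symbols from the AST of `source`."""
    tree = ast.parse(source)
    aliases = {}  # local name -> fully qualified name
    symbols = set()

    # Pass 1: imports determine which local names refer to API symbols.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                symbols.add(alias.name)
                if alias.asname:
                    aliases[alias.asname] = alias.name
                else:
                    # `import os.path` binds only the top-level name `os`.
                    root = alias.name.split(".")[0]
                    aliases[root] = root
        elif isinstance(node, ast.ImportFrom) and node.module:
            symbols.add(node.module)
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports name no specific symbol
                qualified = f"{node.module}.{alias.name}"
                symbols.add(qualified)
                aliases[alias.asname or alias.name] = qualified

    # Pass 2: resolve only outermost attribute chains (skip inner links such as
    # `os.path` inside `os.path.join`) whose root is an imported name.
    inner = {id(n.value) for n in ast.walk(tree) if isinstance(n, ast.Attribute)}
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and id(node) not in inner:
            chain = dotted_name(node)
            if chain is None:
                continue
            root, _, rest = chain.partition(".")
            if root in aliases:
                symbols.add(f"{aliases[root]}.{rest}")
    return symbols


def find_phantom_symbols(source: str, knowledge_base: set[str]) -> list[str]:
    """Return extracted symbols that are missing from the knowledge base, sorted."""
    return sorted(s for s in extract_symbols(source) if s not in knowledge_base)


# Toy "LLM output" with two invented symbols: json.parse_fast, json.pretty_dumps.
GENERATED = '''
import os
import json
from json import loads, parse_fast

path = os.path.join("a", "b")
data = loads("{}")
text = json.pretty_dumps(data)
'''

if __name__ == "__main__":
    print(find_phantom_symbols(GENERATED, KNOWLEDGE_BASE))
    # prints ['json.parse_fast', 'json.pretty_dumps']
```

Exact set membership is the simplest possible lookup. A documentation-derived knowledge base for a real API would also have to account for things like overloads, inheritance, and symbols defined locally in the generated code, none of which this sketch attempts.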

Why it matters

LLMs are increasingly used for software engineering tasks such as API migration, but scaffolding hallucinations cause them to produce incorrect code. This paper introduces a tool that automatically detects these phantom symbols, which can improve the reliability of LLM-generated code for critical development tasks.

Original Abstract

Large Language Models (LLMs) are increasingly deployed in automated software engineering for tasks such as API migration. While LLMs are able to identify migration patterns, they often make mistakes and fail to produce correct glue code to invoke the new API in place of the old one. We call this issue Scaffolding Hallucination, a failure mode where models generate incorrect calling contexts by inventing Phantom Symbols -- such as imaginary imports, constructors, and constants -- that do not exist in the API specification. In this paper, we show that standard metrics cannot be relied upon to detect these instances of hallucination. We propose Hallucination Inspector, a static analysis tool to detect Scaffolding Hallucination in LLM-generated code. Our approach includes a lightweight evaluation framework that verifies symbols extracted from the abstract syntax tree against a knowledge base derived directly from software documentation for the API. A preliminary evaluation on Android API migrations demonstrates that our approach successfully identifies hallucinations and significantly reduces false positives compared to standard metrics and probabilistic judges.
