ArXiv TLDR

CASCADE: Detecting Inconsistencies between Code and Documentation with Automatic Test Generation

arXiv:2604.19400

Tobias Kiecker, Jan Arne Sparka, Martin Reuter, Albert Ziegler, Lars Grunske

cs.SE

TLDR

CASCADE uses LLMs to generate unit tests from documentation, detecting code-documentation inconsistencies with a strong emphasis on minimizing false positives.

Key contributions

  • Leverages LLMs to generate unit tests directly from natural-language documentation.
  • Minimizes false positives by cross-checking generated tests with LLM-generated code.
  • Reports inconsistencies only when existing code fails a test that documentation-derived code passes.
  • Discovered 13 new inconsistencies in real-world Java, C#, and Rust projects, with 10 fixed.
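The reporting rule from the contributions above can be sketched in a few lines. This is an illustrative Python sketch, not CASCADE's actual implementation or API (the tool targets Java, C#, and Rust); `run_test`, `report_inconsistencies`, and the clamp example are hypothetical names invented here to show the two-condition check.

```python
def run_test(test, implementation):
    """Hypothetical stand-in for executing one documentation-derived
    test against an implementation; returns True if it passes."""
    return test(implementation)

def report_inconsistencies(tests, existing_code, doc_generated_code):
    """Flag a test only when BOTH conditions hold:
    (1) the existing code fails it, and
    (2) the code regenerated from the documentation passes it.
    Condition (2) filters out false positives caused by bad tests."""
    flagged = []
    for test in tests:
        fails_existing = not run_test(test, existing_code)
        passes_generated = run_test(test, doc_generated_code)
        if fails_existing and passes_generated:
            flagged.append(test)
    return flagged

# Toy example. Documentation says: "clamp(x) limits x to the range [0, 10]."
doc_test = lambda impl: impl(15) == 10 and impl(-3) == 0 and impl(5) == 5

def existing_clamp(x):   # existing code with an off-by-one upper bound
    return max(0, min(x, 9))

def generated_clamp(x):  # code regenerated from the documentation
    return max(0, min(x, 10))

flagged = report_inconsistencies([doc_test], existing_clamp, generated_clamp)
# flagged is non-empty → a potential code-documentation inconsistency
```

If the generated code also failed the test (e.g. because the LLM produced a flawed test), nothing would be reported, which is how the cross-check suppresses false positives.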

Why it matters

Maintaining consistency between code and documentation is critical but often overlooked, leading to bugs and confusion. Existing automated tools struggle with high false positives, hindering adoption. CASCADE provides a highly precise solution, making automated inconsistency detection practical and reliable for developers.

Original Abstract

Maintaining consistency between code and documentation is a crucial yet frequently overlooked aspect of software development. Even minor mismatches can confuse API users, introduce new bugs, and increase overall maintenance effort. This creates demand for automated solutions that can assist developers in identifying code-documentation inconsistencies. However, since automatic reports still require human confirmation, false positives carry serious consequences: wasting developer time and discouraging practical adoption. We introduce CASCADE (Consistency Analysis for Source Code And Documentation through Execution), a novel tool for detecting inconsistencies with a strong emphasis on reducing false positives. CASCADE leverages Large Language Models (LLMs) to generate unit tests directly from natural-language documentation. Since these tests are derived from the documentation, any failure during execution indicates a potential mismatch between the documented and actual behavior of the code. To minimize false positives, CASCADE also generates code from the documentation to cross-check the generated tests. By design, an inconsistency is reported only when two conditions are met: the existing code fails a test, while the code generated from the documentation passes the same test. We evaluated CASCADE on a novel dataset of 71 inconsistent and 814 consistent code-documentation pairs drawn from open-source Java projects. Further, we applied CASCADE to additional Java, C#, and Rust repositories, where we uncovered 13 previously unknown inconsistencies, of which 10 have subsequently been fixed, demonstrating both CASCADE's precision and its applicability to real-world codebases.

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.