Call-Chain-Aware LLM-Based Test Generation for Java Projects

April 23, 2026 · arXiv:2604.22046

Guancheng Wang, Qinghua Xu, Lionel C. Briand, Zhaoqiang Guo, Kui Liu

cs.SE, cs.AI

TLDR

CAT is a novel LLM-based test generation approach for Java that uses call-chain and dependency contexts to create more effective unit tests.

Key contributions

Introduces CAT, a call-chain-aware LLM-based test generation approach for Java projects.
Incorporates call-chain and dependency contexts into LLM prompts via dedicated static analysis.
Systematically models caller-callee relationships, object constructors, and third-party dependencies.
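Improves line and branch coverage by 18.04% and 21.74%, respectively, over the state-of-the-art approach PANTA on Defects4J, and consistently outperforms it on four real-world GitHub projects released after the LLM's cut-off date.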
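
To make the idea of a call-chain context more concrete, the following is a minimal Java sketch of how caller-callee relationships might be turned into prompt text. It is not CAT's implementation: the class `CallChainSketch`, its methods, the depth limit, and the example method names are all hypothetical, and the call edges are added by hand here rather than extracted by static analysis.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: collects caller chains that lead to a focal method. */
final class CallChainSketch {
    // Maps each callee to its direct callers; insertion order keeps output stable.
    private final Map<String, Set<String>> callers = new LinkedHashMap<>();

    /** Records one caller-callee edge (assumed to come from some static analysis). */
    void addCall(String caller, String callee) {
        callers.computeIfAbsent(callee, k -> new LinkedHashSet<>()).add(caller);
    }

    /** Returns chains "outermost caller, ..., focal", following at most maxDepth call edges. */
    List<List<String>> chainsTo(String focal, int maxDepth) {
        List<List<String>> result = new ArrayList<>();
        Deque<String> path = new ArrayDeque<>();
        path.push(focal);
        walk(focal, maxDepth, path, result);
        return result;
    }

    private void walk(String method, int depthLeft, Deque<String> path,
                      List<List<String>> out) {
        boolean extended = false;
        if (depthLeft > 0) {
            for (String caller : callers.getOrDefault(method, Set.of())) {
                if (path.contains(caller)) {
                    continue; // skip recursive cycles
                }
                path.push(caller);
                walk(caller, depthLeft - 1, path, out);
                path.pop();
                extended = true;
            }
        }
        // Emit a chain once it cannot be extended further; a lone focal method is not a chain.
        if (!extended && path.size() > 1) {
            // The deque iterates from the most recently pushed (outermost) caller to the focal method.
            out.add(new ArrayList<>(path));
        }
    }

    /** Renders the chains as a plain-text section that could be placed in an LLM prompt. */
    String toPromptSection(String focal, int maxDepth) {
        StringBuilder sb = new StringBuilder("Call chains reaching " + focal + ":\n");
        for (List<String> chain : chainsTo(focal, maxDepth)) {
            sb.append("- ").append(String.join(" -> ", chain)).append('\n');
        }
        return sb.toString();
    }
}
```

In this sketch, registering the edges `OrderService.checkout -> Cart.total`, `Invoice.render -> Cart.total`, and `Cart.total -> Item.price` and then asking for chains to `Item.price` yields two chains, one per outermost caller. A real pipeline would presumably also attach constructor and third-party dependency information, which this sketch leaves out.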

Why it matters

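Complex Java projects are hard targets for LLM-based test generation because of rich inter-class dependencies, deep call chains, and intricate object initialization requirements. CAT addresses this by explicitly integrating call-chain and dependency contexts into its prompts, which raised line and branch coverage over PANTA by 18.04% and 21.74% on Defects4J. It also outperformed PANTA on four GitHub projects released after the LLM's cut-off date, making it a more robust way to generate unit tests for real-world software.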

Original Abstract

Large language models (LLMs) have recently shown strong potential for generating project-level unit tests. However, existing state-of-the-art approaches primarily rely on execution-path information to guide prompt construction, which is often insufficient for complex software systems with rich inter-class dependencies, deep call chains, and intricate object initialization requirements. In this paper, we present CAT, a novel call-chain-aware LLM-based test generation approach that explicitly incorporates call-chain and dependency contexts into prompts through dedicated static analysis. To construct executable, semantically valid test contexts, CAT systematically models caller-callee relationships, object constructors, and third-party dependencies, and supports iterative test fixing when generation failures occur. We evaluate CAT on the widely used Defects4J benchmark and on four real-world GitHub projects released after the LLM's cut-off date. The results show that, across projects in Defects4J, CAT improves line and branch coverage by 18.04% and 21.74%, respectively, over the state-of-the-art approach PANTA, while consistently achieving superior performance on post-cutoff real-world projects. An ablation study further demonstrates the importance of call-chain and dependency contexts in CAT.
