
Program Structure-aware Language Models: Targeted Software Testing beyond Textual Semantics

2604.17715

Khang Tran, Khoa Nguyen, Cristian Borcea, NhatHai Phan

cs.SE, cs.LG

TLDR

GLMTest is an LLM framework that conditions test case generation on program structure so that tests target specific execution branches, raising branch accuracy on the TestGenEval benchmark from 27.4% to 50.2%.

Key contributions

  • Proposes GLMTest, the first program structure-aware LLM framework for targeted test case generation.
  • Integrates code property graphs and code semantics via GNNs and LLMs for branch-specific conditioning.
  • Enables controllable and branch-targeted test case generation to enhance bug and security risk discovery.
  • Achieves 50.2% branch accuracy on the TestGenEval benchmark, up from 27.4% for the state-of-the-art LLMs it is compared against (Claude-Sonnet-4.5 and GPT-4o-mini).
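To make the idea of branch-specific conditioning from a code graph more concrete, here is a minimal sketch of one common GNN building block, mean-aggregation message passing, over a toy graph. Everything in it is an assumption made for illustration: the four-node graph, the two-dimensional features, and the names `EDGES`, `FEATURES`, `message_pass`, and `branch_embedding` are hypothetical, and this summary does not specify GLMTest's actual graph encoder or how its output is fed to the language model.

```python
from typing import Dict, List

# Hypothetical toy graph standing in for a code property graph.
# Node 0: function entry, 1: `if x > 10` condition,
# 2: true branch, 3: false branch.
EDGES: Dict[int, List[int]] = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

# Hypothetical 2-dimensional initial node features.
FEATURES: Dict[int, List[float]] = {
    0: [1.0, 0.0],
    1: [0.0, 1.0],
    2: [1.0, 1.0],
    3: [0.0, 0.0],
}


def message_pass(
    features: Dict[int, List[float]], edges: Dict[int, List[int]]
) -> Dict[int, List[float]]:
    """One round of mean aggregation: each node averages its own
    feature vector with those of its neighbours."""
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in edges.get(node, [])]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated


def branch_embedding(
    target: int,
    features: Dict[int, List[float]],
    edges: Dict[int, List[int]],
    rounds: int = 2,
) -> List[float]:
    """Run a few rounds of message passing and read off the vector
    of the target branch node."""
    hidden = features
    for _ in range(rounds):
        hidden = message_pass(hidden, edges)
    return hidden[target]


true_branch_vec = branch_embedding(2, FEATURES, EDGES)
false_branch_vec = branch_embedding(3, FEATURES, EDGES)
```

The two branch nodes end up with different vectors because each absorbs information from its own starting features and its graph neighbourhood. A structure-aware generator could use such a vector as the signal that says which branch a test should reach.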

Why it matters

Current LLMs struggle to target specific high-risk code branches for testing. GLMTest addresses this by conditioning generation on program structure, which the authors expect to help uncover subtle bugs and security vulnerabilities that coverage-oriented generation can miss. Steering tests toward chosen branches matters for software reliability and security because those branches are often where such defects sit.
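What it means for a test to "target" a branch can be illustrated with a small sketch that checks whether a given input actually executes a chosen line. The function `classify`, the helper `hits_line`, and the constant `TARGET_LINE` are hypothetical names invented for this example; the helper relies only on the standard `sys.settrace` hook and is not the instrumentation used in the paper.

```python
import sys


def classify(x):
    if x > 10:
        return "high"  # the branch we want a test to reach
    return "low"


# Line of `return "high"`: two lines below the `def` line.
TARGET_LINE = classify.__code__.co_firstlineno + 2


def hits_line(func, args, target_lineno):
    """Return True if calling func(*args) executes target_lineno
    inside func's own code object."""
    hit = False
    code = func.__code__

    def tracer(frame, event, arg):
        nonlocal hit
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line" and frame.f_lineno == target_lineno:
            hit = True
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # restore any earlier tracer
    return hit


reaches_target = hits_line(classify, (42,), TARGET_LINE)
misses_target = hits_line(classify, (3,), TARGET_LINE)
```

An input such as 42 reaches the target line while 3 does not. A branch-targeted generator is judged on whether the test it produces lands on the requested branch, not merely on whether overall coverage goes up.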

Original Abstract

Recent advances in large language models for test case generation have improved branch coverage via prompt-engineered mutations. However, they still lack principled mechanisms for steering models toward specific high-risk execution branches, limiting their effectiveness for discovering subtle bugs and security vulnerabilities. We propose GLMTest, the first program structure-aware LLM framework for targeted test case generation that seamlessly integrates code property graphs and code semantics using a graph neural network and a language model to condition test case generation on execution branches. This structured conditioning enables controllable and branch-targeted test case generation, thereby potentially enhancing bug and security risk discovery. Experiments on real-world projects show that GLMTest built on a Qwen2.5-Coder-7B-Instruct model improves branch accuracy from 27.4% to 50.2% on TestGenEval benchmark compared with state-of-the-art LLMs, i.e., Claude-Sonnet-4.5 and GPT-4o-mini.
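The headline numbers can be put in perspective with a short calculation. The sketch below assumes that branch accuracy is the fraction of generated tests that execute their requested target branch; the abstract does not define the metric, so this definition and the function name `branch_accuracy` are assumptions, and the boolean list is made-up illustrative input, not data from the paper. Only the 27.4% and 50.2% figures come from the abstract.

```python
from typing import Sequence


def branch_accuracy(hits: Sequence[bool]) -> float:
    """Fraction of attempts whose test reached its target branch
    (assumed definition, see the note above)."""
    if not hits:
        raise ValueError("need at least one attempt")
    return sum(hits) / len(hits)


# Illustrative outcomes for four (test, target branch) pairs.
example_accuracy = branch_accuracy([True, False, True, True])

# Figures reported in the abstract, in percent.
baseline_pct = 27.4
glmtest_pct = 50.2

# Absolute gain in percentage points and relative improvement factor.
absolute_gain = round(glmtest_pct - baseline_pct, 1)
relative_factor = round(glmtest_pct / baseline_pct, 2)
```

Under these figures the gain is 22.8 percentage points, or roughly 1.83 times the baseline branch accuracy.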
