Generating Proof-of-Vulnerability Tests to Help Enhance the Security of Complex Software

May 5, 2026
arXiv:2605.03956

Shravya Kanchi, Xiaoyan Zang, Ying Zhang, Danfeng Yao, Na Meng

cs.CR, cs.SE

TLDR

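PoVSmith uses a coding agent and an LLM to automatically generate proof-of-vulnerability tests, which show whether a vulnerability in a third-party library can be exploited through the application that depends on it. Compared with the prior state-of-the-art LLM-based approach, it improves test quality and reduces manual effort.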

Key contributions

Introduces PoVSmith, an agent-based approach for generating proof-of-vulnerability (PoV) tests for software.
Combines call path analysis, code context, and feedback to guide LLMs in test generation and refinement.
Correctly identifies 152 of 158 (96%) application-level entry points that call vulnerable library APIs; 84 of the 152 generated tests (55%) demonstrate a feasible attack.
Outperforms existing LLM-based methods by reducing human involvement and dramatically improving test quality.
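
The call path analysis mentioned above can be illustrated with a small sketch. This is not PoVSmith's implementation: the class `CallPathSketch`, the method `findEntryPoints`, and the string-based call graph are all hypothetical, chosen only to show one common way of finding public application methods that reach a vulnerable library API (a backward breadth-first search over the call graph).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; names and data model are assumptions, not PoVSmith's code.
class CallPathSketch {

    /**
     * For each public application method that can reach vulnerableApi,
     * returns one shortest call path from that method to the API.
     * callGraph maps a caller to the methods it calls directly.
     */
    static Map<String, List<String>> findEntryPoints(
            Map<String, List<String>> callGraph,
            Set<String> publicAppMethods,
            String vulnerableApi) {
        // Invert the graph so we can walk from the vulnerable API back to its callers.
        Map<String, List<String>> callers = new HashMap<>();
        callGraph.forEach((caller, callees) -> {
            for (String callee : callees) {
                callers.computeIfAbsent(callee, k -> new ArrayList<>()).add(caller);
            }
        });

        // Breadth-first search; next.get(m) is the callee that follows m on a shortest path.
        Map<String, String> next = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        next.put(vulnerableApi, null);
        queue.add(vulnerableApi);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String caller : callers.getOrDefault(current, List.of())) {
                if (!next.containsKey(caller)) {
                    next.put(caller, current);
                    queue.add(caller);
                }
            }
        }

        // Rebuild the path for every reachable public application method.
        Map<String, List<String>> result = new TreeMap<>();
        for (String method : publicAppMethods) {
            if (!next.containsKey(method)) {
                continue;
            }
            List<String> path = new ArrayList<>();
            for (String m = method; m != null; m = next.get(m)) {
                path.add(m);
            }
            result.put(method, path);
        }
        return result;
    }
}
```

A real analysis would build the call graph from bytecode or source and handle virtual dispatch; the sketch assumes the graph is already given as method-name strings.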

Why it matters

This paper addresses the need for automated proof-of-vulnerability test generation, which helps developers judge whether a reported dependency vulnerability is actually exploitable in their own application. In the evaluation on 33 Java App-Lib pairs, PoVSmith needed less human involvement and produced higher-quality tests than the state-of-the-art LLM-based approach.

Original Abstract

Developers create modern software applications (Apps) on top of third-party libraries (Libs). When library vulnerabilities are reachable through application code, the applications can be vulnerable to software supply chain attacks. Prior work shows that developers often require concrete and executable evidence, i.e., proof-of-vulnerability (PoV) tests, to decide whether a reported dependency vulnerability poses a practical security risk to their application. However, manually crafting such tests is challenging, and existing tool support is insufficient to automate the procedure. To streamline test generation, we created PoVSmith, a new approach that combines call path analysis, exemplar test, code context, and feedback into multiple prompts to guide a coding agent (i.e., Codex) and a large language model (i.e., GPT) for test generation, execution, and assessment. We evaluated PoVSmith on 33 ⟨App, Lib⟩ Java program pairs, where each App depends on a vulnerable Lib. PoVSmith revealed 158 unique application-level entry points (i.e., public methods) calling vulnerable library APIs; 152 (96%) of them were correctly found, together with the call paths properly recognized. With such method call information, PoVSmith generated 152 tests, 84 (55%) of which demonstrated feasible ways of attacking Apps by exploiting Lib vulnerabilities. PoVSmith substantially outperforms the state-of-the-art LLM-based approach, as it reduces human involvement while dramatically improving test quality. Our work contributes (1) a novel approach of agent-based test generation, (2) an iterative code refinement process driven by execution feedback, and (3) LLM-based quality assessment grounded in both the test context and execution logs.
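
The abstract's second contribution, iterative refinement driven by execution feedback, follows a generic pattern that can be sketched as below. The abstract does not describe the actual prompts, stopping rule, or agent interface, so everything here is an assumption: `RefinementLoopSketch`, `RunResult`, `refine`, and the `generator` and `runner` callbacks are hypothetical stand-ins for the coding agent and the test executor.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch of a feedback-driven refinement loop; not PoVSmith's code.
class RefinementLoopSketch {

    /** Outcome of running one candidate test: whether it succeeded, plus its log. */
    static final class RunResult {
        final boolean success;
        final String log;

        RunResult(boolean success, String log) {
            this.success = success;
            this.log = log;
        }
    }

    /**
     * Asks the generator for a test, runs it, and feeds the execution log back
     * until a test succeeds or maxRounds attempts are used.
     * generator takes (task description, previous log or "") and returns test source.
     * Returns the successful candidate, or the last attempt if none succeeded
     * (null when maxRounds is 0).
     */
    static String refine(
            String task,
            BiFunction<String, String, String> generator,
            Function<String, RunResult> runner,
            int maxRounds) {
        String feedback = "";
        String candidate = null;
        for (int round = 0; round < maxRounds; round++) {
            candidate = generator.apply(task, feedback);
            RunResult result = runner.apply(candidate);
            if (result.success) {
                return candidate;
            }
            // Carry the failure log into the next generation attempt.
            feedback = result.log;
        }
        return candidate;
    }
}
```

In practice the generator would wrap a model call and the runner would compile and execute the test against the App; both are left as plain callbacks here so the loop itself stays visible.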
