Enhancing Large Language Models with Retrieval Augmented Generation for Software Testing and Inspection Automation
Zoe Fingleton, Nazanin Siavash, Armin Moin
TLDR
This paper uses RAG to enhance LLMs for automated software testing and code inspection, mitigating hallucinations and improving efficiency.
Key contributions
- Automates software testing (test case generation) and code inspection using LLMs.
- Implements a Retrieval Augmented Generation (RAG) pipeline to mitigate LLM hallucinations.
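- Shows experimentally that RAG integration generally has a positive impact on both automated test case generation and code inspection.
- Reduces total project cost by saving human testers' and inspectors' time, and improves the effectiveness and efficiency of V&V activities.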
Why it matters
This paper addresses the critical challenge of LLM hallucination in software V&V. By integrating RAG, it improves the reliability and usefulness of LLMs for tasks such as test generation and code inspection, with the authors reporting a generally positive effect in their experiments. The approach could yield cost savings and efficiency gains in software development by reducing the time human testers and inspectors spend on these activities.
Original Abstract
In this paper, we focus on automating two of the widely used Verification and Validation (V&V) activities in the Software Development Lifecycle (SDLC): Software testing and software inspection (also known as review). Concerning the former, we concentrate on automated test case generation using Large Language Models (LLMs). For the latter, we enable inspection of the source code by LLMs. To address the known LLM hallucination problem, in which LLMs confidently produce incorrect outputs, we implement a Retrieval Augmented Generation (RAG) pipeline to integrate supplementary knowledge sources and provide additional context to the LLM. Our experimental results indicate that incorporating external context via the RAG pipeline has a generally positive impact on both test case generation and code inspection. This novel approach reduces the total project cost by saving human testers'/inspectors' time. It also improves the effectiveness and efficiency of these V&V activities, as evidenced by our experimental study.
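The RAG idea described in the abstract (retrieve supplementary knowledge, then prepend it to the LLM prompt) can be illustrated with a minimal sketch. This is not the authors' pipeline: the paper's retriever, knowledge sources, prompts, and model are not given here, so a bag-of-words cosine similarity stands in for an embedding model, and `llm` is a hypothetical callable standing in for whatever model client is used. The names `retrieve`, `build_prompt`, and `rag_generate` are illustrative only.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; a crude stand-in for a real embedding model (assumption)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: Sequence[str], k: int = 2) -> List[str]:
    """Return up to k corpus documents most similar to the query, skipping zero-score ones."""
    q = Counter(tokenize(query))
    scored: List[Tuple[float, str]] = [(cosine(q, Counter(tokenize(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(task: str, source_code: str, context: List[str]) -> str:
    """Augment the prompt: retrieved project knowledge first, then the task and the code."""
    context_block = "\n".join(f"- {c}" for c in context) or "- (no relevant context found)"
    return f"Relevant project knowledge:\n{context_block}\n\nTask: {task}\n\nCode:\n{source_code}\n"


def rag_generate(task: str, source_code: str, corpus: Sequence[str],
                 llm: Callable[[str], str], k: int = 2) -> str:
    """Retrieve context for the code under test, then ask the (hypothetical) LLM."""
    context = retrieve(source_code + " " + task, corpus, k)
    return llm(build_prompt(task, source_code, context))


if __name__ == "__main__":
    # Hypothetical knowledge base; a real setup might index docs, specs, or past reviews.
    docs = [
        "divide raises ZeroDivisionError when b is zero",
        "The logging module writes to stderr by default",
        "Unit tests use pytest and live in tests/",
    ]
    code = "def divide(a, b):\n    return a / b"
    # Echo "model" so the augmented prompt can be inspected without any API access.
    print(rag_generate("Generate pytest unit test cases", code, docs, llm=lambda p: p))
```

The same flow applies to code inspection by changing the task string (for example, asking for a review of the code against retrieved coding guidelines). Swapping the bag-of-words scorer for a vector store with learned embeddings changes only `retrieve`; the prompt-augmentation step stays the same.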