ArXiv TLDR

RAVEN: Retrieval-Augmented Vulnerability Exploration Network for Memory Corruption Analysis in User Code and Binary Programs

arXiv: 2604.17948

Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan, Asini Subanya + 13 more

cs.CR · cs.AI · cs.MA

TLDR

RAVEN is an LLM-powered framework that uses RAG to automate vulnerability analysis and report generation for memory-corruption flaws in source code.

Key contributions

  • Introduces RAVEN, an LLM-RAG framework for automated vulnerability report synthesis.
  • Utilizes Explorer, RAG, Analyst, and Reporter agents for comprehensive analysis.
  • Generates reports following Google Project Zero Root Cause Analysis templates.
  • Includes an LLM Judge for evaluating report quality across multiple criteria.
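The four agents listed above form a linear pipeline: Explorer finds the flaw, the RAG engine pulls supporting knowledge, Analyst assesses it, and Reporter writes it up. A minimal sketch of that flow is below; every function name, return value, and section header is a hypothetical placeholder for an LLM-backed agent, not the paper's actual code.

```python
# Hypothetical sketch of RAVEN's four-module pipeline. Each stage is a
# stub standing in for an LLM agent call; names and fields are invented.

def explorer(source_code: str) -> dict:
    """Explorer agent: identify the suspected vulnerability in the code."""
    return {"code": source_code, "finding": "possible out-of-bounds write"}

def rag_engine(state: dict) -> dict:
    """RAG engine: retrieve related knowledge (e.g. CWE entries,
    Project Zero reports) for the identified vulnerability."""
    state["context"] = ["CWE-787: Out-of-bounds Write"]
    return state

def analyst(state: dict) -> dict:
    """Analyst agent: assess impact and exploitability."""
    state["impact"] = "memory corruption; potential arbitrary code execution"
    return state

def reporter(state: dict) -> str:
    """Reporter agent: emit a Root Cause Analysis-style structured report."""
    return "\n".join([
        f"## Vulnerability\n{state['finding']}",
        f"## Related knowledge\n{'; '.join(state['context'])}",
        f"## Impact\n{state['impact']}",
    ])

report = reporter(analyst(rag_engine(explorer("buf[i] = x;"))))
print(report)
```

The design point is that each agent only enriches a shared state, so stages can be swapped or re-prompted independently before the Reporter serializes the result into the template.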

Why it matters

This paper addresses the underexplored potential of LLMs in automated vulnerability report documentation. RAVEN provides a structured, agent-based approach to generating analysis reports, which could streamline documentation work for cybersecurity professionals. With an average quality score of 54.21% in evaluation, it is an early but promising step toward more efficient vulnerability management rather than a finished solution.

Original Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across various cybersecurity tasks, including vulnerability classification, detection, and patching. However, their potential in automated vulnerability report documentation and analysis remains underexplored. We present RAVEN (Retrieval Augmented Vulnerability Exploration Network), a framework leveraging LLM agents and Retrieval Augmented Generation (RAG) to synthesize comprehensive vulnerability analysis reports. Given vulnerable source code, RAVEN generates reports following the Google Project Zero Root Cause Analysis template. The framework uses four modules: an Explorer agent for vulnerability identification, a RAG engine retrieving relevant knowledge from curated databases including Google Project Zero reports and CWE entries, an Analyst agent for impact and exploitation assessment, and a Reporter agent for structured report generation. To ensure quality, RAVEN includes a task specific LLM Judge evaluating reports across structural integrity, ground truth alignment, code reasoning quality, and remediation quality. We evaluate RAVEN on 105 vulnerable code samples covering 15 CWE types from the NIST-SARD dataset. Results show an average quality score of 54.21%, supporting the effectiveness of our approach for automated vulnerability documentation.
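The abstract's LLM Judge scores each report on four criteria (structural integrity, ground truth alignment, code reasoning quality, remediation quality) and the paper reports an average quality score. A minimal sketch of aggregating such per-criterion scores is below; the criterion names come from the abstract, but the score values and the equal weighting are assumptions, not the paper's method.

```python
# Sketch of combining per-criterion judge scores into one quality score.
# Criterion names follow the abstract; the values are made-up
# placeholders and equal weighting is an assumption.

criteria = {
    "structural_integrity": 0.80,
    "ground_truth_alignment": 0.50,
    "code_reasoning_quality": 0.60,
    "remediation_quality": 0.50,
}

# Unweighted mean, expressed as a percentage.
quality_score = 100 * sum(criteria.values()) / len(criteria)
print(f"{quality_score:.2f}%")  # 60.00% for these placeholder values
```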
