
LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software

arXiv: 2604.12994

Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu, Yilu Dong, Tianwei Wu, and 5 more

cs.CR, cs.AI

TLDR

LogicEval introduces a systematic framework, together with the LogicDS dataset, for evaluating automated repair techniques, including LLM-based ones, on real-world logical vulnerabilities.

Key contributions

  • Introduces LogicEval, a systematic framework for evaluating patches for logical vulnerabilities (a minimal sketch of such an evaluation loop follows this list).
  • Presents LogicDS, the first dataset of 86 real-world logical vulnerabilities, each with an assigned CVE.
  • Evaluates both traditional and LLM-based repair techniques against this dataset.
  • Identifies the main causes of LLM repair failures: prompt sensitivity, loss of code context, and difficulty in patch localization.
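
The digest does not describe the framework's pipeline, but the failure modes it names (compilation and testing failures) imply a staged check: apply the candidate patch, build, then run the tests. Below is a minimal sketch of such a loop; the function names, the `git apply`/`make` commands, and the repository layout are illustrative assumptions, not LogicEval's actual implementation.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path

@dataclass
class PatchResult:
    applies: bool = False
    compiles: bool = False
    tests_pass: bool = False

def evaluate_patch(repo: Path, patch_file: Path, test_cmd: list[str]) -> PatchResult:
    """Run a candidate patch through the staged checks implied by the paper:
    patch application, compilation, and testing."""
    result = PatchResult()

    # Stage 1: does the patch apply at all? Application failures are one
    # proxy for the patch-localization difficulty the paper identifies.
    check = subprocess.run(
        ["git", "apply", "--check", str(patch_file)], cwd=repo, capture_output=True
    )
    result.applies = check.returncode == 0
    if not result.applies:
        return result
    subprocess.run(["git", "apply", str(patch_file)], cwd=repo, check=True)

    # Stage 2: compilation check (assumes a make-based build).
    build = subprocess.run(["make"], cwd=repo, capture_output=True)
    result.compiles = build.returncode == 0
    if not result.compiles:
        return result

    # Stage 3: functional and security tests, e.g. the CVE's proof of concept.
    tests = subprocess.run(test_cmd, cwd=repo, capture_output=True)
    result.tests_pass = tests.returncode == 0
    return result
```

A harness like this makes the paper's failure taxonomy measurable: a patch that never applies points to localization problems, one that applies but does not build to lost code context, and one that builds but fails the tests to a semantic misunderstanding of the intended logic.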

Why it matters

Logical vulnerabilities are a critical class of security flaws that current automated repair techniques struggle with. This paper provides the first systematic way to evaluate candidate repairs, especially LLM-generated ones, and its dataset and framework give research in this difficult area a common benchmark to build on.

Original Abstract

Logical vulnerabilities in software stem from flaws in program logic rather than memory safety, which can lead to critical security failures. Although existing automated program repair techniques primarily focus on repairing memory corruption vulnerabilities, they struggle with logical vulnerabilities because of their limited semantic understanding of the vulnerable code and its expected behavior. On the other hand, recent successes of large language models (LLMs) in understanding and repairing code are promising. However, no framework currently exists to analyze the capabilities and limitations of such techniques for logical vulnerabilities. This paper aims to systematically evaluate both traditional and LLM-based repair approaches for addressing real-world logical vulnerabilities. To facilitate our assessment, we created the first ever dataset, LogicDS, of 86 logical vulnerabilities with assigned CVEs reflecting tangible security impact. We also developed a systematic framework, LogicEval, to evaluate patches for logical vulnerabilities. Evaluations suggest that compilation and testing failures are primarily driven by prompt sensitivity, loss of code context, and difficulty in patch localization.
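
To make the distinction between memory safety and program logic concrete, here is a hypothetical example (not taken from LogicDS): the code below is perfectly memory-safe, yet its validation logic can be bypassed.

```python
def can_withdraw(balance: float, amount: float) -> bool:
    # Logical flaw: negative amounts pass the check, so a "withdrawal"
    # of -100 is approved and effectively credits the attacker's account.
    return amount <= balance

def can_withdraw_fixed(balance: float, amount: float) -> bool:
    # Repaired logic: also reject zero and negative amounts.
    return 0 < amount <= balance

assert can_withdraw(50.0, -100.0) is True         # the flaw: approved
assert can_withdraw_fixed(50.0, -100.0) is False  # the repair: rejected
```

A memory-safety checker finds nothing wrong here; repairing the bug requires understanding the code's intended behavior, which is exactly the semantic gap the paper argues traditional repair techniques cannot cross.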
