ArXiv TLDR

Towards Secure Logging: Characterizing and Benchmarking Logging Code Security Issues with LLMs

arXiv:2604.20211

He Yang Yuan, Xin Wang, Kundi Yao, An Ran Chen, Zishuo Ding + 1 more

cs.SE · cs.AI · cs.CR

TLDR

This paper defines logging code security issues, constructs a benchmark of real-world reports, and evaluates LLMs on detection and repair, finding that LLMs are moderately effective at detection but struggle to produce reliable repairs.

Key contributions

  • Derived a comprehensive taxonomy of logging code security issues (4 categories, 10 patterns).
  • Constructed a benchmark dataset of 101 real-world logging security reports.
  • Evaluated LLMs' detection and repair capabilities for logging security issues using a new framework.
  • Found LLMs moderately effective at detection but significantly challenged in generating correct code repairs.

Why it matters

Insecure logging can expose sensitive data and enable attacks such as log injection. This paper offers a systematic analysis and a benchmark, giving practitioners and researchers concrete insight into the current capabilities and limitations of LLMs for secure logging.
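To make the two risks concrete, here is a minimal illustrative sketch (not taken from the paper; the helper names and redaction policy are assumptions) of how attacker-controlled input can forge log lines and how raw secrets can leak into logs, alongside a safer pattern:

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("auth")

def sanitize(value: str) -> str:
    """Replace CR/LF so untrusted input cannot inject forged log lines."""
    return re.sub(r"[\r\n]+", " ", value)

def redact_token(token: str) -> str:
    """Keep only a short prefix of a secret (hypothetical redaction policy)."""
    return token[:4] + "***"

def log_failed_login(username: str, token: str) -> None:
    # Insecure variant (commented out): logging raw values verbatim lets a
    # username like "bob\nINFO login succeeded for admin" forge a log entry,
    # and writes the full token into the log file.
    # log.info("login failed for %s with token %s", username, token)

    # Safer: sanitize untrusted input and redact secrets before logging.
    log.info("login failed for %s with token %s",
             sanitize(username), redact_token(token))
```

This corresponds to two of the issue categories the paper targets: sensitive-information exposure and log injection.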

Original Abstract

Logging code plays an important role in software systems by recording key events and behaviors, which are essential for debugging and monitoring. However, insecure logging practices can inadvertently expose sensitive information or enable attacks such as log injection, posing serious threats to system security and privacy. Prior research has examined general defects in logging code, but systematic analysis of logging code security issues remains limited, particularly in leveraging LLMs for detection and repair. In this paper, we derive a comprehensive taxonomy of logging code security issues, encompassing four common issue categories and 10 corresponding patterns. We further construct a benchmark dataset with 101 real-world logging security issue reports that have been manually reviewed and annotated. We then propose an automated framework that incorporates various contextual knowledge to evaluate LLMs' capabilities in detecting and repairing logging security issues. Our experimental results reveal a notable disparity in performance: while LLMs are moderately effective at detecting security issues (e.g., the accuracy ranges from 12.9% to 52.5% on average), they face noticeable challenges in reliably generating correct code repairs. We also find that the issue description alone improves the LLMs' detection accuracy more than the security pattern explanation or a combination of both. Overall, our findings provide actionable insights for practitioners and highlight the potential and limitations of current LLMs for secure logging.
