ArXiv TLDR

VulStyle: A Multi-Modal Pre-Training for Code Stylometry-Augmented Vulnerability Detection

arXiv: 2604.26313

Chidera Biringa, Ajmal Abbas, Vishnu Selvaraj, Gokhan Kul

cs.CR, cs.LG

TLDR

VulStyle is a multi-modal pre-training model that combines source code, AST structure, and code stylometry for improved vulnerability detection, achieving state-of-the-art results on BigVul and VulDeePecker.

Key contributions

  • Introduces VulStyle, a multi-modal model for vulnerability detection using code, non-terminal AST, and stylometry.
  • Leverages non-terminal AST nodes to reduce complexity while preserving semantic hierarchy.
  • Integrates code stylometry (CStyle) features as crucial auxiliary signals for risky practices.
  • Achieves state-of-the-art performance on the BigVul and VulDeePecker benchmarks, with competitive or best-average results on Devign, DiverseVul, and REVEAL.
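To make the non-terminal AST idea concrete, here is a minimal sketch using Python's standard `ast` module. This is purely illustrative: the paper parses seven languages with its own tooling, and its exact node-selection rule is not reproduced here; treating "has child nodes (ignoring expression contexts)" as the non-terminal test is an assumption of this sketch.

```python
import ast

def non_terminal_sequence(source: str) -> list[str]:
    """Return the sequence of non-terminal AST node types for a snippet.

    Illustrative only: leaves (e.g. Constant, bare Name) are dropped,
    which shortens the input while keeping the structural hierarchy.
    """
    tree = ast.parse(source)
    seq = []
    for node in ast.walk(tree):
        # A node counts as non-terminal if it has AST children other
        # than expression contexts (Load/Store), which are Python quirks
        # rather than meaningful structure.
        children = [c for c in ast.iter_child_nodes(node)
                    if not isinstance(c, ast.expr_context)]
        if children:
            seq.append(type(node).__name__)
    return seq

print(non_terminal_sequence("def f(x):\n    return x + 1"))
```

Note how token-level detail (the identifier `x`, the literal `1`) disappears while the function/return/expression hierarchy survives, which is the trade-off the paper describes.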

Why it matters

VulStyle addresses a limitation of prior vulnerability detectors, which rely on token-level models or full AST trees and miss stylistic cues indicative of risky programming practices, by treating code stylometry as a first-class input. Its state-of-the-art results on BigVul and VulDeePecker suggest a more robust and accurate approach to identifying software vulnerabilities, which could strengthen automated security review in practice.

Original Abstract

We present VulStyle, a multi-modal software vulnerability detection model that jointly encodes function-level source code, non-terminal Abstract Syntax Tree (AST) structure, and code stylometry (CStyle) features. Prior work in code representation primarily leverages token-level models or full AST trees, often missing stylistic cues indicative of risky programming practices, or incurring high structural overhead. Our approach selects only non-terminal AST nodes, reducing input complexity while preserving semantic hierarchy, and integrates syntactic and lexical CStyle features as auxiliary vulnerability signals. VulStyle is pre-trained using masked language modeling on 4.9M functions across seven programming languages, and fine-tuned across five benchmark datasets: Devign, BigVul, DiverseVul, REVEAL, and VulDeePecker. VulStyle achieves state-of-the-art performance on BigVul and VulDeePecker, improving F1 by 4-48% over strong transformer baselines, and attains competitive or best-average performance across all benchmarks. We contribute an ablation study isolating the effect of CStyle and AST structure, error case analysis, and a threat model situating the detection task in attacker-realistic scenarios.
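The abstract mentions "syntactic and lexical CStyle features" without enumerating them. As a toy illustration of what lexical stylometry features look like, the sketch below computes a few common ones; the feature names and choices here are assumptions, not the paper's actual CStyle set.

```python
import re

def cstyle_features(source: str) -> dict[str, float]:
    """Toy lexical stylometry features for a function body.

    Hypothetical feature set for illustration; the paper's CStyle
    features are not specified in this summary.
    """
    lines = source.splitlines()
    # Identifiers and keywords, crudely: alphanumeric/underscore runs.
    idents = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    # Lines that start with a common comment marker.
    comment_lines = sum(
        1 for l in lines if l.lstrip().startswith(("//", "#", "/*"))
    )
    n_lines = max(len(lines), 1)
    return {
        "avg_identifier_len": sum(map(len, idents)) / max(len(idents), 1),
        "comment_line_ratio": comment_lines / n_lines,
        "avg_line_len": sum(map(len, lines)) / n_lines,
    }

print(cstyle_features("// bounds check omitted\nint x = buf[i];"))
```

In a multi-modal setup such as VulStyle's, a fixed-length vector like this would be fed to the model alongside the token and AST encodings rather than replacing them.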
