ArXiv TLDR

Learning Generalizable Multimodal Representations for Software Vulnerability Detection

arXiv: 2604.25711

Zeming Dong, Yuejun Guo, Qiang Hu, Yao Zhang, Maxime Cordy, and 3 more

cs.SE, cs.AI

TLDR

MultiVul improves software vulnerability detection by combining code and comments using a multimodal contrastive learning framework.

Key contributions

  • Proposes MultiVul, a multimodal contrastive framework for software vulnerability detection.
  • Aligns code and comment representations using dual similarity learning and consistency regularization (a rough sketch of such an objective follows this list).
  • Achieves up to 27.07% F1 improvement over prompting and 13.37% over code-only fine-tuning.
  • Maintains comparable inference efficiency across various LLMs and datasets.
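
The summary does not spell out MultiVul's exact loss, so the following is only a minimal PyTorch-style sketch of what a dual-similarity contrastive objective with consistency regularization over paired code/comment embeddings could look like. The function name, temperature, and `lambda_cons` weight are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch: dual-similarity contrastive loss with a
# consistency-regularization term over code and comment embeddings.
# Names and weights are assumptions, not taken from the MultiVul paper.
import torch
import torch.nn.functional as F


def dual_similarity_loss(code_emb, comment_emb, temperature=0.07, lambda_cons=0.1):
    """code_emb, comment_emb: (batch, dim) embeddings of paired code/comment inputs."""
    # L2-normalize so dot products are cosine similarities.
    code = F.normalize(code_emb, dim=-1)
    comment = F.normalize(comment_emb, dim=-1)

    # Pairwise cross-modal similarity matrix.
    logits = code @ comment.t() / temperature            # (batch, batch)
    targets = torch.arange(code.size(0), device=code.device)

    # "Dual" InfoNCE: code-to-comment and comment-to-code directions.
    loss_c2t = F.cross_entropy(logits, targets)
    loss_t2c = F.cross_entropy(logits.t(), targets)

    # Consistency regularization: keep the two directional
    # similarity distributions close to each other.
    cons = F.mse_loss(F.softmax(logits, dim=-1),
                      F.softmax(logits.t(), dim=-1))

    return 0.5 * (loss_c2t + loss_t2c) + lambda_cons * cons
```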

Why it matters

This paper introduces a novel approach to software vulnerability detection by leveraging both code and comments, addressing a key limitation of current methods: their reliance on code-only representations. By integrating multimodal information, MultiVul significantly boosts detection accuracy and generalization across complex code structures. This advancement is a step toward more robust and secure software systems.

Original Abstract

Source code and its accompanying comments are complementary yet naturally aligned modalities: code encodes structural logic while comments capture developer intent. However, existing vulnerability detection methods mostly rely on single-modality code representations, overlooking the complementary semantic information embedded in comments and thus limiting their generalization across complex code structures and logical relationships. To address this, we propose MultiVul, a multimodal contrastive framework that aligns code and comment representations through dual similarity learning and consistency regularization, augmented with diverse code-text pairs to improve robustness. Experiments on widely adopted DiverseVul and Devign datasets across four large language models (LLMs) (i.e., DeepSeek-Coder-6.7B, Qwen2.5-Coder-7B, StarCoder2-7B, and CodeLlama-7B) show that MultiVul achieves up to 27.07% F1 improvement over prompting-based methods and 13.37% over code-only fine-tuning, while maintaining comparable inference efficiency.
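
To make the two modalities concrete, the hypothetical Python helper below separates a C function into its code and comment parts, yielding one code-text pair of the kind the abstract describes. The regexes and the example function are assumptions for this sketch, not the paper's actual preprocessing pipeline.

```python
# Hypothetical helper: split a C function into its code and comment
# modalities to form one code-text pair. Not the paper's preprocessing.
import re


def split_code_and_comments(c_source: str):
    """Return (code_only, comments) extracted from a C function."""
    # Collect /* ... */ block comments and // line comments.
    block = re.findall(r"/\*.*?\*/", c_source, flags=re.DOTALL)
    line = re.findall(r"//[^\n]*", c_source)
    comments = " ".join(c.strip("/* ").strip() for c in block + line)

    # Strip the comments to leave the code-only modality.
    code_only = re.sub(r"/\*.*?\*/", "", c_source, flags=re.DOTALL)
    code_only = re.sub(r"//[^\n]*", "", code_only)
    return code_only, comments


example = """
/* Copy user input into a fixed buffer. */
void copy(char *src) {
    char buf[16];
    strcpy(buf, src);  // no bounds check
}
"""
code, comments = split_code_and_comments(example)
print(comments)  # -> "Copy user input into a fixed buffer. no bounds check"
```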
