ArXiv TLDR

Adversarial Evasion in Non-Stationary Malware Detection: Minimizing Drift Signals through Similarity-Constrained Perturbations

arXiv: 2604.21310

Pawan Acharya, Lan Zhang

cs.CR, cs.AI

TLDR

This paper proposes a novel method for generating adversarial malware that evades detection while minimizing drift signals in non-stationary environments.

Key contributions

  • Investigates generating adversarial malware that evades detection and minimizes drift signals.
  • Proposes a novel method using similarity-constrained perturbations in the classifier's feature space.
  • Balances targeted misclassification against drift-signal minimization in a single optimization objective (see the sketch after this list).
  • Experiments show that similarity constraints, particularly ℓ2 regularization, reduce output drift signals.
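
A minimal sketch of what such a similarity-constrained, targeted objective could look like in the classifier's feature space, assuming a PyTorch-style model; the function name, hyperparameters, and the exact form of the ℓ2 penalty and projection are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def craft_adversarial_features(model, x_mal, target_class=0, budget=0.5,
                               lam=1.0, steps=100, lr=0.01):
    """Sketch: targeted evasion in the classifier's standardized feature space,
    with an L2 similarity penalty that keeps the perturbed sample close to the
    clean malware. Hyperparameters are illustrative, not taken from the paper."""
    delta = torch.zeros_like(x_mal, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    target = torch.full((x_mal.shape[0],), target_class,
                        dtype=torch.long, device=x_mal.device)

    for _ in range(steps):
        logits = model(x_mal + delta)
        # Targeted misclassification term: push outputs toward the benign class.
        evasion_loss = F.cross_entropy(logits, target)
        # Similarity regularizer: penalize distance from the clean malware sample.
        similarity_penalty = delta.norm(p=2, dim=-1).mean()
        loss = evasion_loss + lam * similarity_penalty

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Project the perturbation back into an L2 ball of radius `budget`.
        with torch.no_grad():
            norms = delta.norm(p=2, dim=-1, keepdim=True).clamp(min=1e-12)
            delta.mul_(torch.clamp(budget / norms, max=1.0))

    return (x_mal + delta).detach()
```

In this sketch, raising `budget` loosens the projection and, as the abstract notes, tends to increase attack success while producing stronger drift indicators; `lam` weights distributional similarity against evasion.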

Why it matters

This research addresses a critical vulnerability in deep learning malware detection systems operating in dynamic environments. By demonstrating how attackers can craft evasive samples that also hide from drift monitoring, it highlights a significant security gap. This work is crucial for developing more robust and adaptive malware detection systems.

Original Abstract

Deep learning has emerged as a powerful approach for malware detection, demonstrating impressive accuracy across various data representations. However, these models face critical limitations in real-world, non-stationary environments where both malware characteristics and detection systems continuously evolve. Our research investigates a fundamental security question: Can an attacker generate adversarial malware samples that simultaneously evade classification and remain inconspicuous to drift monitoring mechanisms? We propose a novel approach that generates targeted adversarial examples in the classifier's standardized feature space, augmented with sophisticated similarity regularizers. By carefully constraining perturbations to maintain distributional similarity with clean malware, we create an optimization objective that balances targeted misclassification with drift signal minimization. We quantify the effectiveness of this approach by comprehensively comparing classifier output probabilities using multiple drift metrics. Our experiments demonstrate that similarity constraints can reduce output drift signals, with $\ell_2$ regularization showing the most promising results. We observe that perturbation budget significantly influences the evasion-detectability trade-off, with increased budget leading to higher attack success rates and more substantial drift indicators.
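
To make the evaluation step concrete, a comparison of classifier output probabilities before and after perturbation might look like the sketch below; the choice of Jensen-Shannon distance and the Kolmogorov-Smirnov statistic is an assumption for illustration, not necessarily the drift metrics used in the paper:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import ks_2samp

def output_drift(probs_clean: np.ndarray, probs_adv: np.ndarray) -> dict:
    """Compare classifier output probabilities (shape: [n_samples]) for clean
    and adversarial malware. Metric choices here are illustrative."""
    # Histogram the malware-class probabilities onto a shared grid.
    bins = np.linspace(0.0, 1.0, 21)
    hist_clean, _ = np.histogram(probs_clean, bins=bins, density=True)
    hist_adv, _ = np.histogram(probs_adv, bins=bins, density=True)

    return {
        # Jensen-Shannon distance between the two output distributions.
        "js_distance": float(jensenshannon(hist_clean + 1e-12, hist_adv + 1e-12)),
        # Kolmogorov-Smirnov statistic on the raw probability samples.
        "ks_statistic": float(ks_2samp(probs_clean, probs_adv).statistic),
    }
```

Lower values on metrics like these would indicate that the adversarial outputs blend in with the distribution a drift monitor sees for clean malware.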
