ArXiv TLDR

Detecting Concept Drift in Evolving Malware Families Using Rule-Based Classifier Representations

2604.22629

Tomáš Kalný, Martin Jureček, Mark Stamp

cs.CR, cs.LG

TLDR

This paper detects concept drift in evolving malware families by comparing rule-based classifier representations across temporal windows.

Key contributions

  • Proposes a structural approach for concept drift detection in malware using decision tree rulesets.
  • Quantifies drift by comparing rule representations using feature importance, prediction agreement, activation stability, and coverage metrics.
  • Evaluates approach on EMBER2024 across six malware families, comparing fixed-interval and clustering windowing.
  • Identifies fixed two-month windowing with feature-level Pearson correlation as the most reliable configuration: the only one in which all family pairs show positive drift-accuracy correlations.
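
To make the feature-level Pearson comparison concrete, here is a minimal sketch of how two windows' feature-importance vectors might be turned into a drift score. It is an illustration under assumptions, not the paper's implementation: the score `1 - r`, the function names, and the example importance values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def importance_drift(imp_prev, imp_next):
    """Assumed drift score: 1 - r, so 0 means identical importance profiles
    and larger values mean the two windows rely on different features."""
    return 1.0 - pearson(imp_prev, imp_next)


# Hypothetical per-feature importances from classifiers trained on two windows.
window_a = [0.50, 0.30, 0.15, 0.05]
window_b = [0.05, 0.15, 0.30, 0.50]

same = importance_drift(window_a, window_a)     # no structural change
shifted = importance_drift(window_a, window_b)  # importance ranking reversed
```

A score near 0 suggests the classifier relies on the same features in both windows, while a score above 1 means the importance profiles are negatively correlated.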

Why it matters

Malware evolves rapidly, and the resulting concept drift degrades classifier performance. This work detects drift structurally, by comparing the rules that classifiers learn across temporal windows, which helps keep malware detection systems effective. Detecting drift early allows models to be adapted proactively.

Original Abstract

This work proposes a structural approach to concept drift detection in malware classification using decision tree rulesets. Classifiers are trained across temporal windows on the EMBER2024 dataset, and drift is quantified by comparing extracted rule representations using feature importance, prediction agreement, activation stability, and coverage metrics. These metrics are correlated with both accuracy degradation and data distribution shift as complementary drift indicators. The approach is evaluated across six malware families using fixed-interval and clustering-based windowing in family-vs-benign and family-vs-family settings, and compared against RIPPER and Transcendent baselines. Results show that fixed two-month windowing with feature-level Pearson correlation is the most reliable configuration, being the only one where all family pairs produce positive drift-accuracy correlations. The methods are complementary - no single approach dominates across all pairs.
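
The fixed-interval windowing mentioned in the abstract can be sketched as follows. This is a rough illustration only: the sample identifiers, dates, start date, and function names are hypothetical, and the paper may define window boundaries differently.

```python
from collections import defaultdict
from datetime import date


def two_month_window(d, start):
    """Index of the two-month window containing d, counted from start's month."""
    months = (d.year - start.year) * 12 + (d.month - start.month)
    if months < 0:
        raise ValueError("date precedes the start of the first window")
    return months // 2


def group_by_window(samples, start):
    """Group (sample_id, first_seen) pairs into fixed two-month windows."""
    windows = defaultdict(list)
    for sample_id, first_seen in samples:
        windows[two_month_window(first_seen, start)].append(sample_id)
    return dict(windows)


# Hypothetical samples; identifiers and dates are made up for illustration.
samples = [
    ("a", date(2023, 9, 3)),
    ("b", date(2023, 10, 20)),
    ("c", date(2023, 11, 1)),
    ("d", date(2024, 1, 15)),
]
windows = group_by_window(samples, start=date(2023, 9, 1))
```

One classifier would then be trained per window, and rule representations from neighboring windows compared to score drift.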
