ArXiv TLDR

MARD: A Multi-Agent Framework for Robust Android Malware Detection

2604.25264

Xueying Zeng, Youquan Xian, Sihao Liu, Xudong Mou, Yanze Li + 2 more

cs.CR, cs.SE

TLDR

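MARD is a multi-agent LLM framework for Android malware detection that pairs LLM reasoning with on-demand static analysis. It reaches a 93.46% F1 score without domain-specific fine-tuning while remaining interpretable, inexpensive, and robust to concept drift.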

Key contributions

  • MARD: A multi-agent LLM framework for robust Android malware detection, bridging LLM semantics with static analysis.
  • Employs LLMs to orchestrate decision-making and construct interpretable evidentiary chains via a ReAct paradigm.
  • Achieves a 93.46% F1 score without domain-specific fine-tuning, outperforming continual learning baselines and remaining robust to concept drift in evaluations spanning up to five years.
  • Reduces the total cost of a deep analysis of a single complex APK to under $0.10.
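
The orchestration idea in the contributions above (an LLM choosing which deterministic analysis tool to run next, and recording each thought, action, and observation as evidence) can be illustrated with a toy ReAct-style loop. This is a minimal sketch under loud assumptions, not the authors' implementation: `FAKE_APK`, the tool names `list_permissions` and `find_api_calls`, the rule-based `stub_policy` that stands in for the LLM, and the final verdict rule are all hypothetical and chosen only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a parsed APK; a real system would query a static analyzer.
FAKE_APK = {
    "permissions": ["SEND_SMS", "READ_CONTACTS", "INTERNET"],
    "api_calls": ["SmsManager.sendTextMessage", "HttpURLConnection.connect"],
}

def list_permissions(apk: dict) -> List[str]:
    """Hypothetical tool: return the permissions the app requests."""
    return sorted(apk["permissions"])

def find_api_calls(apk: dict) -> List[str]:
    """Hypothetical tool: return the API calls found in the app's code."""
    return sorted(apk["api_calls"])

# Deterministic tools the orchestrator may invoke on demand.
TOOLS: Dict[str, Callable[[dict], List[str]]] = {
    "list_permissions": list_permissions,
    "find_api_calls": find_api_calls,
}

@dataclass
class Step:
    """One link in the evidence chain: why a tool was run and what it returned."""
    thought: str
    action: str
    observation: List[str]

def stub_policy(history: List[Step]) -> Tuple[str, str]:
    """Rule-based stand-in for the LLM: choose the next action from past observations."""
    seen = {step.action: step.observation for step in history}
    if "list_permissions" not in seen:
        return ("Check the requested permissions first.", "list_permissions")
    if "SEND_SMS" in seen["list_permissions"] and "find_api_calls" not in seen:
        return ("SEND_SMS is requested; confirm that it is actually used.", "find_api_calls")
    return ("Enough evidence collected.", "finish")

def run_react(apk: dict, policy, max_steps: int = 5) -> Tuple[str, List[Step]]:
    """Alternate between reasoning (policy) and acting (tools) until the policy finishes."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action = policy(history)
        if action == "finish":
            break
        history.append(Step(thought, action, TOOLS[action](apk)))
    observed = {item for step in history for item in step.observation}
    # Toy verdict rule (an assumption): SMS permission plus an actual SMS-sending call.
    suspicious = {"SEND_SMS", "SmsManager.sendTextMessage"} <= observed
    return ("malicious" if suspicious else "benign"), history

if __name__ == "__main__":
    verdict, chain = run_react(FAKE_APK, stub_policy)
    print(verdict)
    for step in chain:
        print(f"{step.thought} -> {step.action}: {step.observation}")
```

The point of the sketch is the cost shape: the policy only ever sees short tool outputs rather than raw code, and the second tool runs only when the first observation justifies it. The returned list of steps is a small analogue of the evidentiary chain the paper describes.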

Why it matters

Traditional machine learning detectors degrade as Android applications evolve and offer little insight into their decisions. MARD addresses both problems by letting an LLM orchestrate deterministic static analysis tools, which yields an interpretable evidentiary chain at a cost of under $0.10 per APK, with no domain-specific fine-tuning.

Original Abstract

With the rapid evolution of Android applications, traditional machine learning-based detection models suffer from concept drift. Additionally, they are constrained by shallow features, lacking deep semantic understanding and interpretability of decisions. Although Large Language Models (LLMs) demonstrate remarkable semantic reasoning capabilities, directly processing massive raw code incurs prohibitive token overhead. Moreover, this approach fails to fully unleash the deep logical reasoning potential of LLMs within complex contexts. To address these limitations, we propose MARD, a multi-agent framework for robust Android malware detection. This framework effectively bridges the gap between the semantic understanding of LLMs and traditional static analysis. It treats underlying deterministic analysis engines as on-demand execution tools, while utilizing the LLM to orchestrate the entire decision-making process. By designing an autonomous multi-agent interaction mechanism based on the ReAct paradigm, MARD constructs a highly interpretable evidentiary chain for conviction. Furthermore, we radically reduce the total cost of conducting a deep analysis of a single complex APK to under $0.10. Evaluations demonstrate that, without any domain-specific fine-tuning, MARD achieves an F1 score of 93.46%. It not only outperforms continual learning baselines but also exhibits robustness against concept drift and strong cross-domain generalization capabilities in evaluations spanning up to five years.
