eDySec: A Deep Learning-based Explainable Dynamic Analysis Framework for Detecting Malicious Packages in PyPI Ecosystem

arXiv:2604.26219

Sk Tanzir Mehedi, Raja Jurdak, Chadni Islam, Abu Bakar Siddique Mahi, Gowri Ramachandran

cs.CR, cs.LG

TLDR

eDySec is a deep learning framework for dynamic analysis that efficiently and explainably detects malicious packages in the PyPI ecosystem.

Key contributions

  • Introduces eDySec, a DL-based framework for dynamic analysis to detect malicious PyPI packages.
  • Integrates explainable AI and stability analysis for transparent and reliable model decisions.
  • Halves feature dimensionality while lowering false positives by 82% and false negatives by 79% compared with state-of-the-art frameworks.
  • Improves accuracy by 3%, with near-perfect stability and an inference latency of 170 ms per package.
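
The percentage figures above can be read as relative reductions against a baseline detector. The summary does not spell out the exact formula, so the following is only a minimal sketch under that assumption. The function name `relative_reduction` and all counts are hypothetical, chosen to illustrate the arithmetic, and are not taken from the paper.

```python
def relative_reduction(baseline: int, improved: int) -> float:
    """Return the fractional drop from baseline to improved (0.82 means 82% fewer)."""
    if baseline <= 0:
        raise ValueError("baseline count must be positive")
    return (baseline - improved) / baseline


# Hypothetical error counts, used only to illustrate the arithmetic (not from the paper).
baseline_fp, improved_fp = 100, 18
baseline_fn, improved_fn = 100, 21

fp_drop = relative_reduction(baseline_fp, improved_fp)
fn_drop = relative_reduction(baseline_fn, improved_fn)
print(f"FP reduction: {fp_drop:.0%}, FN reduction: {fn_drop:.0%}")
```

Under this reading, an 82% reduction means that for every 100 benign packages a baseline wrongly flags, the improved detector would wrongly flag about 18.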

Why it matters

Traditional ML detectors struggle with the high-dimensional, sparse behavioral data (system calls, network traffic, directory access patterns, dependency logs) produced when analyzing next-gen supply chain attacks. eDySec offers an efficient, stable, and explainable deep learning alternative that improves both detection performance and the transparency of model decisions for PyPI packages.

Original Abstract

The security of open-source software repositories is increasingly threatened by next-gen software supply chain attacks. These attacks include multiphase malware execution, remote access activation, and dynamic payload generation. Traditional Machine Learning (ML) detectors struggle to detect these attacks due to the high-dimensional and sparse nature of dynamic behavioral data, including system calls, network traffic, directory access patterns, and dependency logs. As a result, these data characteristics degrade the performance, stability, and explainability of ML models. These challenges have made Deep Learning (DL) a promising alternative, given its success across various domains and its potential for modeling complex patterns. This paper presents eDySec, a DL-based efficient, stable, and explainable framework for dynamic behavioral analysis to detect malicious packages. Using the QUT-DV25 dataset, which captures both install-time and post-installation behaviors of packages, we evaluate DL models and investigate feature sets to identify the most discriminative attributes for enabling efficient malicious package detection. Additionally, model stability analysis and explainable AI techniques are incorporated into the detection pipeline to enable stable and transparent interpretations of model decisions. Experimental results demonstrate that eDySec significantly outperforms the state-of-the-art frameworks. Specifically, it halves feature dimensionality while lowering false positives by 82% and false negatives by 79%. It also improves accuracy by 3%, achieves near-perfect stability, and maintains an inference latency of 170ms per package. Further analysis reveals that feature and model selection play a critical role, as certain combinations degrade performance. Ultimately, this study advances the understanding of the strengths and limitations of dynamic analysis against next-gen attacks.
