Weaponizing the Commons: A Taxonomy and Detection Framework of Abuse on GitHub
Yuli Cheng, Xiaoyu Zhang, Jiongchi Yu, Shiqing Ma, Chao Shen + 1 more
TLDR
This paper introduces a taxonomy and a high-performance detection framework for various abuse behaviors on GitHub, enhancing software supply chain security.
Key contributions
- Systematically reviews and summarizes reported GitHub abuse behaviors.
- Curates a manually labeled dataset of 392 GitHub abuse instances for analysis.
- Proposes a comprehensive taxonomy of diverse GitHub abuse symptoms and root causes.
- Develops a unified detection framework covering both repositories and user accounts, which achieves high performance on the curated dataset across all abuse categories (e.g., F1-score exceeding 89%).
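For readers less familiar with the reported metric, the following is a minimal sketch of how a per-category F1-score can be computed for a single-label classifier. It is not the authors' evaluation code. The function name `per_category_f1` and the labels `cat_a`, `cat_b`, and `none` are hypothetical placeholders, and the toy labels are invented for illustration only; they do not come from the paper's dataset or taxonomy.

```python
from collections import Counter


def per_category_f1(y_true, y_pred):
    """Return {category: F1} for single-label predictions.

    F1 = 2*TP / (2*TP + FP + FN), computed separately per category.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative

    scores = {}
    for cat in set(y_true) | set(y_pred):
        denom = 2 * tp[cat] + fp[cat] + fn[cat]
        scores[cat] = 2 * tp[cat] / denom if denom else 0.0
    return scores


# Hypothetical toy labels (NOT data from the paper).
y_true = ["cat_a", "cat_a", "cat_b", "none", "none", "cat_b"]
y_pred = ["cat_a", "none", "cat_b", "none", "none", "cat_b"]

scores = per_category_f1(y_true, y_pred)
for cat in sorted(scores):
    print(f"{cat}: F1 = {scores[cat]:.3f}")
```

A claim such as "F1 exceeds 89% across all categories" corresponds to every value in a dictionary like `scores` being above 0.89, which is a stricter condition than a high average.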
Why it matters
GitHub's critical role in modern software supply chains makes its security paramount. This paper addresses the lack of systematic investigation into GitHub abuse by providing a foundational understanding and a practical detection tool. In doing so, it lays the groundwork for large-scale, systematic analysis of the platform to strengthen software supply chain security.
Original Abstract
GitHub plays a critical role in modern software supply chains, making its security an important research concern. Existing studies have primarily focused on CI/CD automation, collaboration patterns, and community management, while abuse behaviors on GitHub have received little systematic investigation. In this paper, we systematically review and summarize reported GitHub abuse behaviors and conduct an empirical analysis of publicly available abuse cases, curating a manually labeled dataset of 392 GitHub instances. Based on this investigation, we propose a comprehensive taxonomy that characterizes their diverse symptoms and root causes from a software security perspective. Building on this taxonomy, we develop a unified detection framework capable of identifying all abuse categories across repositories and user accounts. Evaluated on the constructed dataset, the proposed framework achieves high performance across all categories (e.g., F1-score exceeding 89%). Collectively, this work advances the understanding of GitHub abuse behaviors and lays the groundwork for large-scale, systematic analysis of the GitHub platform to strengthen software supply chain security.