ArXiv TLDR

Cache-Related Smells in GitLab CI/CD: Comprehensive Catalog, Automated Detection, and Empirical Evidence

2604.17890

Francesco Urdih, Theodoros Theodoropoulos, Uwe Zdun

cs.SE

TLDR

This paper catalogs 10 cache-related smells in GitLab CI/CD, proposes CROSSER for automated detection, and empirically shows their widespread presence.

Key contributions

  • Catalogs ten cache-related "smells" in GitLab CI/CD that harm performance and reliability, validated on a corpus of grey literature.
  • Introduces CROSSER, a tool that automatically detects seven of the ten smells, reaching an overall F1 score of 0.98 on a manually labeled dataset of 82 projects.
  • Empirical study of 228 mature open-source projects finds cache smells in 89% of them; only 11% are smell-free.
  • Shows that developers may be unaware of higher-level caching functionalities.

Why it matters

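Caching is central to CI/CD pipeline performance and reliability, yet prior work has focused mainly on missing dependency caches and overlooked other cache types and cache misconfigurations. This paper contributes a catalog of ten smells and a tool that detects seven of them automatically, helping developers find and fix these issues. Its study of 228 projects finds smells in 89% of them and suggests that developers may be unaware of higher-level caching functionalities, underscoring the need for better caching practices.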
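To make the idea of automated smell detection concrete, here is a minimal sketch of a check for one simple case: a job that runs a script but neither defines nor inherits a `cache`. This is an illustration only, not CROSSER's implementation; the summary does not list the ten smells, so the check, the function name `jobs_without_cache`, and the `NON_JOB_KEYS` set are all assumptions. The sketch takes an already-parsed `.gitlab-ci.yml` as a plain dictionary and ignores `extends` and `include`.

```python
# Hypothetical illustration only; this is NOT how CROSSER works.
# Top-level keys in .gitlab-ci.yml that are settings, not job names
# (assumed subset, sufficient for this sketch).
NON_JOB_KEYS = {
    "stages", "variables", "default", "include", "workflow",
    "cache", "image", "services", "before_script", "after_script",
}


def jobs_without_cache(config: dict) -> list[str]:
    """Return names of jobs that have a script but no cache at all."""
    default = config.get("default")
    # A top-level `cache` or `default: cache` applies to every job.
    has_global_cache = "cache" in config or (
        isinstance(default, dict) and "cache" in default
    )
    if has_global_cache:
        return []

    flagged = []
    for name, body in config.items():
        # Skip global settings and hidden jobs (templates start with ".").
        if name in NON_JOB_KEYS or name.startswith("."):
            continue
        if isinstance(body, dict) and "script" in body and "cache" not in body:
            flagged.append(name)
    return flagged


# Made-up pipeline: `build` caches node_modules, `test` does not.
example = {
    "stages": ["build", "test"],
    "build": {
        "stage": "build",
        "script": ["npm ci"],
        "cache": {"key": "npm", "paths": ["node_modules/"]},
    },
    "test": {"stage": "test", "script": ["npm test"]},
    ".template": {"script": ["echo hidden"]},
}

print(jobs_without_cache(example))  # prints ['test']
```

A real detector would also have to resolve `extends`, `include`, and per-job cache overrides, which is part of why a dedicated tool is useful.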

Original Abstract

Continuous Integration and Deployment (CI/CD) facilitate rapid software delivery, making fast feedback and minimal downtime essential. While caching has been shown to be an effective technique for tackling pipeline performance and reliability issues, existing works have primarily focused on missing dependency caches, ignoring other types of caches and cache misconfigurations. In this paper, we present a comprehensive catalog of ten cache-related smells in GitLab CI/CD that negatively impact performance and reliability, validated on a corpus of grey literature. To address the smells, we propose CROSSER, a tool that automatically detects seven of the ten smells. We evaluate CROSSER on a manually labeled dataset of 82 mature projects, achieving an overall F1 score of 0.98. Finally, we investigate the presence of smells across a large dataset of 228 mature open-source projects and outline our empirical findings. Our results show a widespread frequency of the smells, as only 11% of the projects do not present any. We also show that developers may not be aware of higher-level caching functionalities.
