Unsafe and Unused? A History of Utility Code in Mature Open Source Projects
Brandon Keller, Kaitlin Yandik, Angela Ngo, Andy Meneely
TLDR
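This study finds that 'util' files in mature open source projects can be as much as 2.75 times more likely to be involved in a vulnerability than non-util files, and it examines how their usage, complexity, and developer collaboration evolve over each project's history.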
Key contributions
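- Conducted a longitudinal mining study of the Git repositories of seven long-lived open source projects (Linux kernel, Django, FFmpeg, httpd, Struts, systemd, Tomcat), covering 1773 snapshots over 147 project-years.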
- Analyzed util file usage, complexity, developer collaboration, and security correlations.
- Found that a util file can be as much as 2.75 times more likely to be involved in a vulnerability than a non-util file.
- Ran rename tracking at every 30-day snapshot to follow util files over their entire lifetime in a codebase.
Why it matters
The goal of the work is to help developers avoid creating unsafe and unused util files. Because util files are both ubiquitous and long-lived, they also reflect a broader developer intent, which makes them useful for understanding naming conventions and the socio-technical nature of software development.
Original Abstract
Filenames are a concise means of conveying information about source code to fellow developers. One such convention is util. Commonly understood to stand for "utility", filenames with the letters util are often an indication that the file contains code that may be broadly useful or reusable. Some projects use this convention heavily, for example, the Apache Tomcat server contains 925 files with util in the path name, which is 17.9% of all source code files in the tree. While the intent of the name may be to prevent duplicate code and reduce workload, what actually happens to util code over time? Do projects move away from util code as they mature? Are util files being used by fellow colleagues, or maintained and used by their author? The goal of our work is to help developers avoid creating unsafe and unused util files when developing their projects. We conducted a longitudinal mining study of the Git repositories of seven open source projects that have a long development history (Linux kernel, Django, FFmpeg, httpd, Struts, systemd, Tomcat). We analyzed how util usage, complexity, developer collaboration, and security are potentially correlated within these projects. Our longitudinal analysis was measured at 30-day intervals throughout the entire history of each project, resulting in 1773 snapshots over 147 project-years of development. We conducted rename tracking at every 30-day snapshot to examine util files over their entire lifetime in a codebase. For example, we found that a util file can be as much as 2.75 times more likely to be involved in a vulnerability than non-util files. While every project can adopt their own naming conventions, the ubiquity and longevity of util files shows a broader developer intent that is useful for understanding the socio-technical nature of software development.
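The measurements described in the abstract (flagging files with util in the path, sampling history at 30-day intervals, and comparing vulnerability rates) can be illustrated with a minimal Python sketch. This is not the authors' tooling: the function names, the case-insensitive substring match, the inclusive date range, and every path and count below are assumptions made up for illustration.

```python
from datetime import date, timedelta


def is_util(path: str) -> bool:
    """Flag a file whose path contains the letters 'util'.

    Assumption: a case-insensitive substring match; the paper may use a
    different matching rule.
    """
    return "util" in path.lower()


def util_share(paths) -> float:
    """Fraction of files in a tree that are flagged as util files."""
    paths = list(paths)
    if not paths:
        return 0.0
    return sum(is_util(p) for p in paths) / len(paths)


def snapshot_dates(start: date, end: date, step_days: int = 30):
    """Dates from start to end (inclusive), spaced step_days apart."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=step_days)
    return dates


def risk_ratio(util_vuln: int, util_total: int,
               other_vuln: int, other_total: int) -> float:
    """Vulnerability rate of util files divided by that of non-util files."""
    return (util_vuln / util_total) / (other_vuln / other_total)


# Hypothetical file tree, not taken from any of the studied projects.
example_paths = [
    "src/util/strings.py",
    "src/core/engine.py",
    "lib/NetUtils.java",
    "docs/readme.txt",
]
print(util_share(example_paths))

# Hypothetical counts: 6 of 200 util files vs. 4 of 400 other files.
print(risk_ratio(6, 200, 4, 400))

print(snapshot_dates(date(2020, 1, 1), date(2020, 3, 31)))
```

A ratio above 1 means util files were involved in vulnerabilities at a higher rate than other files in that sample. Following a file across renames, as the study does at each snapshot, would require rename detection in the version control history and is not covered by this sketch.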