Programming Language Co-Usage Patterns on Stack Overflow: Analysis of the Developer Ecosystem
TLDR
This paper analyzes programming language co-usage patterns on Stack Overflow to uncover hidden structures, developer specializations, and ecosystem communities.
Key contributions
- Developed a three-phase pipeline using FP-Growth, LDA, and Louvain to analyze language co-usage on Stack Overflow.
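- Identified tightly coupled language clusters (e.g., shell/bash, Swift/Objective-C, and the C-family) using FP-Growth, with lift values far above what individual language popularity predicts.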
- Discovered 25 developer profiles, including Apple-platform, scientific, and functional programmers, via LDA.
- Partitioned the language graph with Louvain into three macro-communities (web/enterprise, Apple ecosystem, and systems/scientific), with Java as the highest-degree hub connecting all three.
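To make the lift measure behind the FP-Growth finding concrete, here is a minimal Python sketch. It is not FP-Growth and not the authors' code: it counts language pairs by brute force and computes lift(A, B) = P(A, B) / (P(A) · P(B)), which is the same metric a frequent itemset miner would report for pairs. The function name `pair_lift` and the toy developer data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_lift(transactions):
    """Return lift for every language pair that co-occurs at least once.

    Each transaction is the set of languages one developer has used.
    Lift above 1 means the pair co-occurs more often than independent
    popularity would predict.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for langs in transactions:
        unique = sorted(set(langs))  # sort so each pair has one canonical key
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    lifts = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = item_counts[a] / n
        p_b = item_counts[b] / n
        lifts[(a, b)] = p_ab / (p_a * p_b)
    return lifts


# Hypothetical toy data, not taken from the paper.
developers = [
    {"swift", "objective-c"},
    {"swift", "objective-c", "java"},
    {"java", "python"},
    {"java", "javascript"},
    {"python", "java"},
    {"bash", "shell"},
]

lifts = pair_lift(developers)
for pair, value in sorted(lifts.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 2))
```

In this toy data the rare but always-paired languages score highest, while pairs involving the popular language score lower even when they co-occur just as often. That is the sense in which lift corrects for individual popularity.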
Why it matters
This research provides a data-driven view of how programming languages are used together in practice. It reveals the underlying structure of the software ecosystem: which languages complement each other, which form coherent technology stacks, and which developer specializations exist. These findings can inform language designers, tool builders, and educators.
Original Abstract
Understanding how developers combine programming languages in practice reveals the hidden structure of the software ecosystem: which languages are used as complements, which define coherent technology stacks, and which bridge disparate communities. We present a three-phase empirical pipeline that mines Stack Overflow posts by hundreds of thousands of developers across 186 programming languages, applying FP-Growth frequent itemset mining, Latent Dirichlet Allocation topic modeling, and Louvain community detection on a weighted co-usage graph, with the goal of characterizing co-usage coupling, latent developer specializations, and macro-level ecosystem structure simultaneously from behavioral data. FP-Growth identifies tight coupling clusters such as shell/bash, Swift/Objective-C, and the C-family with lift values far exceeding what individual language popularity predicts. LDA produces 25 developer profiles including Apple-platform developers, scientific and hardware programmers, functional/academic programmers, and two distinct Unix scripting sub-profiles. Louvain partitions the language graph into three macro-communities: web/enterprise, Apple ecosystem, and systems/scientific, and identifies Java as the highest-degree hub connecting all three. All three methods independently converge on the same ecosystem structure, providing strong cross-method validation of the findings.
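The weighted co-usage graph that the Louvain phase operates on can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: the edge weight is assumed to be the number of developers who used both languages, and the helper names (`build_co_usage_graph`, `node_degrees`) and the toy profiles are hypothetical. Community detection itself is not shown; a library implementation of Louvain (for example, `louvain_communities` in NetworkX) could be applied to a graph built this way.

```python
from collections import defaultdict
from itertools import combinations


def build_co_usage_graph(profiles):
    """Build an undirected weighted graph as nested dicts.

    graph[a][b] is the number of developers who used both a and b
    (an assumed weighting; the paper does not spell out its own here).
    """
    graph = defaultdict(lambda: defaultdict(int))
    for langs in profiles:
        for a, b in combinations(sorted(set(langs)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def node_degrees(graph):
    """Unweighted degree: the number of distinct neighbours per language."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


# Hypothetical toy profiles, not taken from the paper.
profiles = [
    {"java", "javascript"},
    {"java", "javascript"},
    {"java", "swift"},
    {"java", "c"},
    {"swift", "objective-c"},
    {"c", "fortran"},
]

graph = build_co_usage_graph(profiles)
degrees = node_degrees(graph)
hub = max(degrees, key=degrees.get)  # highest-degree language
print(hub, degrees[hub], graph["java"]["javascript"])
```

In the toy data, the language that links the web, Apple, and systems pairs ends up with the highest degree, which mirrors the hub role the abstract reports for Java.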