ArXiv TLDR

Will It Break in Production? Metric-Driven Prediction of Residual Defects in Python Systems

2604.26667

Giuseppe De Rosa, Pietro Liguori

cs.SE

TLDR

This paper predicts post-release Python defects using supervised ML models trained on product and process metrics, reaching 0.85-0.9 recall in cross-project experiments.

Key contributions

  • Supervised metric-based models (RandomForest, XGBoost, CatBoost) predict post-release Python defects far better than LLMs and unsupervised models.
  • These models reach 0.85-0.9 recall and cut false negatives by an order of magnitude.
  • Process metrics (age, churn, developer activity) and size are key predictors.
  • Metrics and code embeddings capture complementary defect information.
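
To make the recall figure concrete, here is a minimal sketch of how recall relates to false negatives (defects the model misses). The counts are hypothetical and are not taken from the paper; the helper names `recall` and `false_negatives_at` are introduced only for illustration.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN): the share of real defects that were flagged."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        raise ValueError("recall is undefined when there are no actual positives")
    return true_positives / actual_positives


def false_negatives_at(recall_value: float, actual_positives: int) -> int:
    """Number of real defects missed at a given recall level."""
    return round((1.0 - recall_value) * actual_positives)


# Hypothetical example: 200 truly defective items, 180 of them flagged.
example_recall = recall(true_positives=180, false_negatives=20)
missed_at_090 = false_negatives_at(0.90, actual_positives=200)
missed_at_085 = false_negatives_at(0.85, actual_positives=200)
```

With these made-up counts, recall of 0.85-0.9 corresponds to missing 20 to 30 of 200 defective items.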

Why it matters

Python's dynamic nature makes defect prediction challenging. This paper offers a practical, metric-driven approach to flagging code likely to contain faults that survive testing, so developers can address them before they cause problems in production.

Original Abstract

Python's dynamic nature complicates testing and increases the possibility that some defects evade detection, so an effective fault prediction becomes essential. We examine whether post-release faults can be predicted using modern ML and DL. Using a balanced dataset of over 4,000 labeled faults with 83 product, process, statistical, and Python-specific metrics plus normalized code representations, we conduct cross-project experiments. LLMs and unsupervised models fail to distinguish residual from non-residual faults, while supervised metric-based models (RandomForest, XGBoost, CatBoost) perform far better, yielding a 0.85-0.9 recall and cutting false negatives by an order of magnitude. Process metrics, especially age, churn, and developer-activity, alongside class and file size, consistently prove most predictive. Notably, the Principal Component Analysis shows that metrics and code embeddings occupy distinct regions of the representation space, suggesting that they capture complementary rather than redundant information.
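
The abstract mentions cross-project experiments without describing the protocol. One common way to set up such an evaluation is leave-one-project-out splitting, sketched below under that assumption; it may differ from what the authors did. The function name `leave_one_project_out` and the project labels are hypothetical.

```python
from typing import Iterator


def leave_one_project_out(
    projects: list[str],
) -> Iterator[tuple[str, list[int], list[int]]]:
    """Yield (held_out_project, train_indices, test_indices).

    Each project is held out once, so a model is always tested on a
    project it never saw during training.
    """
    for held_out in sorted(set(projects)):
        train = [i for i, p in enumerate(projects) if p != held_out]
        test = [i for i, p in enumerate(projects) if p == held_out]
        yield held_out, train, test


# Hypothetical project label for each sample in a dataset.
sample_projects = ["alpha", "alpha", "beta", "beta", "gamma"]
splits = list(leave_one_project_out(sample_projects))
```

For real experiments, scikit-learn's `LeaveOneGroupOut` provides the same kind of split and plugs directly into its cross-validation utilities.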
