ArXiv TLDR

Verifier Warnings Do Not Improve Comprehensibility Prediction

2604.22653

Nadeeshan De Silva, Martin Kellogg, Oscar Chaparro

cs.SE

TLDR

Adding verifier warning counts as an input feature does not improve ML models for code comprehensibility prediction, despite a known (small) correlation between warnings and comprehensibility.

Key contributions

  • Examined whether the sum of verifier warnings improves ML models for code comprehensibility prediction.
  • Performed a control-treatment experiment comparing models trained with and without the verifier warning feature.
  • Found no significant performance improvement from incorporating the verifier warning sum.
  • Concluded that traditional syntactic and developer features alone are just as effective for prediction.
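The control-treatment design above can be sketched in a few lines: train the same model once on syntactic and developer features (control) and once with the verifier warning sum added (treatment), then compare predictive performance. This is a minimal illustrative sketch with synthetic data and a simple nearest-neighbour classifier; the paper itself uses ML models and datasets from the literature, and the feature names here are hypothetical.

```python
# Hypothetical sketch of the control-treatment comparison (synthetic data,
# toy 1-NN model); not the models or features used in the actual study.
import random

random.seed(0)

def make_snippet():
    """Synthetic code snippet: (features, comprehensibility label)."""
    loc = random.randint(5, 60)                                # syntactic: lines of code
    exp = random.randint(1, 20)                                # developer: author experience
    warnings = max(0, loc // 10 + random.randint(-2, 2))       # verifier warning sum
    label = 1 if loc < 30 else 0                               # 1 = judged comprehensible
    return {"loc": loc, "exp": exp, "warnings": warnings}, label

data = [make_snippet() for _ in range(200)]

def accuracy(feature_names):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the chosen feature subset."""
    correct = 0
    for i, (xi, yi) in enumerate(data):
        best_dist, pred = None, None
        for j, (xj, yj) in enumerate(data):
            if i == j:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in feature_names)
            if best_dist is None or d < best_dist:
                best_dist, pred = d, yj
        correct += pred == yi
    return correct / len(data)

control = accuracy(["loc", "exp"])                 # syntactic + developer only
treatment = accuracy(["loc", "exp", "warnings"])   # ... plus verifier warning sum
print(f"control={control:.2f}  treatment={treatment:.2f}")
```

In the study, the analogous comparison (with proper models and statistical testing) showed no significant difference between the control and treatment configurations.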

Why it matters

This paper challenges the assumption that verifier warnings are a useful signal for predicting code comprehensibility with ML models, showing that traditional syntactic and developer features alone are just as effective. This matters to researchers and practitioners building code quality assessment tools, as it can guide their feature engineering efforts.

Original Abstract

Proponents of software verification suggest that code simplicity is linked to the effort to verify code, hypothesizing that formal verifiers produce fewer false positive warnings and require less manual intervention when analyzing simpler code. A recent meta-analysis study found empirical support for this hypothesis: a small correlation between the sum of verifier warnings and human-derived code comprehensibility metrics. Based on this finding, we conjectured that using the sum of verifier tool (verifier) warnings to represent program semantic information as an input feature to machine learning (ML) models for code comprehensibility prediction can enhance their performance, when combined with traditional syntactic and developer features. To test this conjecture, we performed a control-treatment experiment incorporating the verifier warning sum feature into machine learning models from the literature, and conducted a comparative analysis of their performance against models trained only on syntactic and developer features. We found no significant difference in the prediction performance of models with and without the warnings feature. Our findings suggest that while a correlation exists, the verifier warning sum offers limited discriminative power: combining syntactic and developer features is just as effective for predicting human-judged code comprehensibility.
