ArXiv TLDR

Tail-aware N-version Machine Learning Models for Reliable API Recommendation

2604.27647

Aoi Matsuda, Fumio Machida, David Lo

cs.SE

TLDR

NvRec uses N-version ML models to improve the reliability of API recommendations, especially for infrequently used "tail" APIs, by filtering unreliable outputs.

Key contributions

  • Introduces NvRec, an N-version ML model approach for reliable API recommendations.
  • Profiles ML model performance on individual API methods to filter unreliable "tail" API outputs.
  • Implemented and evaluated NvRec using five diverse ML models on a Java benchmark dataset.
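  • Reports a highest true accept rate of 83.8% (at an 80.7% rejection rate) for three-version NvRec, versus 83.1% at a 69.0% rejection rate for the five-version configuration, which offers the better balance of the two.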
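The profiling step can be illustrated with a minimal sketch. It assumes the profile is a per-API precision score computed on held-out data, i.e. how often a model was right when it recommended a given API. The paper may use a different metric and also tracks tail properties, so the function name `build_profile` and the scoring rule here are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List


def build_profile(predictions: List[List[str]],
                  references: List[List[str]]) -> Dict[str, float]:
    """Hypothetical per-API profile for one model.

    predictions and references are aligned lists of API sequences.
    For each API the model recommended, return the fraction of those
    recommendations where the API also appears in the reference.
    """
    hits = defaultdict(int)   # times the recommended API was in the reference
    total = defaultdict(int)  # times the API was recommended at all
    for pred, ref in zip(predictions, references):
        ref_set = set(ref)
        for api in set(pred):  # count each API once per example
            total[api] += 1
            if api in ref_set:
                hits[api] += 1
    return {api: hits[api] / total[api] for api in total}


# Toy validation data (made up for illustration).
preds = [["List.add"], ["List.add", "Cipher.doFinal"], ["Cipher.doFinal"]]
refs = [["List.add"], ["List.add"], ["Cipher.doFinal"]]
profile = build_profile(preds, refs)
```

In this toy data the frequently seen `List.add` ends up with a higher score than the rarer `Cipher.doFinal`, which is the kind of gap a tail-aware filter would exploit.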

Why it matters

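ML-based API recommenders are trained on code datasets with a long-tail distribution, so their suggestions for rarely used APIs are often unreliable. NvRec improves the trustworthiness of recommendations by suppressing outputs that involve tail APIs the models handle poorly. The cost is a high rejection rate (69.0% to 80.7% in the reported configurations), accepted in exchange for true accept rates above 83%, which should make the suggestions that do reach developers more dependable.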

Original Abstract

Machine learning (ML)-based API recommendation helps developers efficiently identify suitable APIs to complement the application code. However, code datasets used to train ML models often exhibit a long-tail distribution, leading to unreliable API recommendations, especially for infrequently used API methods at the tail of the distribution. To address this issue, we propose N-version API Recommendation (NvRec), which leverages N different versions of ML models to enhance the reliability of API sequence recommendations by suppressing unreliable outputs entailing tail APIs. NvRec leverages a set of available ML models and profiles their performance on individual API methods with their tail properties. The generated model profile is used at inference time to filter out unreliable API recommendations and determine the final output. We implement NvRec using five API recommendation models, including CodeBERT, CodeT5, MulaRec, UniXcoder, and CodeT5+, and evaluate it on a public benchmark dataset constructed from compilable Java projects. For the three-version NvRec, we find that the combination of CodeT5, MulaRec, and UniXcoder achieves the highest true accept rate of 83.8%, with a rejection rate of 80.7%, when majority voting is restricted to highly reliable candidates. In contrast, the five-version configuration achieves its highest true accept rate of 83.1% with simple majority voting, while reducing the rejection rate to 69.0%. Overall, the five-version configuration offers a better balance between true accept rate and rejection rate.
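The inference-time idea in the abstract (filter unreliable candidates using the model profile, then vote) can be sketched as follows. This is a minimal illustration under stated assumptions: the model names, the profile numbers, the 0.8 threshold, and the strict-majority rule over all N versions are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical profile: per-model reliability score for each API method.
# All names and numbers are made up for illustration.
PROFILE: Dict[str, Dict[str, float]] = {
    "model_a": {"List.add": 0.95, "Cipher.doFinal": 0.40},
    "model_b": {"List.add": 0.93, "Cipher.doFinal": 0.85},
    "model_c": {"List.add": 0.90, "Cipher.doFinal": 0.30},
}


def is_reliable(model: str, api_sequence: Tuple[str, ...],
                threshold: float) -> bool:
    """A candidate counts as reliable only if every API in it meets the
    threshold for that model. Unprofiled APIs are scored 0.0 (assumption)."""
    scores = PROFILE.get(model, {})
    return all(scores.get(api, 0.0) >= threshold for api in api_sequence)


def nv_recommend(outputs: Dict[str, Tuple[str, ...]],
                 threshold: float = 0.8) -> Optional[Tuple[str, ...]]:
    """Return the sequence backed by a strict majority of all N versions,
    counting only reliable candidates; return None to reject."""
    reliable = [seq for model, seq in outputs.items()
                if is_reliable(model, seq, threshold)]
    if not reliable:
        return None
    seq, votes = Counter(reliable).most_common(1)[0]
    return seq if votes > len(outputs) / 2 else None


# Head API: all three versions agree and are reliable, so it is accepted.
head = nv_recommend({m: ("List.add",) for m in PROFILE})
# Tail API: only one version is reliable for it, so the output is rejected.
tail = nv_recommend({m: ("Cipher.doFinal",) for m in PROFILE})
```

Restricting the vote to reliable candidates makes acceptance stricter, which is consistent with the abstract's observation that this setting yields a higher rejection rate than simple majority voting.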
