Q-ARE: An Evaluation Dataset for Query Based API Recommendation
Shenglong Wu, Xunhui Zhang, Tao Wang
TLDR
Q-ARE is a new dataset, accompanied by two new metrics, for evaluating query-based API recommendation methods. Experiments on it show that existing methods struggle with multi-level invocations.
Key contributions
- Introduces Q-ARE, a dataset for query-based API recommendation evaluation.
- Analyzes multi-level API invocation chains from GitHub Java projects.
- Proposes API Call Depth and Invocation Density as new evaluation metrics.
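- Evaluates several query-based API recommendation methods and general LLMs, showing that performance drops significantly as API Call Depth increases and Invocation Density decreases.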
Why it matters
This paper addresses the challenge of evaluating query-based API recommendation methods, which often struggle with complex, multi-level invocations. Q-ARE and its new metrics provide a critical benchmark to assess semantic understanding and guide the development of more robust API recommendation algorithms.
Original Abstract
As software systems grow in scale, developers face increasing difficulty in selecting appropriate Application Programming Interfaces (APIs) from numerous options. Efficiently identifying APIs that satisfy functional requirements has become a key challenge. To evaluate the semantic understanding of existing query-based API recommendation methods, this paper constructs Q-ARE (Query-based API Recommendation Evaluation), a dataset based on open-source Java projects from GitHub. Methods and their invocation chains are analyzed to identify third-party APIs directly or indirectly invoked by target methods, recursively expanding multi-level invocations to unify hierarchical call structures into API recommendation target sets. Furthermore, we introduce two metrics: API Call Depth, measuring the invocation distance between a query method and a target API, and Invocation Density, quantifying the proportion of code lines associated with the target API in the invocation chain. Based on Q-ARE, we systematically evaluate several query-based API recommendation methods and general Large Language Models (LLMs). Results show that performance drops significantly as API Call Depth increases and invocation density decreases, indicating that existing methods still struggle with multi-level method invocation structures. Q-ARE and its metrics provide a new benchmark for assessing semantic understanding in API recommendation and offer insights for improving future algorithms.
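The two metrics described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the call graph, the method and API names, the line counts, and the convention that a directly invoked API has depth 1 are all assumptions made for illustration. The abstract does not say how "code lines associated with the target API" are counted, so the density function simply takes those counts as input.

```python
from collections import deque

# Hypothetical call graph: each project method maps to the callees in its body.
CALL_GRAPH = {
    "saveReport": ["buildJson", "writeFile"],
    "buildJson": ["gson.toJson"],
    "writeFile": ["compress"],
    "compress": ["commons.IOUtils.copy"],
}
# Hypothetical set of third-party APIs (the recommendation targets).
THIRD_PARTY = {"gson.toJson", "commons.IOUtils.copy"}


def api_call_depths(query_method, call_graph, third_party):
    """Expand invocations breadth-first from the query method.

    Returns {api: depth}, where depth is the shortest invocation distance.
    Assumption: an API called directly by the query method has depth 1.
    """
    depths = {}
    seen = {query_method}
    queue = deque([(query_method, 0)])
    while queue:
        method, depth = queue.popleft()
        for callee in call_graph.get(method, []):
            if callee in third_party:
                # BFS visits shallower methods first, so the first hit is minimal.
                depths.setdefault(callee, depth + 1)
            elif callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return depths


def invocation_density(total_lines, api_lines):
    """Share of lines in the invocation chain associated with the target API.

    total_lines: {method: number of code lines} for each method in the chain.
    api_lines:   {method: lines associated with the target API} (assumed given).
    """
    total = sum(total_lines.values())
    related = sum(api_lines.get(m, 0) for m in total_lines)
    return related / total if total else 0.0


depths = api_call_depths("saveReport", CALL_GRAPH, THIRD_PARTY)
# depths == {"gson.toJson": 2, "commons.IOUtils.copy": 3}

# Hypothetical line counts along the chain saveReport -> writeFile -> compress.
density = invocation_density(
    {"saveReport": 10, "writeFile": 6, "compress": 4},
    {"saveReport": 1, "writeFile": 1, "compress": 2},
)
# density == 0.2  (4 associated lines out of 20)
```

In this toy example the target set for saveReport is the set of keys in depths. An API that is reached only at depth 3 through a chain with density 0.2 would fall into the deep, sparse regime where the abstract reports the largest performance drops.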