AFGNN: API Misuse Detection using Graph Neural Networks and Clustering
Ponnampalam Pirapuraj, Tamal Mondal, Sharanya Gupta, Akash Lal, Somak Aditya + 1 more
TLDR
AFGNN is a GNN-based framework that uses API Flow Graphs and self-supervised pre-training to detect API misuses in Java code, outperforming state-of-the-art small language models and API misuse detectors on popular API usage datasets.
Key contributions
- Develops AFGNN, a GNN framework for detecting API misuses in Java code.
- Introduces API Flow Graph (AFG) to capture API execution, data, and control flow.
- Uses self-supervised pre-training to embed unknown API usage examples and cluster them into distinct usage patterns.
- Significantly outperforms state-of-the-art small language models and API misuse detectors on popular API usage datasets.
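To make the AFG idea concrete, here is a minimal sketch of a graph whose nodes are API calls and whose edges carry one of three kinds: execution order, data flow, or control flow. The class name `ApiFlowGraph`, the `EdgeKind` enum, and the iterator example are all hypothetical illustrations; the paper's actual AFG construction is not described in this summary and may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified API Flow Graph; not the paper's actual data structure. */
final class ApiFlowGraph {
    /** Edge kinds mirroring the three kinds of information the AFG is said to capture. */
    enum EdgeKind { EXECUTION_ORDER, DATA_FLOW, CONTROL_FLOW }

    /** A directed, typed edge between two node indices. */
    private static final class Edge {
        final int from;
        final int to;
        final EdgeKind kind;

        Edge(int from, int to, EdgeKind kind) {
            this.from = from;
            this.to = to;
            this.kind = kind;
        }
    }

    private final List<String> nodes = new ArrayList<>();
    private final List<Edge> edges = new ArrayList<>();

    /** Adds an API call as a node and returns its index. */
    int addNode(String apiCall) {
        nodes.add(apiCall);
        return nodes.size() - 1;
    }

    /** Adds a typed edge; both endpoints must already exist. */
    void addEdge(int from, int to, EdgeKind kind) {
        if (from < 0 || from >= nodes.size() || to < 0 || to >= nodes.size()) {
            throw new IllegalArgumentException("unknown node index");
        }
        edges.add(new Edge(from, to, kind));
    }

    int nodeCount() {
        return nodes.size();
    }

    /** Counts the edges of one kind. */
    long countEdges(EdgeKind kind) {
        long count = 0;
        for (Edge e : edges) {
            if (e.kind == kind) {
                count++;
            }
        }
        return count;
    }

    /**
     * Models the usage:
     *   Iterator<String> it = list.iterator();
     *   if (it.hasNext()) { it.next(); }
     */
    static ApiFlowGraph iteratorExample() {
        ApiFlowGraph g = new ApiFlowGraph();
        int iterator = g.addNode("List.iterator");
        int hasNext = g.addNode("Iterator.hasNext");
        int next = g.addNode("Iterator.next");
        // Calls happen in this order.
        g.addEdge(iterator, hasNext, EdgeKind.EXECUTION_ORDER);
        g.addEdge(hasNext, next, EdgeKind.EXECUTION_ORDER);
        // The iterator object flows into both later calls.
        g.addEdge(iterator, hasNext, EdgeKind.DATA_FLOW);
        g.addEdge(iterator, next, EdgeKind.DATA_FLOW);
        // next() is guarded by the result of hasNext().
        g.addEdge(hasNext, next, EdgeKind.CONTROL_FLOW);
        return g;
    }
}
```

In this toy encoding, a usage that calls `next()` without checking `hasNext()` would lack the control-flow edge, which is the kind of structural difference a graph-based model could in principle pick up.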
Why it matters
API misuse remains a significant source of bugs and vulnerabilities, especially as developers learn unfamiliar APIs from online portals and generative AI models whose examples may introduce unintentional errors. AFGNN offers an automated way to detect such misuses in Java code.
Original Abstract
Application Programming Interfaces (APIs) are crucial to software development, enabling integration of existing systems with new applications by reusing tried and tested code, saving development time and increasing software safety. In particular, the Java standard library APIs, along with numerous third-party APIs, are extensively utilized in the development of enterprise application software. However, their misuse remains a significant source of bugs and vulnerabilities. Furthermore, due to the limited examples in the official API documentation, developers often rely on online portals and generative AI models to learn unfamiliar APIs, but using such examples may introduce unintentional errors in the software. In this paper, we present AFGNN, a novel Graph Neural Network (GNN)-based framework for efficiently detecting API misuses in Java code. AFGNN uses a novel API Flow Graph (AFG) representation that captures the API execution sequence, data, and control flow information present in the code to model the API usage patterns. AFGNN uses self-supervised pre-training with AFG representation to effectively compute the embeddings for unknown API usage examples and cluster them to identify different usage patterns. Experiments on popular API usage datasets show that AFGNN significantly outperforms state-of-the-art small language models and API misuse detectors.
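The embed-then-cluster step mentioned in the abstract can be illustrated with a small sketch. It assumes the embeddings are already available as plain `double[]` vectors (in AFGNN they would come from the pre-trained GNN) and swaps in a simple greedy cosine-similarity clustering. The abstract does not name the clustering algorithm, so the class `UsageClustering`, the method names, and the threshold are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: groups usage embeddings by cosine similarity. */
final class UsageClustering {
    /** Cosine similarity between two vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /**
     * Greedy clustering (an assumed stand-in, not the paper's algorithm):
     * each embedding joins the first cluster whose representative (its first
     * member) has cosine similarity >= threshold; otherwise it starts a new
     * cluster. Returns the cluster index assigned to each embedding.
     */
    static int[] cluster(double[][] embeddings, double threshold) {
        List<double[]> representatives = new ArrayList<>();
        int[] labels = new int[embeddings.length];
        for (int i = 0; i < embeddings.length; i++) {
            int assigned = -1;
            for (int c = 0; c < representatives.size(); c++) {
                if (cosine(embeddings[i], representatives.get(c)) >= threshold) {
                    assigned = c;
                    break;
                }
            }
            if (assigned < 0) {
                representatives.add(embeddings[i]);
                assigned = representatives.size() - 1;
            }
            labels[i] = assigned;
        }
        return labels;
    }
}
```

Calling `UsageClustering.cluster(embeddings, 0.9)` on a handful of vectors returns one label per usage example, so examples with similar embeddings share a label. How AFGNN maps the resulting clusters to correct usages or misuses is not stated in the abstract.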