AFGNN: API Misuse Detection using Graph Neural Networks and Clustering

April 9, 2026 | arXiv:2604.07891

Ponnampalam Pirapuraj, Tamal Mondal, Sharanya Gupta, Akash Lal, Somak Aditya + 1 more

cs.SE

TLDR

AFGNN is a GNN-based framework that uses API Flow Graphs and self-supervised learning to detect API misuses in Java code, outperforming existing methods.

Key contributions

Develops AFGNN, a GNN framework for detecting API misuses in Java code.
Introduces API Flow Graph (AFG) to capture API execution, data, and control flow.
Uses self-supervised pre-training to embed and cluster unknown API usage patterns.
Significantly outperforms state-of-the-art small language models and API misuse detectors.
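
To make the AFG idea above concrete, here is a minimal sketch of a graph whose nodes are API calls and whose edges carry one of three kinds: sequence, data, or control. The class name `ApiFlowGraph`, the edge-kind labels, and the hand-built iterator example are all illustrative assumptions; the summary does not specify how AFGNN actually constructs or encodes its graphs.

```python
from dataclasses import dataclass, field

# Hypothetical edge kinds, named after the three kinds of information the
# summary says an AFG captures. The paper's real schema may differ.
EDGE_KINDS = ("sequence", "data", "control")


@dataclass
class ApiFlowGraph:
    """Toy graph: nodes are API call labels, edges are (src, dst, kind)."""
    nodes: list = field(default_factory=list)
    edges: set = field(default_factory=set)

    def add_node(self, label):
        """Append a node and return its index."""
        self.nodes.append(label)
        return len(self.nodes) - 1

    def add_edge(self, src, dst, kind):
        """Add a typed, directed edge between two node indices."""
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        self.edges.add((src, dst, kind))

    def successors(self, src, kind):
        """Labels reachable from `src` over one edge of the given kind."""
        return sorted(
            self.nodes[d] for s, d, k in self.edges if s == src and k == kind
        )


def build_iterator_example():
    """Hand-built graph for the Java snippet:
    Iterator it = list.iterator(); if (it.hasNext()) { it.next(); }
    """
    g = ApiFlowGraph()
    it = g.add_node("List.iterator")
    has_next = g.add_node("Iterator.hasNext")
    nxt = g.add_node("Iterator.next")
    g.add_edge(it, has_next, "sequence")   # iterator() runs before hasNext()
    g.add_edge(has_next, nxt, "sequence")  # hasNext() runs before next()
    g.add_edge(it, has_next, "data")       # the iterator object flows into hasNext()
    g.add_edge(it, nxt, "data")            # and into next()
    g.add_edge(has_next, nxt, "control")   # next() is guarded by hasNext()
    return g
```

In this toy encoding, a snippet that calls `next()` without the `hasNext()` guard would lack the control edge, which is the kind of structural difference a graph-based model could in principle pick up.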

Why it matters

API misuse leads to critical bugs and vulnerabilities, especially as developers rely on diverse, potentially error-prone resources. AFGNN offers a robust, automated solution to identify these misuses, improving software safety and reducing development time by proactively catching errors.

Original Abstract

Application Programming Interfaces (APIs) are crucial to software development, enabling integration of existing systems with new applications by reusing tried and tested code, saving development time and increasing software safety. In particular, the Java standard library APIs, along with numerous third-party APIs, are extensively utilized in the development of enterprise application software. However, their misuse remains a significant source of bugs and vulnerabilities. Furthermore, due to the limited examples in the official API documentation, developers often rely on online portals and generative AI models to learn unfamiliar APIs, but using such examples may introduce unintentional errors in the software. In this paper, we present AFGNN, a novel Graph Neural Network (GNN)-based framework for efficiently detecting API misuses in Java code. AFGNN uses a novel API Flow Graph (AFG) representation that captures the API execution sequence, data, and control flow information present in the code to model the API usage patterns. AFGNN uses self-supervised pre-training with AFG representation to effectively compute the embeddings for unknown API usage examples and cluster them to identify different usage patterns. Experiments on popular API usage datasets show that AFGNN significantly outperforms state-of-the-art small language models and API misuse detectors.
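
The abstract says that embeddings of API usage examples are clustered to identify distinct usage patterns, but it does not name the clustering algorithm. The sketch below uses a simple greedy cosine-similarity threshold as a stand-in, and the 2-D vectors are made-up placeholders for real GNN embeddings. The function names and the 0.9 threshold are assumptions for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def greedy_cluster(embeddings, threshold=0.9):
    """Assign each vector to the first cluster whose centroid is at least
    `threshold` similar; otherwise start a new cluster.
    Returns a list of clusters, each a list of indices into `embeddings`.
    """
    clusters = []
    for i, vec in enumerate(embeddings):
        for members in clusters:
            center = centroid([embeddings[j] for j in members])
            if cosine(vec, center) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Made-up 2-D vectors standing in for embeddings of five usage examples.
EMBEDDINGS = [
    [1.0, 0.1],
    [0.9, 0.2],
    [0.1, 1.0],
    [0.2, 0.9],
    [1.0, 0.0],
]

clusters = greedy_cluster(EMBEDDINGS)
```

Each resulting cluster stands for one usage pattern in this toy setting. How clusters are then mapped to correct or incorrect usage is not stated in the abstract, so the sketch stops at grouping.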
