Exploring the Effectiveness of Abstract Syntax Tree Patterns for Algorithm Recognition
Denis Neumüller, Florian Sihler, Raphael Straub, Matthias Tichy
TLDR
This paper introduces an AST-pattern-based approach for automatic algorithm recognition, outperforming LLMs and clone detection tools.
Key contributions
- Proposes an AST-pattern-based approach for automated algorithm recognition.
- Develops a domain-specific language and matching algorithm for defining and finding algorithm patterns.
- Achieves an F1-score of 0.74, significantly outperforming Codellama (0.35) on algorithm recognition.
- Demonstrates a recall of 0.62, surpassing the best-performing code clone detection tool (0.20).
Why it matters
Algorithm recognition is key for software maintenance and quality improvement, but existing approaches suffer from usability and scalability issues and lack accuracy evaluations. This paper introduces an AST-pattern approach that significantly outperforms both an LLM and code clone detection tools on this task, offering a practical solution for automated code analysis.
Original Abstract
The automated recognition of algorithm implementations can support many software maintenance and re-engineering activities by providing knowledge about the concerns present in the code base. Moreover, recognizing inefficient algorithms like Bubble Sort and suggesting superior alternatives from a library can help in assessing and improving the quality of a system. Approaches from related work suffer from usability as well as scalability issues and their accuracy is not evaluated. In this paper, we investigate how well our approach based on the abstract syntax tree of a program performs for automatic algorithm recognition. To this end, we have implemented a prototype consisting of: A domain-specific language designed to capture the key features of an algorithm and used to express a search pattern on the abstract syntax tree, a matching algorithm to find these features, and an initial catalog of "ready to use" patterns. To create our search patterns we performed a web search using the algorithm name and described key features of the found reference implementations with our domain-specific language. We evaluate our prototype on a subset of the BigCloneEval benchmark containing algorithms like Fibonacci, Bubble Sort, and Binary Search. We achieve an average F1-score of 0.74 outperforming the large language model Codellama which attains 0.35. Additionally, we use multiple code clone detection tools as a baseline for comparison, achieving a recall of 0.62 while the best-performing tool reaches 0.20.