ArXiv TLDR

Combining Static Code Analysis and Large Language Models Improves Correctness and Performance of Algorithm Recognition

arXiv: 2604.03048

Denis Neumüller, Sebastian Boll, David Schüler, Matthias Tichy

cs.SE

TLDR

Combining static analysis with LLMs significantly improves algorithm recognition, reducing runtime and boosting accuracy, even with obfuscated code.

Key contributions

  • Hybrid approach (LLMs + static analysis) reduces LLM calls by 72-97% and improves F1-scores by up to 12 percentage points.
  • Lightweight static analysis filters LLM calls, leading to significant runtime reductions.
  • In-context learning with two examples offers an effective trade-off between classification performance (75-77% F1) and runtime.
  • LLMs can identify most algorithms even with obfuscated identifiers, indicating less dependence on naming.
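The filtering idea behind the first two bullets can be sketched as follows. A cheap static pre-filter decides which code snippets are even worth sending to the LLM, so most LLM calls are skipped entirely. The pattern names, regexes, and the `llm_classify` hook below are illustrative assumptions, not the paper's actual filter patterns:

```python
import re

# Hypothetical lightweight filter patterns: cheap static signals that a snippet
# *might* implement a target algorithm. Only matching snippets reach the LLM.
FILTER_PATTERNS = {
    "binary_search": re.compile(r"\bwhile\b.*[<>]", re.DOTALL),  # loop with a comparison
    "sorting":       re.compile(r"\bfor\b.*\bfor\b", re.DOTALL), # nested loops
}

def static_prefilter(source: str) -> list[str]:
    """Return candidate algorithm labels whose pattern matches the source."""
    return [name for name, pat in FILTER_PATTERNS.items() if pat.search(source)]

def classify(source: str, llm_classify) -> str:
    """Hybrid pipeline: skip the (expensive) LLM call when no filter fires."""
    candidates = static_prefilter(source)
    if not candidates:
        return "no-algorithm"               # one LLM call saved
    return llm_classify(source, candidates)  # LLM disambiguates among candidates
```

The runtime savings reported in the paper (72-97% fewer LLM calls) come from how often the `no-algorithm` branch fires; stricter patterns save more calls but risk filtering out true positives.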

Why it matters

This paper demonstrates a powerful hybrid approach for automated algorithm recognition, combining LLMs with static analysis. It shows how to achieve substantial performance gains and efficiency improvements. This method can significantly aid program comprehension and software maintenance.

Original Abstract

Context: Since it is well-established that developers spend a substantial portion of their time understanding source code, the ability to automatically identify algorithms within source code presents a valuable opportunity. This capability can support program comprehension, facilitate maintenance, and enhance overall software quality. Objective: We empirically evaluate how combining LLMs with static code analysis can improve the automated recognition of algorithms, while also evaluating their standalone performance and dependence on identifier names. Method: We perform multiple experiments evaluating the combination of LLMs with static analysis using different filter patterns. We compare this combined approach against their standalone performance under various prompting strategies and investigate the impact of systematic identifier obfuscation on classification performance and runtime. Results: The combination of LLMs with lightweight static analysis performs surprisingly well, reducing required LLM calls by 72.39-97.50% depending on the filter pattern. This not only lowers runtime significantly but also improves F1-scores by up to 12 percentage points (pp) compared to the baseline. Regarding the different prompting strategies, in-context learning with two examples provides an effective trade-off between classification performance and runtime efficiency, achieving F1-scores of 75-77% with only a modest increase in inference time. Lastly, we find that LLMs are not solely dependent on name-information as they are still able to identify most algorithm implementations when identifiers are obfuscated. Conclusion: By combining LLMs with static analysis, we achieve substantial reductions in runtime while simultaneously improving F1-scores, underscoring the value of a hybrid approach.
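The "systematic identifier obfuscation" the abstract describes can be illustrated with a minimal sketch: rename every user-defined identifier to a meaningless fresh name while preserving program semantics, so any remaining classification accuracy must come from code structure rather than naming. The `v0, v1, ...` scheme and the AST-based approach here are assumptions for illustration, not the paper's actual tooling:

```python
import ast
import builtins

# Names we must not rename, or the obfuscated code would stop running.
_BUILTINS = set(dir(builtins))

class Obfuscator(ast.NodeTransformer):
    """Rename functions, parameters, and variables to v0, v1, ... consistently."""

    def __init__(self):
        self.mapping: dict[str, str] = {}

    def _fresh(self, name: str) -> str:
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._fresh(node.name)
        self.generic_visit(node)  # also rename args and body names
        return node

    def visit_arg(self, node):
        node.arg = self._fresh(node.arg)
        return node

    def visit_Name(self, node):
        if node.id not in _BUILTINS:
            node.id = self._fresh(node.id)
        return node

def obfuscate(source: str) -> str:
    """Return a semantics-preserving, name-obfuscated version of `source`."""
    return ast.unparse(Obfuscator().visit(ast.parse(source)))
```

Feeding both the original and the obfuscated version to the same classifier and comparing F1-scores is the kind of experiment the paper uses to show that LLMs are not solely dependent on name information.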
