ArXiv TLDR

CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement

arXiv: 2605.04677

Ajay Krishna Borra, Wenzhuo Yang, Samarth Arora, Akhilesh Deepak Gotmare, Gokulakrishnan Gopalakrishnan + 7 more

cs.SE, cs.AI

TLDR

CodeEvolve is an LLM-driven evolutionary framework that uses runtime data and MCTS to automatically enhance code performance and quality across multiple languages.

Key contributions

  • Introduces CodeEvolve, an LLM-driven evolutionary framework for multi-language code optimization.
  • Uses Java Flight Recorder (JFR) runtime profiles for guided target selection, focusing on high-cost execution paths.
  • Integrates Monte Carlo Tree Search (MCTS) and automated refinement for robust code enhancement.
  • Achieves a 15.22× average speedup across seven Java hotspot functions, outperforming single-pass LLM optimization on five of them.
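The runtime-guided target selection above can be illustrated with a minimal sketch (Python for brevity; the paper's pipeline targets Java and Apex). The function, profile shape, and coverage threshold here are assumptions for illustration — the real system builds weighted component graphs from JFR events — but the core idea, greedily picking the methods that account for most execution cost, looks roughly like:

```python
def select_targets(costs, coverage=0.8):
    """Pick the fewest hotspots whose combined cost reaches `coverage` of total.

    `costs` maps a method name to its measured execution cost (e.g. sampled
    CPU time from a profiler). Methods are taken in descending cost order.
    """
    total = sum(costs.values())
    picked, acc = [], 0.0
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        picked.append(name)
        acc += cost
        if acc >= coverage * total:
            break
    return picked

# Hypothetical profile: two methods dominate the runtime.
profile = {"parseRecord": 520.0, "serialize": 230.0, "validate": 90.0, "log": 10.0}
print(select_targets(profile))  # ['parseRecord', 'serialize']
```

This is the sense in which the paper "reduces reliance on manual bottleneck identification": the optimization budget is spent on the few targets that cover most of the measured cost.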

Why it matters

This paper introduces a novel approach to automated code optimization by combining LLMs with evolutionary search and runtime analysis. It significantly improves performance and code quality, demonstrating a practical path for large-scale code enhancement in enterprise environments. The system's ability to reliably generate valid, optimized code is a key advancement.

Original Abstract

We present CodeEvolve, an evolutionary framework for improving program performance and code quality with Large Language Models (LLMs). CodeEvolve extends OpenEvolve with runtime-guided target selection, Monte Carlo Tree Search (MCTS), automated code refinement, and language-specific evaluation pipelines for Java and Salesforce Apex. The system uses Java Flight Recorder (JFR) profiles to build weighted component graphs and select optimization targets that account for most execution cost, reducing reliance on manual bottleneck identification. For each target, CodeEvolve generates candidate edits, evaluates them through build validation, unit tests, performance checks, static analysis, and LLM-based review, and retains only variants that preserve functional correctness. Across real-world optimization tasks, CodeEvolve improves performance and code metrics while maintaining correctness. On a large enterprise Java codebase, it achieves an average speedup of 15.22× across seven hotspot functions and outperforms single-pass LLM optimization on five of them. An ablation study on Apex optimization shows that the full MCTS-augmented configuration produces 19.5 valid programs out of 20 on average, indicating that search, filtering, and refinement each contribute to more reliable optimization.
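The abstract's "retains only variants that preserve functional correctness" step can be sketched as a chain of validation gates. This is a simplified illustration, not the paper's implementation: the gate names mirror the stages the abstract lists (build validation, unit tests, performance checks, static analysis, review), while the candidate representation and predicates are toy assumptions.

```python
def accept(candidate, gates):
    """Run a candidate edit through ordered gates; reject on the first failure.

    `gates` is a list of (name, predicate) pairs. A variant is retained only
    if every gate passes, mirroring the keep-only-correct-variants policy.
    """
    for name, gate in gates:
        if not gate(candidate):
            return (False, name)   # variant discarded, with the failing stage
    return (True, None)            # variant retained in the population

# Toy gates: a candidate is a string standing in for a code variant.
gates = [
    ("build", lambda c: len(c) > 0),
    ("unit_tests", lambda c: "bug" not in c),
    ("perf_check", lambda c: "slow" not in c),
]
print(accept("optimized variant", gates))  # (True, None)
print(accept("slow variant", gates))       # (False, 'perf_check')
```

Ordering the gates cheapest-first (build before tests before performance runs) keeps the evolutionary loop affordable: most invalid variants are rejected before the expensive checks run.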
