ArXiv TLDR

LEVI: Stronger Search Architectures Can Substitute for Larger LLMs in Evolutionary Search

arXiv:2605.09764

Temoor Tanveer

cs.NE · cs.AI

TLDR

LEVI is an evolutionary search framework showing that a stronger search harness — a diversity-preserving solution database, model-aware mutation routing, and proxy evaluation — can substitute for larger LLMs, cutting costs by 3.3-35x while matching or surpassing frontier-model runs.

Key contributions

  • Establishes solution diversity from the start of a run and maintains it throughout, via a redesigned solution database.
  • Routes mutations between large and small LLMs, reserving frontier-model calls for the edits that need them.
  • Introduces a rank-preserving proxy benchmark that cuts evaluation cost in rollout-heavy settings.
  • Achieves state-of-the-art results at 3.3-35x lower cost than existing methods.
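The mutation-router idea can be sketched as a routing rule: cheap local edits go to a small model, while structural rewrites, or edits to near-best candidates where precision matters most, go to the frontier model. This is a minimal illustration under assumed inputs (`edit_scope`, score thresholds), not LEVI's actual routing policy:

```python
def route_mutation(edit_scope: str, parent_score: float, best_score: float) -> str:
    """Hypothetical router: pick a model tier for one mutation request.

    edit_scope:   "local" (tweak a few lines) or "global" (structural rewrite).
    parent_score: fitness of the solution being mutated.
    best_score:   best fitness seen so far in the run.
    """
    # Structural rewrites are where a frontier model's capability pays off.
    if edit_scope == "global":
        return "frontier-llm"
    # Escalate local edits on near-best parents; mistakes there are costly.
    if parent_score >= 0.95 * best_score:
        return "frontier-llm"
    # Everything else is a local edit a smaller model can handle.
    return "small-llm"
```

The 0.95 escalation threshold is an arbitrary illustrative choice; the point is simply that blind frontier-model use is replaced by a policy that spends frontier dollars only where they matter.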

Why it matters

LLM-guided evolutionary search is powerful but costly due to reliance on frontier models. LEVI addresses this by demonstrating that architectural improvements can significantly reduce computational expenses while maintaining or even surpassing performance. This makes advanced evolutionary search more accessible and efficient for various research domains.

Original Abstract

LLM-guided evolutionary methods such as AlphaEvolve have proven effective in domains like math, systems research, and algorithmic discovery, but their reliance on frontier models makes each run expensive. We argue this is largely an artifact of how existing frameworks allocate search: archives that fail to preserve solution diversity force compensation through stronger mutation models; blind model use spends frontier dollars on local edits a smaller model could handle; and full-set evaluation wastes rollouts on redundant examples. We introduce LEVI, a harness-first evolutionary framework built on the bet that stronger search architectures can substitute for or even outperform larger LLMs in evolutionary search. LEVI improves on three core components of evolutionary search: a solution database that establishes diversity from the beginning, and then maintains it throughout the run; a smarter mutation router that plays into the strengths of large and small LLMs; and a rank-preserving proxy benchmark for rollout-heavy settings. Across systems-research benchmarks LEVI attains the highest score on a budget 3.3-6.7x smaller than the published frontier-model runs of existing frameworks like ShinkaEvolve, GEPA, and AdaEvolve; on one problem, LEVI matches the existing best at a 35x lower cost. On prompt optimization, LEVI matches or exceeds GEPA at less than half of its rollout budget on four different benchmarks. LEVI is available as an open-source framework at https://github.com/ttanv/levi.
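The "rank-preserving proxy benchmark" from the abstract can be sketched as a subset-selection problem: pick a few evaluation examples whose induced ranking of candidate solutions agrees with the ranking under the full benchmark, so most rollouts can be skipped. A minimal greedy version, using Kendall rank correlation as the agreement measure (the function and selection strategy here are illustrative assumptions, not LEVI's implementation):

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall rank correlation between two equal-length score lists."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    pairs = len(a) * (len(a) - 1) / 2
    return (concordant - discordant) / pairs

def greedy_proxy(scores, k):
    """scores[s][e] = score of solution s on example e.

    Greedily pick k example indices whose mean score best preserves
    the full-benchmark ranking of the solutions.
    """
    n_examples = len(scores[0])
    full = [sum(row) / n_examples for row in scores]  # full-set ranking target
    chosen = []
    for _ in range(k):
        best_e, best_tau = None, -2.0
        for e in range(n_examples):
            if e in chosen:
                continue
            subset = chosen + [e]
            proxy = [sum(row[i] for i in subset) / len(subset) for row in scores]
            tau = kendall_tau(proxy, full)
            if tau > best_tau:
                best_e, best_tau = e, tau
        chosen.append(best_e)
    return chosen
```

Once a small subset with high rank agreement is found on seed solutions, later candidates are evaluated only on that subset, which is where the rollout savings come from.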

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.