ArXiv TLDR

Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers

arXiv: 2605.12421

Haoyu Wang, Yuliang Song, Tao Li, Zhiwei Deng, Yaqing Wang + 3 more

cs.AI

TLDR

When constructing combinatorial solvers, LLMs should formalize problems rather than optimize search: prompting for search optimization triggers a "heuristic trap" that sharply reduces correctness.

Key contributions

  • Introduces CP-SynC-XL, a new benchmark of 100 combinatorial problems (4,577 instances).
  • Evaluates three LLM solver-construction paradigms: native Python, Python + OR-Tools, and MiniZinc + OR-Tools.
  • Finds that Python + OR-Tools achieves the highest correctness, while native Python most often returns schema-valid solutions that fail verification.
  • Shows that LLM-authored search optimization yields minimal speed-ups (median 1.03-1.12x) while sharply reducing correctness, a failure mode the authors call the "heuristic trap."

Why it matters

This paper addresses a critical design question for neuro-symbolic systems using LLMs to synthesize combinatorial solvers. It demonstrates that LLMs are better suited for formalizing problem structures than for optimizing search. These findings provide a conservative design principle, improving reliability and correctness in LLM-generated solvers.
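To make the "formalize, don't optimize" principle concrete, here is a minimal sketch of the formalization paradigm: the problem is expressed only as variables, domains, and constraints, and a generic complete solver does the searching. All names here (`solve`, the toy scheduling puzzle) are illustrative assumptions, not the paper's benchmark or API, and a plain enumeration stands in for a verified back-end like OR-Tools.

```python
# Sketch: formalize a problem as variables, domains, and constraints,
# then hand it to a generic complete solver (stand-in for OR-Tools/MiniZinc).
from itertools import product

def solve(domains, constraints):
    """Generic complete solver: enumerate all assignments, keep feasible ones."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(c(assignment) for c in constraints):
            yield assignment

# Formalization of a toy scheduling puzzle (hypothetical example):
# three tasks in three slots, all distinct, task "a" must run before "b".
domains = {"a": range(3), "b": range(3), "c": range(3)}
constraints = [
    lambda s: len({s["a"], s["b"], s["c"]}) == 3,  # all-different
    lambda s: s["a"] < s["b"],                      # precedence a -> b
]

solutions = list(solve(domains, constraints))  # 3 feasible schedules
```

The point of the split is that the LLM's output (the `domains` and `constraints` declarations) is purely declarative and easy to audit, while completeness guarantees live entirely in the solver.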

Original Abstract

Large Language Models (LLMs) struggle to solve complex combinatorial problems through direct reasoning, so recent neuro-symbolic systems increasingly use them to synthesize executable solvers. A central design question is how the LLM should represent the solver, and whether it should also attempt to optimize search. We introduce CP-SynC-XL, a benchmark of 100 combinatorial problems (4,577 instances), and evaluate three solver-construction paradigms: native algorithmic search (Python), constraint modeling through a Python solver API (Python + OR-Tools), and declarative constraint modeling (MiniZinc + OR-Tools). We find a consistent representational divergence: Python + OR-Tools attains the highest correctness across LLMs, while MiniZinc + OR-Tools has lower absolute coverage despite using the same OR-Tools back-end. Native Python is the most likely to return a schema-valid solution that fails verification, whereas solver-backed paths preserve higher conditional fidelity. On the heuristic axis, prompting for search optimization yields only small median speed-ups (1.03-1.12x) and a strongly bimodal effect: many instances slow down, and correctness drops sharply on a long tail of problems. A paired code-level audit traces these regressions to a recurring heuristic trap. Under an efficiency-oriented prompt, the LLM may replace complete search with local approximations (Python), inject unverified bounds (Python + OR-Tools), or add redundant declarative machinery that overwhelms or over-constrains the model (MiniZinc + OR-Tools). These findings support a conservative design principle for LLM-generated combinatorial solvers: use the LLM primarily to formalize variables, constraints, and objectives for verified solvers, and separately check any LLM-authored search optimization before use.
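The "replace complete search with local approximations" failure mode described above can be illustrated with a toy 0/1 knapsack instance (a hypothetical example, not drawn from the paper's benchmark): an efficiency-oriented rewrite that swaps exhaustive search for a greedy density heuristic runs faster but silently loses optimality.

```python
# Sketch of the "heuristic trap": a greedy rewrite of a complete search.
from itertools import combinations

ITEMS = [(6, 30), (5, 20), (5, 20)]  # (weight, value) pairs
CAPACITY = 10

def complete_search(items, capacity):
    """Exhaustive subset enumeration: slow but complete, hence optimal."""
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for w, _ in subset) <= capacity:
                best = max(best, sum(v for _, v in subset))
    return best

def greedy_heuristic(items, capacity):
    """Greedy by value density: fast but incomplete."""
    total = 0
    for w, v in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if w <= capacity:
            capacity -= w
            total += v
    return total

print(complete_search(ITEMS, CAPACITY))   # 40: the two weight-5 items
print(greedy_heuristic(ITEMS, CAPACITY))  # 30: densest item blocks the rest
```

Here the greedy pass commits to the highest-density item and can no longer fit the pair that is jointly optimal; this mirrors the paper's finding that unverified, LLM-authored efficiency tweaks should be checked against a complete solver before use.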
