Can LLMs Solve Science or Just Write Code? Evaluating Quantum Solver Generation
Luciano Baresi, Domenico Bianculli, Maryse Ernzer, Livia Lestingi, Fabrizio Pastore + 1 more
TLDR
Q-SAGE evaluates LLMs for quantum solver generation, showing that iterative refinement substantially improves success rates but that numerical accuracy remains a key limitation.
Key contributions
- Introduces Q-SAGE, an iterative methodology for evaluating LLM-generated quantum solvers.
- Q-SAGE refines LLM-generated scripts by comparing results with classical solvers.
- Iterative refinement substantially improves LLM success rates in quantum solver generation.
- Highlights that LLM failures shift from execution errors to numerical inaccuracies with better models.
Why it matters
This paper introduces Q-SAGE, a method for rigorously evaluating LLMs in quantum computing that moves beyond checking mere code execution to verifying numerical accuracy against classical solvers. It shows that while LLMs can generate runnable quantum solvers, producing reliable scientific results still requires overcoming non-trivial numerical challenges, an important consideration for LLM-driven scientific discovery.
Original Abstract
Large Language Models (LLMs) show strong capabilities in code generation, motivating their use in automated quantum solver development. However, in quantum computing, successful execution of generated code is not sufficient: correctness depends on numerically accurate results, which are sensitive to non-trivial mappings, hybrid quantum-classical workflows, and algorithm-specific approximations. This work introduces Q-SAGE, an iterative methodology to evaluate LLMs' capability in generating quantum solvers for scientific problems. The methodology adopts an iterative approach by executing the script generated by the LLM, comparing the result with the result of a classical solver, and refining the script until the two results match within a tolerance threshold. We empirically evaluated the methodology with five families of scientific problems of different complexities and five LLMs, both open source and proprietary. The results show that iterative refinement substantially improves success rates, but introduces a significant computational overhead. Moreover, as model capability increases, failure modes shift from execution errors to numerical inaccuracies, highlighting the current limitations of LLM-based quantum software.
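The abstract describes the core Q-SAGE loop: execute the LLM-generated script, compare its result against a classical solver, and refine until the two agree within a tolerance threshold. A minimal sketch of that loop is below; the function names, signatures, and feedback format are illustrative assumptions, not the paper's actual interface.

```python
import math

def iterative_refine(generate, execute, reference, tolerance=1e-6, max_iters=5):
    """Illustrative Q-SAGE-style refinement loop (assumed interface).

    generate(feedback) -> script produced by the LLM, given prior feedback
    execute(script)    -> numerical result of running the script
    reference          -> result from a classical solver for the same problem
    """
    feedback = None
    for attempt in range(1, max_iters + 1):
        script = generate(feedback)
        try:
            result = execute(script)
        except Exception as err:
            # Execution error: feed the error back for the next attempt.
            feedback = f"execution error: {err}"
            continue
        if math.isclose(result, reference, abs_tol=tolerance):
            return script, attempt  # matches classical solver within tolerance
        # Ran but numerically inaccurate: feed the mismatch back.
        feedback = f"result {result} differs from reference {reference}"
    return None, max_iters  # did not converge within the iteration budget
```

A mock usage, with a stand-in "LLM" that returns progressively better scripts:

```python
scripts = iter(["v1", "v2", "v3"])
outputs = {"v1": 3.0, "v2": 2.5, "v3": 2.0}
script, n = iterative_refine(
    generate=lambda fb: next(scripts),
    execute=lambda s: outputs[s],
    reference=2.0,
)
# script == "v3", accepted on the third attempt
```

This structure also surfaces the failure-mode shift the paper reports: weaker models trip the `except` branch (execution errors), while stronger models reach the tolerance check and fail there (numerical inaccuracies).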