FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents

May 4, 2026 · arXiv:2605.02815

Quang Hieu Pham, Yang He, Ping Nie, Canwen Xu, Davood Rafiei + 3 more

cs.CL

TLDR

FlexSQL is a text-to-SQL agent that achieves better performance by flexibly exploring schemas, executing diverse plans, and employing a multi-level repair mechanism.

Key contributions

Enables flexible database interaction, allowing schema exploration and data inspection at any point.
Generates diverse execution plans and implements them using either SQL or Python.
Features a two-tiered repair mechanism that can backtrack from code-level errors to plan-level revisions.
Achieves 65.4% on Spider2-Snow with gpt-oss-120b, outperforming strong open-source baselines that use larger models such as gpt-o3 and DeepSeek-R1.
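
The two-tiered repair idea can be illustrated with a small control-flow sketch. This is not the authors' implementation: the function name `run_with_two_tier_repair`, the plan dictionaries, and the rule-based `toy_repair` stub (standing in for a model call) are all hypothetical, and an in-memory SQLite database stands in for the real warehouse. It only shows the shape of the loop: try to repair the code for the current plan first, and move to a different plan when code-level repair gives up.

```python
import sqlite3
from typing import Callable, Optional


def run_with_two_tier_repair(
    conn: sqlite3.Connection,
    plans: list[dict],
    repair_code: Callable[[str, str], Optional[str]],
    max_code_repairs: int = 2,
):
    """Hypothetical sketch: code-level repair first, plan-level fallback second."""
    for plan in plans:  # tier 2: switching plans is the plan-level revision
        sql = plan["sql"]
        for attempt in range(max_code_repairs + 1):
            try:
                return plan["name"], conn.execute(sql).fetchall()
            except sqlite3.Error as err:
                if attempt == max_code_repairs:
                    break  # repair budget exhausted, backtrack to next plan
                fixed = repair_code(sql, str(err))  # tier 1: code-level repair
                if fixed is None:
                    break  # not fixable at code level, backtrack to next plan
                sql = fixed
    return None, []


def toy_repair(sql: str, error: str) -> Optional[str]:
    """Assumed stand-in for a model: only knows how to fix one column name."""
    if "no such column" in error and "total" in sql:
        return sql.replace("total", "amount")
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 20)])

plans = [
    # Plan A targets a table that does not exist; the stub cannot repair it.
    {"name": "plan_a", "sql": "SELECT SUM(total) FROM sales"},
    # Plan B has a wrong column name, which the stub can repair.
    {"name": "plan_b", "sql": "SELECT SUM(total) FROM orders"},
]

chosen, rows = run_with_two_tier_repair(conn, plans, toy_repair)
print(chosen, rows)
```

In this toy run the first plan fails in a way the code-level repairer cannot handle, so the loop backtracks to the second plan, whose error is fixed by a single code-level repair.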

Why it matters

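Fixed-pipeline text-to-SQL systems retrieve schema elements once upfront and revisit the database only for post-hoc repair, which makes early mistakes hard to recover from. FlexSQL instead lets the agent explore the schema, inspect data values, and run verification queries at any point during reasoning, and it can repair errors at both the code level and the plan level. On Spider2-Snow this design lets gpt-oss-120b outperform baselines built on larger models, and the authors' analysis attributes the gain to flexible exploration and flexible execution working together.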

Original Abstract

Text-to-SQL over large analytical databases requires navigating complex schemas, resolving ambiguous queries, and grounding decisions in actual data. Most current systems follow a fixed pipeline where schema elements are retrieved once upfront and the database is only revisited for post-hoc repair, limiting recovery from early mistakes. We present FlexSQL, a text-to-SQL agent whose core design principle is flexible database interaction: the agent can explore schema structure, inspect data values, and run verification queries at any point during reasoning. FlexSQL generates diverse execution plans to cover multiple query interpretations, implements each plan in either SQL or Python depending on the task, and uses a two-tiered repair mechanism that can backtrack from code-level errors to plan-level revisions. On Spider2-Snow, using gpt-oss-120b, FlexSQL achieves a 65.4% score, outperforming strong open-source baselines that use stronger, larger models such as gpt-o3 and DeepSeek-R1. When integrated into a general-purpose coding agent (as skills in Claude Code), our approach yields over 10% relative improvement on Spider2-Snow. Further analysis shows that flexible exploration and flexible execution jointly contribute to the effectiveness of our approach, highlighting flexibility as a key design principle. Our code is available at: https://github.com/StringNLPLAB/FlexSQL
