FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents
Quang Hieu Pham, Yang He, Ping Nie, Canwen Xu, Davood Rafiei + 3 more
TLDR
FlexSQL is a text-to-SQL agent that explores schemas and data at any point during reasoning, executes diverse plans in SQL or Python, and repairs errors at both the code level and the plan level.
Key contributions
- Enables flexible database interaction, allowing schema exploration and data inspection at any point.
- Generates diverse execution plans and implements them using either SQL or Python.
- Features a two-tiered repair mechanism that can backtrack from code-level errors to plan-level revisions.
- Achieves 65.4% on Spider2-Snow with gpt-oss-120b, outperforming strong open-source baselines that use larger models.
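To make the first contribution concrete, here is a minimal sketch of the kind of exploration tools an agent could call at any step. It is not FlexSQL's implementation: the function names `list_tables`, `describe_table`, and `sample_values` are hypothetical, and an in-memory SQLite database stands in for the Snowflake databases behind Spider2-Snow.

```python
import sqlite3


def list_tables(conn):
    """Schema exploration: return the names of all tables, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def describe_table(conn, table):
    """Schema exploration: return (column name, declared type) pairs."""
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    # The table name was validated above, so quoting it here is safe.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [(row[1], row[2]) for row in rows]


def sample_values(conn, table, column, limit=5):
    """Data inspection: return up to `limit` distinct values of a column."""
    columns = [name for name, _ in describe_table(conn, table)]
    if column not in columns:
        raise ValueError(f"unknown column: {table}.{column}")
    rows = conn.execute(
        f'SELECT DISTINCT "{column}" FROM "{table}" ORDER BY 1 LIMIT ?',
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]
```

An agent that is unsure whether a status filter should read `'shipped'` or `'Shipped'` could call `sample_values(conn, "orders", "status")` before writing the final query, rather than relying on a schema snapshot retrieved once upfront.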
Why it matters
This paper introduces FlexSQL, a text-to-SQL agent that addresses the limits of fixed-pipeline systems, which retrieve schema elements once upfront and revisit the database only for post-hoc repair. By allowing schema exploration, data inspection, and verification queries at any point during reasoning, and by repairing errors at both the code level and the plan level, it reaches 65.4% on Spider2-Snow with gpt-oss-120b and outperforms open-source baselines built on larger models.
Original Abstract
Text-to-SQL over large analytical databases requires navigating complex schemas, resolving ambiguous queries, and grounding decisions in actual data. Most current systems follow a fixed pipeline where schema elements are retrieved once upfront and the database is only revisited for post-hoc repair, limiting recovery from early mistakes. We present FlexSQL, a text-to-SQL agent whose core design principle is flexible database interaction: the agent can explore schema structure, inspect data values, and run verification queries at any point during reasoning. FlexSQL generates diverse execution plans to cover multiple query interpretations, implements each plan in either SQL or Python depending on the task, and uses a two-tiered repair mechanism that can backtrack from code-level errors to plan-level revisions. On Spider2-Snow, using gpt-oss-120b, FlexSQL achieves a 65.4% score, outperforming strong open-source baselines that use stronger, larger models such as gpt-o3 and DeepSeek-R1. When integrated into a general-purpose coding agent (as skills in Claude Code), our approach yields over 10% relative improvement on Spider2-Snow. Further analysis shows that flexible exploration and flexible execution jointly contribute to the effectiveness of our approach, highlighting flexibility as a key design principle. Our code is available at: https://github.com/StringNLPLAB/FlexSQL
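The two-tiered repair mechanism described in the abstract can be pictured as a nested loop: an inner loop that repairs code for a fixed plan, and an outer loop that revises the plan once code-level repair is exhausted. The sketch below is an illustration under stated assumptions, not the authors' code. The callables `propose_plan`, `write_code`, and `execute`, and both retry budgets, are hypothetical placeholders for model calls and database execution.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    plan: str
    code: str
    error: Optional[str]  # None means the code ran successfully


def solve_with_two_tier_repair(
    question: str,
    propose_plan: Callable[[str, List[Attempt]], str],   # hypothetical planner
    write_code: Callable[[str, Optional[Attempt]], str],  # hypothetical coder
    execute: Callable[[str], Optional[str]],  # returns an error message or None
    max_plan_revisions: int = 2,  # assumed budget, not from the paper
    max_code_repairs: int = 2,    # assumed budget, not from the paper
) -> Optional[Attempt]:
    history: List[Attempt] = []
    # Outer tier: each iteration commits to a (possibly revised) plan.
    for _ in range(max_plan_revisions + 1):
        plan = propose_plan(question, history)
        previous: Optional[Attempt] = None
        # Inner tier: write the code, then repair it using the last error.
        for _ in range(max_code_repairs + 1):
            code = write_code(plan, previous)
            error = execute(code)
            attempt = Attempt(plan, code, error)
            history.append(attempt)
            if error is None:
                return attempt
            previous = attempt
        # Code-level repair is exhausted: backtrack to a plan-level revision.
    return None
```

Passing the full attempt history to the planner is one possible design choice: it lets a revised plan avoid an interpretation that has already failed at the code level.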