ArXiv TLDR

R$^3$-SQL: Ranking Reward and Resampling for Text-to-SQL

arXiv:2604.25325

Hojae Han, Yeonseok Jeong, Seung-won Hwang, Zhewei Yao, Yuxiong He

cs.SE, cs.AI, cs.CL

TLDR

R$^3$-SQL improves Text-to-SQL by scoring functionally equivalent queries consistently and resampling candidates when the correct SQL is likely missing from the pool.

Key contributions

  • R$^3$-SQL groups candidate SQL queries by execution result for consistent ranking.
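  • It scores each group by combining a pairwise preference across groups with a pointwise utility derived from the best group rank and group size.
  • It introduces agentic resampling, which judges the candidate pool and generates new candidates only when the correct SQL is likely absent.
  • It achieves 75.03% execution accuracy on BIRD-dev, a new state of the art among methods using models with disclosed sizes, with consistent gains across five benchmarks.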
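The grouping step in the first bullet can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`execution_key`, `group_by_execution`), the toy `emp` table, and the choice of SQLite with an order-insensitive result fingerprint are all assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict


def execution_key(conn, sql):
    """Return a hashable, order-insensitive fingerprint of a query's result.

    Hypothetical helper: returns None when the candidate fails to execute.
    """
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort rows so that two queries differing only in row order share a key.
    return tuple(sorted(rows, key=repr))


def group_by_execution(conn, candidates):
    """Bucket candidate SQL strings by their execution result."""
    groups = defaultdict(list)
    for sql in candidates:
        key = execution_key(conn, sql)
        if key is not None:  # drop candidates that do not run
            groups[key].append(sql)
    return dict(groups)


# Toy database and candidate pool (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO emp VALUES ('Ann', 'eng', 120), ('Bob', 'eng', 100), ('Cy', 'ops', 90);
""")
candidates = [
    "SELECT name FROM emp WHERE salary > 95",
    "SELECT name FROM emp WHERE NOT salary <= 95",  # equivalent to the first
    "SELECT name FROM emp WHERE dept = 'ops'",
    "SELECT nme FROM emp",                          # invalid column, fails
]
groups = group_by_execution(conn, candidates)
for result, sqls in groups.items():
    print(len(sqls), "candidate(s) ->", result)
```

Because the first two candidates return the same rows, they land in one group and would receive one shared score, which is the consistency property the summary describes. How each group is scored is left out here, since the summary does not give the reward formula.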

Why it matters

This paper addresses two limitations of Text-to-SQL systems that generate and rank multiple candidates: functionally equivalent queries receive inconsistent scores, and ranking cannot recover when the correct SQL is absent from the candidate pool. R$^3$-SQL handles both with a unified reward for ranking and resampling. It ranks groups of queries that share an execution result rather than individual queries, and it resamples selectively when the pool likely lacks a correct candidate.

Original Abstract

Modern Text-to-SQL systems generate multiple candidate SQL queries and rank them to judge a final prediction. However, existing methods face two limitations. First, they often score functionally equivalent SQL queries inconsistently despite identical execution results. Second, ranking cannot recover when the correct SQL is absent from the candidate pool. We propose R$^3$-SQL, a Text-to-SQL framework that addresses both issues through unified reward for ranking and resampling. R$^3$-SQL first groups candidates by execution result and ranks groups for consistency. To score each group, it combines a pairwise preference across groups with a pointwise utility from the best group rank and size, capturing relative preference, consistency, and candidate quality. To improve candidate recall, R$^3$-SQL introduces agentic resampling, which judges the generated candidate pool and selectively resamples when the correct SQL is likely absent. R$^3$-SQL achieves 75.03 execution accuracy on BIRD-dev, a new state of the art among methods using models with disclosed sizes, with consistent gains across five benchmarks.
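The control flow behind selective resampling can be sketched as a loop. This is a rough illustration under stated assumptions, not the paper's method: `select_with_resampling`, the `generate` and `rank` callables, the confidence `threshold`, and the toy majority-share ranker are all hypothetical stand-ins for the learned reward and judge described in the abstract.

```python
from typing import Callable, List, Tuple


def select_with_resampling(
    generate: Callable[[int], List[str]],
    rank: Callable[[List[str]], Tuple[str, float]],
    threshold: float = 0.5,
    max_rounds: int = 3,
    batch_size: int = 4,
) -> str:
    """Grow the candidate pool until the ranker is confident or rounds run out.

    `generate(n)` returns up to n new candidates; `rank(pool)` returns the
    best candidate and a confidence in [0, 1]. Both are assumed interfaces.
    """
    pool: List[str] = []
    best_sql = ""
    for _ in range(max_rounds):
        pool.extend(generate(batch_size))
        best_sql, confidence = rank(pool)
        if confidence >= threshold:
            break  # pool judged likely to contain the correct SQL
    return best_sql


# Toy stand-ins: letters play the role of execution-result groups.
batches = iter([["A", "B", "C", "D"], ["A", "A", "A", "B"]])


def toy_generate(n: int) -> List[str]:
    return next(batches)[:n]


def toy_rank(pool: List[str]) -> Tuple[str, float]:
    best = max(set(pool), key=pool.count)
    return best, pool.count(best) / len(pool)


result = select_with_resampling(toy_generate, toy_rank)
print(result)
```

In the toy run, the first batch has no dominant candidate, so the loop samples a second batch before committing. A real system would replace the majority-share confidence with a judgment of whether the correct SQL is likely absent.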
