Submission by AWS Quick team: https://aws.amazon.com/quicksuite/

Contributors: Ravi Shankar, Leah Riley, Adi Kalyanpur, QuickScience team


BIRD leaderboard: https://bird-bench.github.io/

[Figure: BIRD leaderboard ranking (bird_rank_1_updated.png)]


Transparency note: Below is an LLM-generated early artifact that will be updated for the paper submission.

Teams want a model that takes ambiguous questions and returns **executable SQL** reliably, on unfamiliar schemas, without exotic infrastructure. In practice that means: reasoning across multi-table joins, following foreign keys, using realistic sample values, recovering from dead ends, and responding quickly on ordinary hardware.

Q-SQL is built for exactly that. It is a compact 3B Mixture-of-Experts model trained purely with reinforcement learning in multi-turn settings. On BIRD's Single Trained Model track, Q-SQL reaches 76.47% Execution Accuracy (Test) and 72.99% (Dev) using a "Many" self-consistency budget (15 candidates), establishing a new state of the art as of December 6, 2025. Despite that accuracy, Q-SQL remains practical: under 24 hours of training compute and consumer-GPU deployment with low-latency inference.
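To make the "Many" self-consistency budget concrete, here is a minimal sketch of execution-based majority voting over candidate queries. The helper names (`generate_candidates`, `execute_sql`, `self_consistency_vote`) and the SQLite backend are illustrative assumptions, not the actual Q-SQL inference code.

```python
# Minimal sketch: pick the candidate SQL whose execution result is most common
# among N sampled candidates (the "Many" budget). Helper names are hypothetical.
import sqlite3
from collections import Counter
from typing import List, Optional, Tuple


def execute_sql(db_path: str, sql: str) -> Optional[Tuple[str, ...]]:
    """Run a candidate query; return a hashable, order-insensitive result
    fingerprint, or None if the query fails to execute."""
    try:
        with sqlite3.connect(db_path) as conn:
            rows = conn.execute(sql).fetchall()
        return tuple(sorted(repr(row) for row in rows))
    except sqlite3.Error:
        return None


def self_consistency_vote(db_path: str, candidates: List[str]) -> Optional[str]:
    """Majority vote over execution results; return one candidate from the
    largest agreeing cluster, or None if nothing executes."""
    results = {sql: execute_sql(db_path, sql) for sql in candidates}
    valid = {sql: res for sql, res in results.items() if res is not None}
    if not valid:
        return None
    winning_result, _ = Counter(valid.values()).most_common(1)[0]
    return next(sql for sql, res in valid.items() if res == winning_result)


# Usage, assuming some sampling routine that returns 15 candidate queries:
# candidates = generate_candidates(question, schema, n=15)
# final_sql = self_consistency_vote("my_database.sqlite", candidates)
```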

We attribute the gains to four design choices, described in the sections below:


What’s actually different

Pure RL, multi-turn reasoning: no SFT safety net

Rather than "teaching" SQL with supervised pairs, we train the policy end-to-end with reinforcement learning. Episodes are multi-turn: the policy plans, probes schema evidence, proposes candidates, debugs, and finalizes. Optimization follows a GRPO-style loop inspired by recent reasoning-focused RL (the family popularized by DeepSeek-R1) [8]. The final reward combines process-based and execution-based signals.
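For intuition, the sketch below shows how a GRPO-style loop might turn a blended process/execution reward into group-relative advantages. The reward weights and helper names (`combined_reward`, `process_score`, `execution_score`) are illustrative assumptions under this description, not the actual Q-SQL training code.

```python
# Hedged sketch of GRPO-style group-relative advantages with a reward that
# blends a process signal and an execution signal. Weights and names are
# illustrative, not the real training configuration.
from typing import List
import statistics


def combined_reward(process_score: float, execution_score: float,
                    w_process: float = 0.3, w_exec: float = 0.7) -> float:
    """Blend a process-based signal (e.g. well-formed reasoning turns, valid
    schema probes) with an execution-based signal (e.g. the final SQL runs
    and matches the gold result)."""
    return w_process * process_score + w_exec * execution_score


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantages: normalize each episode's reward against the
    mean and standard deviation of its sampled group (no value network)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]


# Example: a group of 4 multi-turn episodes sampled for the same question.
episode_rewards = [
    combined_reward(process_score=1.0, execution_score=1.0),  # correct SQL
    combined_reward(process_score=1.0, execution_score=0.0),  # runs, wrong rows
    combined_reward(process_score=0.5, execution_score=0.0),  # malformed turn
    combined_reward(process_score=1.0, execution_score=1.0),  # correct SQL
]
advantages = group_relative_advantages(episode_rewards)
# Episodes with above-average reward get positive advantages, steering the
# policy toward those trajectories in the policy-gradient update.
```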

M-Schema: the right inductive bias for databases