Submission by AWS Quick team: https://aws.amazon.com/quicksuite/
Contributors: Ravi Shankar, Leah Riley, Adi Kalyanpur, QuickScience team
BIRD leaderboard: https://bird-bench.github.io/

Transparency note: Below is an LLM-generated early artifact that will be updated for the paper submission.
Teams want a model that takes ambiguous questions and returns **executable SQL** reliably, on unfamiliar schemas, without exotic infrastructure. In practice that means: reason across multi-table joins, follow foreign keys, use realistic sample values, recover from dead ends, and respond quickly on ordinary hardware.
Q-SQL is built for exactly that. It’s a compact 3B Mixture-of-Experts model trained purely with reinforcement learning in multi-turn settings. On BIRD’s Single Trained Model track, Q-SQL reaches 76.47% Execution Accuracy (Test) and 72.99% (Dev) using a “Many” self-consistency budget (15 candidates), establishing a new state of the art as of December 6, 2025. Despite this accuracy, Q-SQL remains practical: under 24 hours of training compute and consumer-GPU deployment with low-latency inference.
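As a rough illustration of the “Many” self-consistency budget, the sketch below executes each candidate query and majority-votes over the resulting result sets. The helper names, SQLite backend, and voting details are assumptions for illustration, not Q-SQL’s actual implementation.

```python
# Minimal sketch of execution-based self-consistency over N candidate SQL queries.
# Assumptions: an SQLite database file and hypothetical candidate strings.
import sqlite3


def execute_sql(db_path: str, sql: str):
    """Run a query and return its result set as a hashable, order-insensitive key,
    or None if the query fails to execute."""
    try:
        with sqlite3.connect(db_path) as conn:
            rows = conn.execute(sql).fetchall()
        return tuple(sorted(rows, key=repr))
    except sqlite3.Error:
        return None


def self_consistency_vote(candidates: list[str], db_path: str) -> str | None:
    """Group candidates by their execution result and return one representative
    query from the largest group (majority vote over result sets)."""
    groups: dict = {}
    for sql in candidates:
        key = execute_sql(db_path, sql)
        if key is not None:  # discard candidates that error out
            groups.setdefault(key, []).append(sql)
    if not groups:
        return None
    best_key = max(groups, key=lambda k: len(groups[k]))
    return groups[best_key][0]
```

Voting on executed result sets rather than on SQL strings lets syntactically different but semantically equivalent candidates reinforce each other.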
We attribute the gains to four choices:
Rather than “teaching” SQL with supervised pairs, we train the policy end-to-end with reinforcement learning. Episodes are multi-turn: the policy plans, probes schema evidence, proposes candidates, debugs, and finalizes. Optimization follows a GRPO-style loop inspired by recent reasoning-focused RL (the family popularized by DeepSeek-R1) [8]. The final reward combines process-based and execution-based signals.
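To make the GRPO-style loop concrete, here is a minimal sketch of a blended process/execution reward and group-relative advantage computation. The specific weights, scoring terms, and function names are illustrative assumptions, not the exact recipe used to train Q-SQL.

```python
# Sketch of GRPO-style credit assignment: rewards for a group of rollouts on the
# same question are standardized within the group to form advantages that weight
# the token log-probabilities in the policy loss.
import torch


def combined_reward(process_score: float, executes: bool, result_correct: bool,
                    w_process: float = 0.3, w_exec: float = 0.7) -> float:
    """Blend a process-based score (e.g., well-formed tool use and final answer
    format) with an execution-based score (query runs and matches the gold result).
    The weights here are assumed values for illustration."""
    exec_score = 1.0 if result_correct else (0.1 if executes else 0.0)
    return w_process * process_score + w_exec * exec_score


def grpo_advantages(group_rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: standardize rewards across the G rollouts
    sampled for the same prompt, as in GRPO."""
    return (group_rewards - group_rewards.mean()) / (group_rewards.std() + eps)


# Example: four rollouts sampled for one question.
rewards = torch.tensor([
    combined_reward(1.0, executes=True, result_correct=True),
    combined_reward(0.5, executes=True, result_correct=False),
    combined_reward(1.0, executes=False, result_correct=False),
    combined_reward(0.0, executes=True, result_correct=True),
])
advantages = grpo_advantages(rewards)
```

Because advantages are normalized within each group, the policy is pushed toward rollouts that are better than its own alternatives for the same question, without needing a learned value function.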