Updated on Oct 30, 2025 (Work in Progress)
Updated by: Ravi Shankar, Sr. Applied Scientist, AWS Agentic AI
In what is, to the best of my knowledge, Amazon's first submission to BIRD-SQL, we placed in the Top 3 of the test-set leaderboard (single-model category). We achieve this with an A30B-3B Mixture-of-Experts (MoE) model, i.e., one with just 3B active parameters. This makes it very fast at inference time, and it uses roughly an order of magnitude fewer active parameters than the 32B submissions around it on the leaderboard. We used no datasets other than those provided by the BIRD team (link), and building the SQL model took under 24 hours of compute. Our model shows the potential of Reinforcement Learning with Verifiable Rewards (RLVR) applied in an agentic setting, and how it can improve upon existing proprietary models such as GPT-5 and Claude-3.5 by as much as 10%.
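The "verifiable" part of RLVR is what makes text-to-SQL a natural fit: a candidate query can be checked automatically by executing it. The block below is a minimal sketch of such a reward under stated assumptions, not the reward used to train DorySQL. It runs the predicted and gold SQL against a SQLite database with Python's standard `sqlite3` module and returns 1.0 only when the result sets match (row order and duplicates ignored, similar in spirit to BIRD's execution-accuracy check). The helper names `run_sql` and `execution_reward` are hypothetical.

```python
import sqlite3
from contextlib import closing
from typing import Optional, Set, Tuple


def run_sql(db_path: str, sql: str) -> Optional[Set[Tuple]]:
    """Execute `sql` against the SQLite database at `db_path`.

    Returns the result as a set of row tuples, or None if the query fails.
    """
    try:
        # closing() guarantees the connection is closed even if execution raises.
        with closing(sqlite3.connect(db_path)) as conn:
            return set(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_reward(db_path: str, predicted_sql: str, gold_sql: str) -> float:
    """Binary verifiable reward (hypothetical helper, not the authors' code).

    1.0 if the predicted query executes and returns the same set of rows as
    the gold query; 0.0 otherwise, including syntax or runtime errors.
    """
    gold_rows = run_sql(db_path, gold_sql)
    pred_rows = run_sql(db_path, predicted_sql)
    if gold_rows is None or pred_rows is None:
        return 0.0
    return 1.0 if pred_rows == gold_rows else 0.0
```

In an RL loop, a reward like this would be computed once per sampled rollout. A production version would also need a query timeout, since sampled SQL can run for a long time; that is omitted here for brevity.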
| Rank | Model | Test Accuracy | Model Size | Organization |
|---|---|---|---|---|
| 🥇 1 | Gemini-SQL | 76.13% | Unknown | Google Cloud |
| 🥈 2 | Databricks RLVR | 75.68% | 32B | Databricks |
| 🥉 3 | DorySQL-3B-MOE | 74.85% | 3B active (MoE) | Amazon |
| 4 | Sophon-Text2SQL-32B | 74.79% | 32B | ByteDance |
| 5 | SiriusAI-Text2SQL-32B | 74.40% | 32B | Tencent |

**Fig. 1: Single-model leaderboard as of Oct 30, 2025**
We submitted consensus@K results with K = 7; the corresponding dev-set performance is shown below.
Gold-executable questions (fixed denominator): 1532 / 1534 (99.9%)
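Our reading of the "fixed denominator" convention: the two dev questions whose gold SQL does not execute are excluded, and the remaining 1532 questions form the denominator for every K, so a question for which the model produces no valid answer counts as incorrect rather than being dropped. A minimal sketch of that bookkeeping, with the hypothetical helper `fixed_denominator_accuracy`, reproduces the K = 1 figure in the results table:

```python
from typing import Hashable, Iterable, Tuple


def fixed_denominator_accuracy(
    gold_executable_ids: Iterable[Hashable],
    correct_ids: Iterable[Hashable],
) -> Tuple[int, int, float]:
    """Return (correct, total, accuracy %) over gold-executable questions only.

    Hypothetical helper. The denominator is the full set of gold-executable
    questions; anything not in `correct_ids` (wrong, failed, or unanswered)
    counts against accuracy instead of shrinking the denominator.
    """
    denominator = set(gold_executable_ids)
    # Only ids inside the denominator can count as correct.
    correct = len(denominator & set(correct_ids))
    accuracy = 100.0 * correct / len(denominator) if denominator else 0.0
    return correct, len(denominator), accuracy


# Dev-set K = 1 numbers: 1080 correct out of 1532 gold-executable questions.
correct, total, acc = fixed_denominator_accuracy(range(1532), range(1080))
print(f"{correct} / {total} = {acc:.2f} %")  # 1080 / 1532 = 70.50 %
```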
✅ Consensus @K Results (Execution-Based)
| K | Correct / Total | Accuracy |
|---|---|---|
| 1 | 1080 / 1532 | 70.50 % |
| 7 | 1104 / 1532 | 72.06 % |
| 15 | 1102 / 1532 | 71.93 % |
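One common way to implement execution-based consensus@K, assumed here since the exact selection rule is not stated, is self-consistency over execution results: sample K candidate queries, execute each one, group candidates by the result set they return, and keep a query from the largest group. The sketch uses Python's standard `sqlite3` module; `run_query` and `consensus_at_k` are hypothetical names, not the authors' code.

```python
import sqlite3
from collections import Counter
from contextlib import closing
from typing import FrozenSet, Optional, Sequence


def run_query(db_path: str, sql: str) -> Optional[FrozenSet[tuple]]:
    """Execute a query; return its rows as a hashable frozenset, or None on error."""
    try:
        with closing(sqlite3.connect(db_path)) as conn:
            return frozenset(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def consensus_at_k(db_path: str, candidates: Sequence[str]) -> Optional[str]:
    """Pick one of K sampled SQL candidates by execution-based majority vote.

    Candidates that fail to execute get no vote. Among the rest, the result
    set produced most often wins, and the first candidate (in sampling order)
    that produced it is returned. Returns None if no candidate executes.
    """
    results = [run_query(db_path, sql) for sql in candidates]
    votes = Counter(r for r in results if r is not None)
    if not votes:
        return None
    # most_common orders equal counts by first occurrence, so ties go to the
    # result set that was sampled first.
    winning_result, _ = votes.most_common(1)[0]
    return next(sql for sql, r in zip(candidates, results) if r == winning_result)
```

Other tie-breakers, for example picking the candidate with the highest model likelihood, are equally plausible design choices; voting only over executable candidates is what makes the scheme execution-based.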
📊 Accuracy by Database
| Database | @K = 1 | @K = 7 | @K = 15 |
|---|---|---|---|
| california_schools | 67.42 % | 67.42 % | 68.54 % |
| card_games | 60.21 % | 62.83 % | 62.83 % |
| codebase_community | 71.35 % | 74.05 % | 74.05 % |
| debit_card_specializing | 70.31 % | 70.31 % | 70.31 % |
| european_football_2 | 72.87 % | 72.87 % | 73.64 % |
| financial | 75.47 % | 74.53 % | 73.58 % |
| formula_1 | 62.43 % | 67.05 % | 68.21 % |
| student_club | 87.34 % | 87.97 % | 87.34 % |
| superhero | 89.92 % | 93.02 % | 91.47 % |
| thrombosis_prediction | 57.06 % | 55.21 % | 53.37 % |
| toxicology | 68.28 % | 71.72 % | 72.41 % |
🎯 Accuracy by Difficulty
| Difficulty | @K = 1 | @K = 7 | @K = 15 |
|---|---|---|---|
| challenging | 53.47 % | 56.25 % | 56.25 % |
| moderate | 63.58 % | 65.30 % | 64.44 % |
| simple | 76.62 % | 77.92 % | 78.14 % |
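The per-database and per-difficulty tables apply the same fixed-denominator accuracy, grouped by a label on each question. A minimal sketch, assuming each evaluated question has been reduced to a `(group_label, is_correct)` pair (for BIRD dev, the label could come from each example's `db_id` or `difficulty` field); `accuracy_by_group` is a hypothetical helper:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Percent accuracy per group label, rounded to two decimals.

    Hypothetical helper. Each record is (group_label, is_correct) for one
    gold-executable question, e.g. ("formula_1", True) or ("challenging", False).
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for label, is_correct in records:
        totals[label] += 1
        hits[label] += int(is_correct)
    # Sorted keys give a stable, table-friendly ordering.
    return {label: round(100.0 * hits[label] / totals[label], 2) for label in sorted(totals)}


toy = [("simple", True), ("simple", False), ("moderate", True), ("challenging", False)]
print(accuracy_by_group(toy))
# {'challenging': 0.0, 'moderate': 100.0, 'simple': 50.0}
```

Because every group shares the same denominator convention, the size-weighted average of the group accuracies recovers the overall accuracy for the same K.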
Key learnings: