Updated on Oct 30’2025 (Work In Progress)

Updated By: Ravi Shankar, Sr. Applied Scientist-AWS Agentic AI

In our first submission to BIRD-SQL (to the best of my knowledge, also the first from Amazon), we placed third on the Test-set leaderboard (single-model category). We achieved this with a Mixture-of-Experts (MoE) model, 30B-A3B, i.e. with just 3B active parameters. This makes it extremely fast at inference time, and it uses roughly an order of magnitude fewer active parameters than the 32B submissions near the top of the leaderboard. We used no datasets other than those provided by the BIRD team (link), and building this SQL model took less than 24 hours of compute. Our model showcases the potential of Reinforcement Learning with Verifiable Rewards (RLVR) applied in an agentic manner. It also shows how this approach can improve upon existing proprietary models such as GPT-5 and Claude-3.5 by as much as 10%.
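
To make the "verifiable reward" in RLVR concrete, here is a minimal sketch of an execution-based binary reward for text-to-SQL. It uses Python's standard-library `sqlite3` module, since BIRD databases are SQLite files. The function name `execution_reward` and the exact rule are assumptions for illustration and do not describe the reward actually used to train DorySQL-3B-MOE. Under this rule, the reward is 1.0 when the predicted and gold queries return the same set of rows, and 0.0 otherwise, including on any SQL error.

```python
import sqlite3


def execution_reward(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> float:
    """Hypothetical binary verifiable reward: 1.0 if the predicted query returns
    the same set of rows as the gold query, else 0.0 (also 0.0 on SQL errors)."""
    # The gold query is assumed to be executable.
    gold_rows = set(conn.execute(gold_sql).fetchall())
    try:
        pred_rows = set(conn.execute(predicted_sql).fetchall())
    except sqlite3.Error:
        # Unparseable or failing SQL earns no reward.
        return 0.0
    # Set comparison ignores row order and duplicate rows.
    return 1.0 if pred_rows == gold_rows else 0.0


if __name__ == "__main__":
    # Toy in-memory table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schools (name TEXT, county TEXT)")
    conn.executemany("INSERT INTO schools VALUES (?, ?)",
                     [("A", "Alameda"), ("B", "Fresno"), ("C", "Alameda")])
    gold = "SELECT name FROM schools WHERE county = 'Alameda'"
    print(execution_reward(conn, "SELECT name FROM schools WHERE county = 'Alameda' ORDER BY name DESC", gold))  # 1.0
    print(execution_reward(conn, "SELECT name FROM schools", gold))  # 0.0
    print(execution_reward(conn, "SELEC name FROM schools", gold))  # 0.0
```

Comparing row sets rather than ordered lists is a design choice we assume mirrors BIRD-style execution accuracy. A training-time reward would also need a query timeout, which this sketch omits.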

| Rank | Model | Test Accuracy | Model Size | Organization |
|---|---|---|---|---|
| 🥇 1 | Gemini-SQL | 76.13% | UNK | Google Cloud |
| 🥈 2 | Databricks RLVR | 75.68% | 32B | Databricks |
| 🥉 3 | DorySQL-3B-MOE | 74.85% | 3B-MOE | Amazon |
| 4 | Sophon-Text2SQL-32B | 74.79% | 32B | ByteDance |
| 5 | SiriusAI-Text2SQL-32B | 74.40% | 32B | Tencent |

Fig1: Single Model leaderboard as of Oct 30’2025

Our leaderboard submission uses consensus @K with K = 7. The corresponding dev-set performance is shown below.

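Gold-executable questions (fixed denominator): 1532 / 1534 (99.9%). The 2 dev questions whose gold SQL does not execute are excluded, so every accuracy below is computed over the same 1532 questions.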
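
For reference, the accuracy reported in the tables can be written as follows. Here $\mathcal{Q}$ is the set of dev questions whose gold SQL executes ($|\mathcal{Q}| = 1532$). $\hat{y}^{(K)}_q$ is the query selected by consensus over $K$ samples for question $q$, and $y_q$ is its gold query. The notation is ours, and we assume a BIRD-style comparison of result sets:

$$
\mathrm{Acc@}K = \frac{1}{|\mathcal{Q}|} \sum_{q \in \mathcal{Q}} \mathbb{1}\!\left[\, \mathrm{rows}\big(\hat{y}^{(K)}_q\big) = \mathrm{rows}\big(y_q\big) \,\right]
$$

For example, with $K = 1$ this gives $1080 / 1532 \approx 0.7050$ (70.50%), and with $K = 7$ it gives $1104 / 1532 \approx 0.7206$ (72.06%).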

✅ Consensus @K Results (Execution-Based)

| K | Correct / Total | Accuracy |
|---|---|---|
| 1 | 1080 / 1532 | 70.50% |
| 7 | 1104 / 1532 | 72.06% |
| 15 | 1102 / 1532 | 71.93% |
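
Consensus @K selects one query out of K sampled candidates. Below is a minimal sketch of execution-based consensus, assuming plain majority voting over result sets. Each candidate that executes casts one vote for its row set, failing candidates are dropped, and ties go to the earliest sample. The helper name `consensus_at_k` and the tie-breaking rule are hypothetical and not necessarily what our pipeline does.

```python
import sqlite3
from collections import Counter
from typing import List, Optional


def consensus_at_k(conn: sqlite3.Connection, candidates: List[str]) -> Optional[str]:
    """Hypothetical execution-based consensus: return the candidate SQL whose
    result set is produced by the most candidates (majority vote)."""
    executed = []  # (sql, frozenset of result rows) for candidates that run
    for sql in candidates:
        try:
            rows = frozenset(conn.execute(sql).fetchall())
        except sqlite3.Error:
            continue  # failing candidates cast no vote
        executed.append((sql, rows))
    if not executed:
        return None  # nothing executed; the caller decides on a fallback
    votes = Counter(rows for _, rows in executed)
    top = max(votes.values())
    # Assumed tie-break: earliest-sampled candidate among the top result sets.
    return next(sql for sql, rows in executed if votes[rows] == top)


if __name__ == "__main__":
    # Toy in-memory table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
    samples = [
        "SELECT x FROM t WHERE x > 1",
        "SELECT x FROM t WHERE x >= 2",
        "SELECT x FROM t",
        "SELEC x FROM t",  # syntax error: ignored
    ]
    print(consensus_at_k(conn, samples))  # SELECT x FROM t WHERE x > 1
```

With K = 1 there is nothing to vote on, so the method reduces to using the single sampled query.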

📊 Accuracy by Database

| Database | @K = 1 | @K = 7 | @K = 15 |
|---|---|---|---|
| california_schools | 67.42% | 67.42% | 68.54% |
| card_games | 60.21% | 62.83% | 62.83% |
| codebase_community | 71.35% | 74.05% | 74.05% |
| debit_card_specializing | 70.31% | 70.31% | 70.31% |
| european_football_2 | 72.87% | 72.87% | 73.64% |
| financial | 75.47% | 74.53% | 73.58% |
| formula_1 | 62.43% | 67.05% | 68.21% |
| student_club | 87.34% | 87.97% | 87.34% |
| superhero | 89.92% | 93.02% | 91.47% |
| thrombosis_prediction | 57.06% | 55.21% | 53.37% |
| toxicology | 68.28% | 71.72% | 72.41% |

🎯 Accuracy by Difficulty

| Difficulty | @K = 1 | @K = 7 | @K = 15 |
|---|---|---|---|
| challenging | 53.47% | 56.25% | 56.25% |
| moderate | 63.58% | 65.30% | 64.44% |
| simple | 76.62% | 77.92% | 78.14% |
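
The per-database and per-difficulty tables are group-wise accuracies, where each group's denominator is its own question count. Below is a small sketch of that aggregation. It assumes each evaluated question is a record with `db_id`, `difficulty`, and a boolean `correct` field. These field names are assumptions modeled on BIRD's dev-set JSON, and the helper `accuracy_by` is hypothetical. The records in the usage block are toy values, not our results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def accuracy_by(records: Iterable[Mapping], key: str) -> Dict[str, float]:
    """Percentage of correct predictions per group (e.g. key='db_id' or
    key='difficulty'), rounded to two decimals and sorted by group name."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[key]
        total[group] += 1
        correct[group] += int(rec["correct"])
    return {g: round(100.0 * correct[g] / total[g], 2) for g in sorted(total)}


if __name__ == "__main__":
    # Toy records for illustration only (not real evaluation output).
    records = [
        {"db_id": "superhero", "difficulty": "simple", "correct": True},
        {"db_id": "superhero", "difficulty": "moderate", "correct": False},
        {"db_id": "financial", "difficulty": "simple", "correct": True},
        {"db_id": "financial", "difficulty": "challenging", "correct": True},
    ]
    print(accuracy_by(records, "db_id"))       # {'financial': 100.0, 'superhero': 50.0}
    print(accuracy_by(records, "difficulty"))  # {'challenging': 100.0, 'moderate': 0.0, 'simple': 100.0}
```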

Key learnings: