BIRD-Submission SQL (SpektrBot Ads)

Updated on Oct 30, 2025 (Work In Progress)

Contributors: Ravi Shankar, Kishore Kumar, Scott Roberts, Desik Rengarajan (Amazon Ads)

In what is, to the best of my knowledge, the first submission from Amazon to BIRD-SQL, we reached the top 3 on the test-set leaderboard (single-model category). We achieve this with an A30B-3B Mixture-of-Experts model, i.e., with just 3B active parameters, which makes it extremely fast at run time and roughly an order of magnitude smaller, in active parameters, than the 32B models among the other top submissions. We used no datasets other than the one provided by the BIRD team (link), and less than 24 hours of compute time to build this SQL model. Our model showcases the potential of Reinforcement Learning with Verifiable Rewards (RLVR) applied in an agentic manner, and how it can improve on existing proprietary models such as GPT-5 and Claude-3.5 by as much as 10%.
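To make the "verifiable reward" idea concrete, here is a minimal sketch of an execution-based reward for text-to-SQL. It is an illustration only, not the reward used in this submission: the binary 0/1 scoring, the order-insensitive set comparison, and the names `run_query` and `execution_reward` are all assumptions, and the toy `schools` table is invented for the example.

```python
import sqlite3


def run_query(conn, sql):
    """Execute sql and return its rows as a frozenset, or None if execution fails."""
    try:
        return frozenset(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_reward(conn, predicted_sql, gold_sql):
    """Hypothetical binary reward: 1.0 if the predicted rows match the gold rows."""
    gold_rows = run_query(conn, gold_sql)
    pred_rows = run_query(conn, predicted_sql)
    if gold_rows is None or pred_rows is None:
        return 0.0  # non-executable SQL earns no reward
    return 1.0 if pred_rows == gold_rows else 0.0


# Toy in-memory database (invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schools (name TEXT, county TEXT, enrollment INTEGER)")
conn.executemany(
    "INSERT INTO schools VALUES (?, ?, ?)",
    [("A", "Alameda", 500), ("B", "Alameda", 300), ("C", "Fresno", 700)],
)

gold = "SELECT name FROM schools WHERE county = 'Alameda'"
candidates = [
    "SELECT name FROM schools WHERE county = 'Alameda' ORDER BY name",  # same rows
    "SELECT name FROM schools WHERE enrollment > 400",                  # wrong rows
    "SELEC name FROM schools",                                          # syntax error
]
for sql in candidates:
    print(execution_reward(conn, sql, gold), sql)
```

Because the reward comes from executing the query rather than from a learned judge, it can be checked automatically for every rollout.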

Rank	Model	Test Accuracy	Model Size	Organization
🥇 1	Gemini-SQL	76.13%	UNK	Google Cloud
🥈 2	Databricks RLVR	75.68%	32B	Databricks
🥉 3	DorySQL-3B-MOE	74.85%	3B-MOE	Amazon
4	Sophon-Text2SQL-32B	74.79%	32B	ByteDance
5	SiriusAI-Text2SQL-32B	74.40%	32B	Tencent

analysis_adhoc_oct_19.py

We are submitting the consensus @K = 7 results; dev-set performance is shown below.

GOLD-executable questions (fixed denominator): 1532 / 1534 (99.9%)
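The fixed-denominator convention can be sketched as follows. This assumes that a question enters the denominator only if its gold SQL executes, and that a prediction which fails to execute still counts as incorrect rather than being dropped. The record schema and the function name `fixed_denominator_accuracy` are hypothetical.

```python
def fixed_denominator_accuracy(records):
    """Return (correct, total, accuracy %) over questions whose gold SQL executes.

    records: list of dicts with boolean keys 'gold_executable' and 'correct'
    (a hypothetical schema used only for this sketch).
    """
    scored = [r for r in records if r["gold_executable"]]
    correct = sum(1 for r in scored if r["correct"])
    return correct, len(scored), round(100.0 * correct / len(scored), 2)


records = [
    {"gold_executable": True, "correct": True},
    {"gold_executable": True, "correct": False},   # e.g. prediction failed to execute
    {"gold_executable": False, "correct": False},  # gold SQL fails: excluded entirely
    {"gold_executable": True, "correct": True},
]
print(fixed_denominator_accuracy(records))
```

Holding the denominator fixed keeps the accuracies for different K directly comparable.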

✅ Consensus @K Results (Execution-Based)

K	Correct / Total	Accuracy
1	1080 / 1532	70.50 %
7	1104 / 1532	72.06 %
15	1102 / 1532	71.93 %
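One common way to implement execution-based consensus is to sample K candidate queries, execute each one, and keep a query whose result set receives the most votes. The sketch below follows that recipe under stated assumptions: the voting and tie-breaking rules, the function names, and the toy `hero` table are illustrative and may differ from what `analysis_adhoc_oct_19.py` actually does.

```python
import sqlite3
from collections import Counter


def execute(conn, sql):
    """Return the result rows as a frozenset, or None if the query fails."""
    try:
        return frozenset(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def consensus_at_k(conn, candidates, k):
    """Pick the first candidate whose result is the most common among the first k."""
    executed = [(sql, execute(conn, sql)) for sql in candidates[:k]]
    valid = [(sql, rows) for sql, rows in executed if rows is not None]
    if not valid:
        return None  # every candidate failed to execute
    votes = Counter(rows for _, rows in valid)
    winning_rows, _ = votes.most_common(1)[0]
    return next(sql for sql, rows in valid if rows == winning_rows)


# Toy database and candidate list (invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hero (name TEXT, publisher TEXT)")
conn.executemany(
    "INSERT INTO hero VALUES (?, ?)",
    [("Thor", "Marvel"), ("Hulk", "Marvel"), ("Flash", "DC")],
)
candidates = [
    "SELECT COUNT(*) FROM hero",                                # returns 3
    "SELECT COUNT(*) FROM hero WHERE publisher = 'Marvel'",     # returns 2
    "SELECT COUNT(name) FROM hero WHERE publisher = 'Marvel'",  # returns 2
    "SELECT COUNT(* FROM hero",                                 # syntax error, ignored
]
print(consensus_at_k(conn, candidates, k=1))
print(consensus_at_k(conn, candidates, k=4))
```

With k=1 the first sample is returned as-is. With k=4 the two queries that agree on the same result outvote the single outlier, and the failing query casts no vote.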

📊 Accuracy by Database

Database	@K = 1	@K = 7	@K = 15
california_schools	67.42 %	67.42 %	68.54 %
card_games	60.21 %	62.83 %	62.83 %
codebase_community	71.35 %	74.05 %	74.05 %
debit_card_specializing	70.31 %	70.31 %	70.31 %
european_football_2	72.87 %	72.87 %	73.64 %
financial	75.47 %	74.53 %	73.58 %
formula_1	62.43 %	67.05 %	68.21 %
student_club	87.34 %	87.97 %	87.34 %
superhero	89.92 %	93.02 %	91.47 %
thrombosis_prediction	57.06 %	55.21 %	53.37 %
toxicology	68.28 %	71.72 %	72.41 %

🎯 Accuracy by Difficulty

Difficulty	@K = 1	@K = 7	@K = 15
challenging	53.47 %	56.25 %	56.25 %
moderate	63.58 %	65.30 %	64.44 %
simple	76.62 %	77.92 %	78.14 %

Key learnings: