AIME 2025

American Invitational Mathematics Examination — olympiad-level problems; a frontier reasoning test.

Models: 221
Top score: 100
Median: 61

State of the art over time

Each point is a model at its release date; the line traces the best score to date.

[Chart: AIME 2025 score (y-axis, 0 to 100) against model release date (x-axis, 2024 to 2026), one point per model.]

Labeled points:
Phi-3 Mini Instruct 3.8B: 0.3 (2024-04-23)
GPT-4o: 25.7 (2024-05-13)
DeepSeek-V3: 26 (2024-12-26)
DeepSeek-R1: 68 (2025-01-20)
Grok-3: 93.3 (2025-02-17)
Grok-4 Heavy: 100 (2025-07-09)
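
The "best score to date" line described in the caption can be read as a running maximum over release dates. The sketch below illustrates that reading on a small hand-picked subset of the chart's points. The names `POINTS` and `record_setters` are hypothetical, and the same-day tie-breaking rule (higher score first) is an assumption, not something the page states.

```python
from datetime import date

# A hand-picked subset of the chart's points: (model, score, release date).
# Values are copied from the chart data; the selection itself is arbitrary.
POINTS = [
    ("GPT-5", 94.6, date(2025, 8, 7)),
    ("Grok-4 Heavy", 100.0, date(2025, 7, 9)),
    ("o4-mini", 92.7, date(2025, 4, 16)),
    ("Grok-3 Mini", 90.8, date(2025, 2, 17)),
    ("Grok-3", 93.3, date(2025, 2, 17)),
    ("DeepSeek-R1", 68.0, date(2025, 1, 20)),
    ("Phi 4", 18.0, date(2025, 1, 10)),
    ("DeepSeek-V3", 26.0, date(2024, 12, 26)),
    ("GPT-4o-mini", 14.7, date(2024, 7, 18)),
    ("GPT-4o", 25.7, date(2024, 5, 13)),
    ("Phi-3 Mini Instruct 3.8B", 0.3, date(2024, 4, 23)),
]


def record_setters(points):
    """Return the points that strictly raise the best score seen so far."""
    # Sort chronologically. Within a single date, put the higher score first
    # so a same-day runner-up is not counted as a record (assumed rule).
    ordered = sorted(points, key=lambda p: (p[2], -p[1]))
    best = float("-inf")
    records = []
    for name, score, released in ordered:
        if score > best:
            best = score
            records.append((name, score, released))
    return records


for name, score, released in record_setters(POINTS):
    print(f"{released.isoformat()}  {name}: {score}")
```

On this subset the record setters are Phi-3 Mini Instruct 3.8B, GPT-4o, DeepSeek-V3, DeepSeek-R1, Grok-3, and Grok-4 Heavy; every other point falls below the best score already reached at its release date.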

Ranking

1. Grok-4 Heavy: 100
2. GPT-5.2: 100
3. GPT-5 Codex: 98.7
4. Gemini 3 Flash: 97
5. DeepSeek V3.2 Speciale: 96.7
6. MiMo-V2-Flash: 96.3
7. Claude Haiku 4.5: 96.3
8. GPT-5.1-Codex: 95.7
9. Gemini 3 Pro: 95.7
10. GLM 4.7: 95
11. KAT-Coder-Pro V1: 94.7
12. Kimi K2 Thinking: 94.7
13. GPT-5: 94.6
14. Nova 2 Lite: 94.3
15. GPT-5.1: 94
16. GLM-4.6: 93.9
17. gpt-oss-120b: 93.4
18. Grok-3: 93.3
19. o4-mini: 92.7
20. Qwen3-235B-A22B-Thinking-2507: 92.3
21. DeepSeek-V3.2: 92
22. Grok 4 Fast: 92
23. GPT-5.1-Codex-Mini: 91.7
24. Grok 4: 91.7
25. Claude Opus 4.5: 91.3
26. GPT-5 mini: 91.1
27. Qwen3 235B A22B 2507: 91
28. NVIDIA Nemotron 3 Nano 30B A3B: 91
29. Grok-3 Mini: 90.8
30. K-EXAONE: 90.3
31. Nova 2.0 Omni: 89.7
32. DeepSeek V3.1 Terminus: 89.7
33. Ring-1T: 89.3
34. Grok 4.1 Fast: 89.3
35. DeepSeek V3.2 Exp: 89.3
36. gpt-oss-20b: 89.3
37. Nova 2.0 Pro: 89
38. Qwen3 VL 235B A22B: 88.3
39. Apriel-v1.6-15B-Thinker: 88
40. INTELLECT-3: 88
41. Gemini 2.5 Pro Preview 06-05: 88
42. Gemini 2.5 Pro: 88
43. Qwen3 Next 80B A3B Thinking: 87.8
44. Apriel-v1.5-15B-Thinker: 87.5
45. DeepSeek-R1-0528: 87.5
46. Claude Sonnet 4.5: 87
47. o3: 86.4
48. GLM 4.6V: 85.3
49. GPT-5 nano: 85.2
50. ERNIE 5.0 Thinking: 85
51. Seed-OSS-36B-Instruct: 84.7
52. Qwen3 VL 32B: 84.7
53. Grok 3 mini Reasoning: 84.7
54. Qwen3-Next-80B-A3B: 84.3
55. Ring-flash-2.0: 83.7
56. Qwen3 4B 2507: 82.7
57. MiniMax M2.1: 82.7
58. Qwen3 VL 30B A3B: 82.3
59. Qwen3 Max Thinking: 82.3
60. Magistral Medium 1.2: 82
61. Qwen3 235B A22B: 81.5
62. Qwen3: 81.5
63. GLM 4.5 Air: 80.7
64. Qwen3 Max: 80.7
65. Motif-2-12.7B-Reasoning: 80.3
66. Magistral Small 1.2: 80.3
67. EXAONE 4.0 32B: 80
68. Falcon-H1R-7B: 80
69. Doubao Seed Code: 79.3
70. Mi:dm K 2.5 Pro: 78.7
71. Gemini 2.5 Flash: 78.3
72. K2-V2: 78.3
73. MiniMax-M2: 78.3
74. Phi 4 Reasoning Plus: 78
75. Claude Opus 4.1: 78
76. Olmo 3.1 32B Think: 77.3
77. Llama Nemotron Super 49B v1.5: 76.7
78. Claude Opus 4: 75.5
79. NVIDIA Nemotron Nano 12B v2 VL: 75
80. Qwen3 Omni 30B A3B: 74
81. Olmo 3 32B Think: 73.7
82. GLM-4.5: 73.7
83. GLM 4.5V: 73
84. Qwen3 32B: 72.9
85. Cogito v2.1: 72.7
86. Llama 3.1 Nemotron Ultra 253B v1: 72.5
87. Qwen3 VL 30B A3B Instruct: 72.3
88. Nemotron Nano 9B V2: 72.1
89. Gemini 2.5 Flash: 72
90. Ling-1T: 71.3
91. Qwen3 30B A3B: 70.9
92. Olmo 3 7B Think: 70.7
93. Qwen3 VL 235B A22B Instruct: 70.7
94. Claude Sonnet 4: 70.5
95. Qwen3-235B-A22B-Instruct-2507: 70.3
96. Hermes 4 - Llama-3.1 405B: 69.7
97. NVIDIA Nemotron Nano 9B V2: 69.7
98. Qwen3 Next 80B A3B Instruct: 69.5
99. Gemini 2.5 Flash-Lite: 68.7
100. Hermes 4 - Llama-3.1 70B: 68.7
101. Qwen3 VL 32B Instruct: 68.3
102. DeepSeek-R1: 68
103. Qwen3 30B A3B 2507 Instruct: 66.3
104. Ling-flash-2.0: 65.3
105. Magistral Medium: 64.9
106. DeepSeek R1 0528 Qwen3 8B: 63.7
107. DeepSeek R1 Distill Qwen 32B: 63
108. Phi 4 Reasoning: 62.9
109. Magistral Small 2506: 62.8
110. Solar Pro 2: 61.3
111. MiniMax M1 80k: 61
112. Claude 3.7 Sonnet: 61
113. HyperCLOVA X SEED Think: 59
114. Llama-3.3 Nemotron Super 49B v1: 58.4
115. Qwen3 14B: 58
116. Kimi K2 0905: 57.3
117. Kimi K2: 57
118. Qwen3 30B A3B 2507: 56.3
119. DeepSeek R1 Distill Qwen 14B: 55.7
120. DeepSeek R1 Distill Llama 70B: 53.7
121. Qwen3 4B 2507 Instruct: 52.3
122. Qwen3 Omni 30B A3B Instruct: 52.3
123. Exaone 4.0 1.2B: 50.3
124. Llama 3.1 Nemotron Nano 4B v1.1: 50
125. Gemini 2.5 Flash Lite: 49.8
126. DeepSeek-V3.1: 49.8
127. Kimi K2-Instruct-0905: 49.5
128. Kimi K2 Instruct: 49.5
129. Ling-mini-2.0: 49.3
130. Llama 3.1 Nemotron Nano 8B V1: 47.1
131. GPT-4.1: 46.4
132. Grok Code Fast 1: 43.3
133. Magistral Small 1: 41.3
134. Olmo 3 7B Instruct: 41.3
135. DeepSeek R1 Distill Llama 8B: 41.3
136. ERNIE 4.5 300B A47B: 41.3
137. DeepSeek-V3 0324: 41
138. Magistral Medium 1: 40.3
139. GPT-4.1 Mini: 40.2
140. Qwen3 Coder 480B A35B Instruct: 39.3
141. Qwen3 1.7B: 38.7
142. Mistral Medium 3.1: 38.3
143. Mistral Large 3: 38
144. Qwen3 VL 4B Instruct: 37
145. Devstral 2: 36.7
146. Kimi Linear 48B A3B Instruct: 36.3
147. Devstral Small 2: 34.3
148. Reka Flash 3: 33.7
149. Ministral 3 8B: 31.7
150. Qwen3 VL 8B: 30.7
151. Mistral Medium 3: 30.3
152. Ministral 3 14B: 30
153. Devstral Small: 29.3
154. QwQ-32B: 29
155. Qwen3 Coder 30B A3B Instruct: 29
156. Qwen3 VL 8B Instruct: 27.3
157. Mistral Small 3.2: 27
158. DeepSeek-V3: 26
159. Qwen3 VL 4B: 25.7
160. GPT-4o: 25.7
161. LFM2 8B A1B: 25.3
162. Qwen3 8B: 24.3
163. GPT-4.1 Nano: 24
164. Gemini Diffusion: 23.3
165. Qwen3 4B: 22.3
166. DeepSeek R1 Distill Qwen 1.5B: 22
167. Ministral 3 3B: 22
168. Gemini 2.0 Flash: 21.7
169. Gemma 3 27B Instruct: 20.7
170. Llama 4 Maverick: 19.3
171. Gemma 3 12B Instruct: 18.3
172. Qwen3 0.6B: 18
173. Phi 4: 18
174. Nova Premier: 17.3
175. GPT-4o-mini: 14.7
176. Gemma 3n E4B Instruct: 14.3
177. Qwen2.5 72B Instruct: 14
178. Llama 4 Scout: 14
179. Mistral Large 2: 14
180. MiniMax M1 40k: 13.7
181. Granite 4.0 H Small: 13.7
182. Command A: 13
183. Gemma 3 4B Instruct: 12.7
184. Gemma 3n E4B Instructed LiteRT Preview: 11.6
185. Gemma 3n E4B Instructed: 11.6
186. Llama 3.1 Nemotron 70B Instruct: 11
187. Jamba Reasoning 3B: 10.7
188. Gemma 3n E2B Instruct: 10.3
189. LFM2 2.6B: 8.3
190. Llama 3.3 70B Instruct: 7.7
191. Nova Pro: 7
192. Nova Lite: 7
193. Granite 3.3 8B: 6.7
194. Gemma 3n E2B Instructed LiteRT (Preview): 6.7
195. Gemma 3n E2B Instructed: 6.7
196. Phi 4 Mini Instruct: 6.7
197. Granite 4.0 H 1B: 6.3
198. Granite 4.0 1B: 6.3
199. Nova Micro: 6
200. Granite 4.0 Micro: 6
201. Devstral Medium: 4.7
202. Llama 3.1 8B Instruct: 4.3
203. Mistral Small 3: 4.3
204. Llama 3.1 70B Instruct: 4
205. Mistral Small 3.1: 3.7
206. OLMo 2 32B: 3.3
207. LFM2 1.2B: 3.3
208. Gemma 3 1B Instruct: 3.3
209. Llama 3.2 3B Instruct: 3.3
210. Llama 3.1 405B Instruct: 3
211. Gemma 3 270M: 2.3
212. Pixtral Large: 2.3
213. Jamba Large 1.7: 2.3
214. Llama 3.2 11B Instruct: 1.7
215. Granite 4.0 H 350M: 1.3
216. OLMo 2 7B: 0.7
217. Phi-3 Mini Instruct 3.8B: 0.3
218. Jamba 1.7 Mini: 0.3
219. Granite 4.0 350M: 0
220. Molmo 7B-D: 0
221. Llama 3.2 1B Instruct: 0
