AIME 2024

Models: 46

Top score: 95.8

Median: 80

The American Invitational Mathematics Examination (AIME) 2024 benchmark consists of the 30 problems from the AIME I and AIME II competitions, 15 from each. Every problem requires an integer answer from 0 to 999 and tests advanced mathematical reasoning across algebra, geometry, combinatorics, and number theory. The benchmark is used to evaluate the mathematical reasoning of large language models at Olympiad-level difficulty.
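Because every answer is an integer from 0 to 999, grading can be reduced to exact matching. The following is a minimal sketch of such a grader, not the leaderboard's actual scoring code; the function names and the placeholder reference answers are assumptions made for illustration.

```python
def is_valid_aime_answer(answer):
    """Return True if answer is an integer from 0 to 999 inclusive."""
    return (
        isinstance(answer, int)
        and not isinstance(answer, bool)  # bool is a subclass of int in Python
        and 0 <= answer <= 999
    )


def accuracy_percent(predicted, reference):
    """Exact-match accuracy as a percentage.

    A prediction that is missing (None) or outside 0..999 counts as wrong.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    correct = sum(
        1
        for p, r in zip(predicted, reference)
        if is_valid_aime_answer(p) and p == r
    )
    return 100.0 * correct / len(reference)


# Placeholder reference answers (NOT the real AIME 2024 answers).
reference = list(range(100, 130))   # 30 hypothetical answers
predicted = list(reference)
predicted[0] = None                 # model gave no answer
predicted[1] = 1000                 # out-of-range answer
score = accuracy_percent(predicted, reference)
print(round(score, 1))              # 28 of 30 correct
```

With 30 problems, one problem is worth 100/30 ≈ 3.33 points in a single run. Scores that are not multiples of that step presumably come from averaging over several sampled runs, although the page does not say how each score was obtained.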

State of the art over time

Each point is a model at its release date; the line traces the best score to date.
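The "best score to date" line is a running maximum over models ordered by release date. A minimal sketch follows, assuming each model is given as a (release date, score) pair; the function name is hypothetical and the sample points are invented placeholders, since this page lists no release dates.

```python
from datetime import date


def best_to_date(points):
    """Return (release_date, best_score_so_far) pairs in chronological order."""
    frontier = []
    best = float("-inf")
    for released, score in sorted(points):  # sort by date, then score
        best = max(best, score)
        frontier.append((released, best))
    return frontier


# Invented placeholder points, not real release dates or scores.
points = [
    (date(2024, 9, 1), 40.0),
    (date(2024, 3, 1), 10.0),
    (date(2025, 1, 1), 35.0),  # below the current best, so the line stays flat
    (date(2025, 4, 1), 90.0),
]
frontier = best_to_date(points)
for released, best in frontier:
    print(released.isoformat(), best)
```

The line only rises when a newly released model beats every earlier one; a weaker later release leaves it unchanged.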

Ranking

Rank	Model	Organization	Score
1	Grok-3 Mini	xAI	95.8
2	o4-mini	OpenAI	93.4
3	Grok-3	xAI	93.3
4	Gemini 2.5 Pro	Google	92
5	o3	OpenAI	91.6
6	DeepSeek-R1-0528	DeepSeek	91.4
7	GLM-4.5	Zhipu AI	91
8	GLM 4.5 Air	Zhipu AI	89.4
9	Gemini 2.5 Flash	Google	88
10	o3-mini	OpenAI	87.3
11	DeepSeek R1 Zero	DeepSeek	86.7
12	DeepSeek R1 Distill Llama 70B	DeepSeek	86.7
13	o1-pro	OpenAI	86
14	Qwen3 235B A22B	Alibaba	85.7
15	DeepSeek R1 Distill Qwen 7B	DeepSeek	83.3
16	DeepSeek R1 Distill Qwen 32B	DeepSeek	83.3
17	Qwen3 32B	Alibaba	81.4
18	Phi 4 Reasoning Plus	Microsoft	81.3
19	Granite 3.3 8B Instruct	IBM	81.2
20	Granite 3.3 8B Base	IBM	81.2
21	Qwen3 30B A3B	Alibaba	80.4
22	DeepSeek R1 Distill Qwen 14B	DeepSeek	80
23	DeepSeek R1 Distill Llama 8B	DeepSeek	80
24	Claude 3.7 Sonnet	Anthropic	80
25	QwQ-32B	Alibaba	79.5
26	Kimi-k1.5	Moonshot AI	77.5
27	Phi 4 Reasoning	Microsoft	75.3
28	o1	OpenAI	74.3
29	Magistral Medium	Mistral AI	73.6
30	Gemini 2.0 Flash Thinking	Google	73.3
31	Kimi K2 0905	Moonshot AI	72
32	Magistral Small 2506	Mistral AI	70.7
33	Kimi K2-Instruct-0905	Moonshot AI	69.6
34	Kimi K2 Instruct	Moonshot AI	69.6
35	Kimi K2	Moonshot AI	69.6
36	DeepSeek-V3.1	DeepSeek	66.3
37	DeepSeek-V3 0324	DeepSeek	59.4
38	DeepSeek R1 Distill Qwen 1.5B	DeepSeek	52.7
39	QwQ-32B-Preview	Alibaba	50
40	GPT-4.1 Mini	OpenAI	49.6
41	GPT-4.1	OpenAI	48.1
42	o1-preview	OpenAI	42
43	DeepSeek-V3	DeepSeek	39.2
44	GPT-4.5	OpenAI	36.7
45	GPT-4.1 Nano	OpenAI	29.4
46	GPT-4o	OpenAI	13.1
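
The summary figures at the top of the page (46 models, top score 95.8, median 80) can be re-derived from the score column of the ranking. A short sketch using only the Python standard library, with the scores copied from the ranking in rank order:

```python
import statistics

# Scores copied from the ranking above, rank 1 through rank 46.
scores = [
    95.8, 93.4, 93.3, 92, 91.6, 91.4, 91, 89.4, 88, 87.3,
    86.7, 86.7, 86, 85.7, 83.3, 83.3, 81.4, 81.3, 81.2, 81.2,
    80.4, 80, 80, 80, 79.5, 77.5, 75.3, 74.3, 73.6, 73.3,
    72, 70.7, 69.6, 69.6, 69.6, 66.3, 59.4, 52.7, 50, 49.6,
    48.1, 42, 39.2, 36.7, 29.4, 13.1,
]

model_count = len(scores)
top_score = max(scores)
# With an even number of entries, the median is the mean of the two
# middle values (ranks 23 and 24 here).
median_score = statistics.median(scores)

print(model_count, top_score, median_score)
```

Ranks 23 and 24 both score 80, so the median is 80 even though the list has an even number of entries.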

Related Math benchmarks

AIME 2025 (221), MATH-500 (169), MATH (67), GSM8K (45), MGSM (29), HMMT 2025 (11)