
MATH-500



Models: 169
Top score: 99.4
Median: 83.9

MATH-500 is a subset of the MATH dataset containing 500 challenging competition mathematics problems from AMC 10, AMC 12, AIME, and other mathematics competitions. Each problem includes a full step-by-step solution, and the problems span multiple difficulty levels across seven mathematical subjects: Prealgebra, Algebra, Number Theory, Counting and Probability, Geometry, Intermediate Algebra, and Precalculus.

State of the art over time

Each point is a model at its release date; the line traces the best score to date.
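The "best score to date" line is a running maximum over models ordered by release date. The sketch below shows one way to compute it. The function name `best_to_date` and the dates and scores in `points` are hypothetical placeholders for illustration, not values from this leaderboard.

```python
from datetime import date
from itertools import accumulate


def best_to_date(points):
    """Return (release_date, best_score_so_far) pairs in chronological order.

    `points` is an iterable of (release_date, score) tuples in any order.
    """
    ordered = sorted(points, key=lambda p: p[0])
    # accumulate with max yields the running maximum of the scores
    running = accumulate((score for _, score in ordered), max)
    return [(d, best) for (d, _), best in zip(ordered, running)]


# Hypothetical data: placeholder dates and scores, not real leaderboard entries.
points = [
    (date(2024, 3, 1), 60.0),
    (date(2024, 1, 1), 40.0),
    (date(2024, 6, 1), 55.0),
    (date(2024, 9, 1), 90.0),
]

frontier = best_to_date(points)
for release_date, best in frontier:
    print(release_date.isoformat(), best)
```

A model that scores below the current best (the third placeholder above) still appears as a point on the chart, but it leaves the line flat.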

Ranking

Rank	Model	Organization	Score
1	GPT-5	OpenAI	99.4
2	Grok 3 mini Reasoning	xAI	99.2
3	o3	OpenAI	99.2
4	Claude Sonnet 4	Anthropic	99.1
5	Grok 4	xAI	99
6	o4-mini	OpenAI	98.9
7	o3-mini	OpenAI	98.5
8	Qwen3 235B A22B 2507	Alibaba	98.4
9	Llama Nemotron Super 49B v1.5	NVIDIA	98.3
10	DeepSeek-R1-0528	DeepSeek	98.3
11	GLM-4.5	Zhipu AI	98.2
12	Claude Opus 4	Anthropic	98.2
13	GLM 4.5 Air	Zhipu AI	98.1
14	Gemini 2.5 Flash	Google	98.1
15	MiniMax M1 80k	MiniMax	98
16	Qwen3-235B-A22B-Instruct-2507	Alibaba	98
17	Nemotron Nano 9B V2	NVIDIA	97.8
18	EXAONE 4.0 32B	LG AI Research	97.7
19	Qwen3 30B A3B 2507	Alibaba	97.6
20	Qwen3 30B A3B 2507 Instruct	Alibaba	97.5
21	Kimi K2-Instruct-0905	Moonshot AI	97.4
22	Kimi K2 Instruct	Moonshot AI	97.4
23	MiniMax M1 40k	MiniMax	97.2
24	Kimi K2	Moonshot AI	97.1
25	Llama 3.1 Nemotron Ultra 253B v1	NVIDIA	97
26	o1	OpenAI	97
27	Gemini 2.5 Flash Lite	Google	96.9
28	Solar Pro 2	Upstage	96.7
29	Gemini 2.5 Pro	Google	96.7
30	Llama-3.3 Nemotron Super 49B v1	NVIDIA	96.6
31	DeepSeek-R1	DeepSeek	96.6
32	Magistral Small 1	Mistral AI	96.3
33	Kimi-k1.5	Moonshot AI	96.2
34	Claude 3.7 Sonnet	Anthropic	96.2
35	Qwen3 32B	Alibaba	96.1
36	Qwen3 14B	Alibaba	96.1
37	DeepSeek R1 Zero	DeepSeek	95.9
38	Qwen3 30B A3B	Alibaba	95.9
39	Sonar Reasoning Pro	Perplexity	95.7
40	R1 1776	Perplexity	95.4
41	Llama 3.1 Nemotron Nano 8B V1	NVIDIA	95.4
42	Llama 3.1 Nemotron Nano 4B v1.1	NVIDIA	94.7
43	Phi 4 Mini Reasoning	Microsoft	94.6
44	DeepSeek R1 Distill Llama 70B	DeepSeek	94.5
45	Gemini 2.0 Flash Thinking	Google	94.4
46	DeepSeek R1 Distill Qwen 32B	DeepSeek	94.3
47	Qwen3 Coder 480B A35B Instruct	Alibaba	94.2
48	DeepSeek-V3 0324	DeepSeek	94
49	DeepSeek R1 Distill Qwen 14B	DeepSeek	93.9
50	Qwen3 4B	Alibaba	93.3
51	DeepSeek R1 0528 Qwen3 8B	DeepSeek	93.2
52	ERNIE 4.5 300B A47B	Baidu	93.1
53	Qwen3 235B A22B	Alibaba	93
54	Gemini 2.0 Flash	Google	93
55	DeepSeek R1 Distill Qwen 7B	DeepSeek	92.8
56	GPT-4.1 Mini	OpenAI	92.5
57	o1-preview	OpenAI	92.4
58	Gemini 2.0 Pro	Google	92.3
59	Sonar Reasoning	Perplexity	92.1
60	Magistral Medium 1	Mistral AI	91.7
61	GPT-4.1	OpenAI	91.3
62	Mistral Medium 3	Mistral AI	90.7
63	QwQ-32B-Preview	Alibaba	90.6
64	QwQ-32B	Alibaba	90.6
65	Qwen3 8B	Alibaba	90.4
66	DeepSeek-V3	DeepSeek	90.2
67	o1-mini	OpenAI	90
68	Qwen3 1.7B	Alibaba	89.4
69	Reka Flash 3	Reka AI	89.3
70	Qwen3 Coder 30B A3B Instruct	Alibaba	89.3
71	GPT-4o	OpenAI	89.3
72	DeepSeek R1 Distill Llama 8B	DeepSeek	89.1
73	Llama 4 Maverick	Meta	88.9
74	Mistral Small 3.2	Mistral AI	88.3
75	Gemma 3 27B Instruct	Google	88.3
76	Gemini 1.5 Pro	Google	87.6
77	Gemini 2.0 Flash Lite	Google	87.3
78	Grok-3	xAI	87
79	Qwen2.5 72B Instruct	Alibaba	85.8
80	Gemma 3 12B Instruct	Google	85.3
81	GPT-4.1 Nano	OpenAI	84.8
82	Sarvam M	Sarvam	84.7
83	Llama 4 Scout	Meta	84.4
84	Nova Premier	Amazon	83.9
85	DeepSeek R1 Distill Qwen 1.5B	DeepSeek	83.9
86	Qwen2.5 Max	Alibaba	83.5
87	Gemini 1.5 Flash	Google	82.7
88	Command A	Cohere	81.9
89	Sonar	Perplexity	81.7
90	Phi 4	Microsoft	81
91	Qwen2.5 Turbo	Alibaba	80.5
92	Qwen2.5 32B Instruct	Alibaba	80.5
93	GPT-4o-mini	OpenAI	78.9
94	Nova Pro	Amazon	78.6
95	Llama 3.1 Tulu3 405B	Allen Institute for AI	77.8
96	Grok-2	xAI	77.8
97	Llama 3.3 70B Instruct	Meta	77.3
98	Gemma 3n E4B Instruct	Google	77.1
99	Claude 3.5 Sonnet	Anthropic	77.1
100	Qwen2.5 Coder 32B Instruct	Alibaba	76.7
101	Gemma 3 4B Instruct	Google	76.6
102	Nova Lite	Amazon	76.5
103	DeepSeek-V2.5	DeepSeek	76.3
104	Qwen3 0.6B	Alibaba	75
105	Sonar Pro	Perplexity	74.5
106	DeepSeek-Coder-V2	DeepSeek	74.3
107	Grok	xAI	73.7
108	GPT-4 Turbo	OpenAI	73.7
109	Mistral Large 2	Mistral AI	73.6
110	Llama 3.1 Nemotron 70B Instruct	NVIDIA	73.3
111	Claude 3.5 Haiku	Anthropic	72.1
112	Mistral Small 3	Mistral AI	71.5
113	Pixtral Large	Mistral AI	71.4
114	Mistral Small 3.1	Mistral AI	70.7
115	Devstral Medium	Mistral AI	70.7
116	Nova Micro	Amazon	70.3
117	Llama 3.1 405B Instruct	Meta	70.3
118	Qwen2 72B Instruct	Alibaba	70.1
119	Phi 4 Mini Instruct	Microsoft	69.6
120	Phi-4-multimodal-instruct	Microsoft	69.3
121	Gemma 3n E2B Instruct	Google	69.1
122	Granite 3.3 8B Instruct	IBM	69
123	Granite 3.3 8B Base	IBM	69
124	Gemini 1.5 Flash 8B	Google	68.9
125	Devstral Small	Mistral AI	68.4
126	Mistral Saba	Mistral AI	67.7
127	Granite 3.3 8B	IBM	66.5
128	Qwen2.5-Coder 7B Instruct	Alibaba	66
129	Llama 3.1 70B Instruct	Meta	64.9
130	Claude 3 Opus	Anthropic	64.1
131	Llama 3.2 90B Instruct	Meta	62.9
132	Jamba 1.5 Large	AI21 Labs	60.6
133	Jamba Large 1.7	AI21 Labs	60
134	DeepHermes 3 - Mistral 24B	Nous Research	59.5
135	Jamba 1.6 Large	AI21 Labs	58
136	Mistral Small	Mistral AI	56.3
137	Mixtral 8x22B Instruct	Mistral AI	54.5
138	Hermes 3 - Llama-3.1 70B	Nous Research	53.8
139	Reka Flash	Reka AI	52.9
140	Mistral Large	Mistral AI	52.7
141	Llama 3.1 8B Instruct	Meta	51.9
142	Llama 3.2 11B Instruct	Meta	51.6
143	Llama 3 8B Instruct	Meta	49.9
144	Llama 3.2 3B Instruct	Meta	48.9
145	Gemma 3 1B Instruct	Google	48.4
146	Llama 3 70B Instruct	Meta	48.3
147	LFM 40B	Liquid AI	48
148	Phi-3 Mini Instruct 3.8B	Microsoft	45.7
149	GPT-3.5 Turbo	OpenAI	44.1
150	Claude 3 Sonnet	Anthropic	41.4
151	Mistral Medium	Mistral AI	40.5
152	Gemini 1.0 Pro	Google	40.3
153	Claude 3 Haiku	Anthropic	39.4
154	Claude 2.1	Anthropic	37.4
155	Jamba 1.5 Mini	AI21 Labs	35.7
156	Solar Mini	Upstage	33.1
157	Llama 2 Chat 13B	Meta	32.9
158	Llama 2 Chat 70B	Meta	32.3
159	OpenChat 3.5	OpenChat	30.7
160	Mixtral 8x7B Instruct	Mistral AI	29.9
161	DBRX Instruct	Databricks	27.9
162	Command R+	Cohere	27.9
163	Claude Instant	Anthropic	26.4
164	Jamba 1.7 Mini	AI21 Labs	25.8
165	Jamba 1.6 Mini	AI21 Labs	25.7
166	DeepHermes 3 - Llama-3.1 8B	Nous Research	21.8
167	Llama 3.2 1B Instruct	Meta	14
168	Mistral 7B Instruct	Mistral AI	12.1
169	Llama 2 Chat 7B	Meta	5.9
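The summary figures at the top of the page (model count, top score, median) can be recomputed from the scores in the ranking. With 169 entries, the median is the score of the 85th-ranked model, which is 83.9 in the table above. The sketch below shows the calculation on a small sample. The helper name `summarize` is an assumption for illustration, and the sample holds only the five highest scores from the ranking, so its median is not the leaderboard's median.

```python
from statistics import median


def summarize(scores):
    """Return the model count, top score, and median for a list of scores."""
    return {
        "models": len(scores),
        "top_score": max(scores),
        "median": median(scores),
    }


# The five highest scores from the ranking, used here as a small sample.
top_five = [99.4, 99.2, 99.2, 99.1, 99]

summary = summarize(top_five)
print(summary)
```

Passing all 169 scores to the same function should reproduce the figures shown at the top of the page.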

Related Math benchmarks

AIME 2025 (221)
MATH (67)
AIME 2024 (46)
GSM8K (45)
MGSM (29)
HMMT 2025 (11)