MATH

Competition mathematics problems requiring multi-step symbolic reasoning.

Models: 67

Top score: 97.9

Median: 70.6

State of the art over time

Each point is a model at its release date; the line traces the best score to date.
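The "best score to date" line described above is a running maximum over release dates. A minimal sketch, assuming each model is given as a `(release_date, score)` pair; the function name `best_to_date` and the sample points are hypothetical placeholders, not values from this leaderboard (the page text does not include release dates):

```python
from datetime import date


def best_to_date(points):
    """Return (release_date, best_score_so_far) for each point, in date order."""
    best = float("-inf")
    line = []
    for released, score in sorted(points):  # sort by release date
        best = max(best, score)             # the line never decreases
        line.append((released, best))
    return line


# Hypothetical placeholder points, for illustration only.
sample = [
    (date(2024, 1, 1), 60.0),
    (date(2023, 1, 1), 40.0),
    (date(2023, 6, 1), 35.0),
]
line = best_to_date(sample)
# The second point scores below the first, so the line stays flat at 40.0
# until the 60.0 result arrives.
```

A model that scores below the current best still appears as a point on the chart, but it does not move the line.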

Ranking

Rank	Model	Organization	Score
1	o3-mini	OpenAI	97.9
2	o1	OpenAI	96.4
3	Gemini 2.5 Pro	Google	92
4	Gemini 2.0 Flash	Google	89.7
5	Kimi K2 0905	Moonshot AI	89.1
6	Gemma 3 27B	Google	89
7	GPT-4.1	OpenAI	87
8	Gemini 2.0 Flash Lite	Google	86.8
9	Gemini 1.5 Pro	Google	86.5
10	o1-preview	OpenAI	85.5
11	GPT-4.5	OpenAI	85
12	GPT-5	OpenAI	84.7
13	Gemma 3 12B	Google	83.8
14	Qwen2.5 32B Instruct	Alibaba	83.1
15	Qwen2.5 72B Instruct	Alibaba	83.1
16	Qwen2.5 VL 32B Instruct	Alibaba	82.2
17	Claude 3.7 Sonnet	Anthropic	82
18	Phi 4	Microsoft	80.4
19	Qwen2.5 14B Instruct	Alibaba	80
20	Claude 3.5 Sonnet	Anthropic	78.3
21	Gemini 1.5 Flash	Google	77.9
22	Llama 3.3 70B Instruct	Meta	77
23	Nova Pro	Amazon	76.6
24	Grok-2	xAI	76.1
25	Gemma 3 4B	Google	75.6
26	Qwen2.5 7B Instruct	Alibaba	75.5
27	DeepSeek-V2.5	DeepSeek	74.7
28	Llama 3.1 405B Instruct	Meta	73.8
29	Nova Lite	Amazon	73.3
30	Grok-2 mini	xAI	73
31	GPT-4 Turbo	OpenAI	72.6
32	Qwen3 235B A22B	Alibaba	71.8
33	Qwen2.5-Omni-7B	Alibaba	71.5
34	Mistral Small 3 24B Instruct	Mistral AI	70.6
35	Kimi K2 Base	Moonshot AI	70.2
36	GPT-4o-mini	OpenAI	70.2
37	Mistral Small 3.2 24B Instruct	Mistral AI	69.4
38	Claude 3.5 Haiku	Anthropic	69.4
39	Nova Micro	Amazon	69.3
40	Mistral Small 3.1 24B Instruct	Mistral AI	69.3
41	Llama 3.2 90B Instruct	Meta	68
42	Phi 4 Mini	Microsoft	64
43	Llama 4 Maverick	Meta	61.2
44	Claude 3 Opus	Anthropic	60.1
45	Qwen2 72B Instruct	Alibaba	59.7
46	Phi-3.5-MoE-instruct	Microsoft	59.5
47	Gemini 1.5 Flash 8B	Google	58.7
48	Qwen2.5 Coder 32B Instruct	Alibaba	57.2
49	Ministral 8B Instruct	Mistral AI	54.5
50	Llama 3.2 11B Instruct	Meta	51.9
51	Grok-1.5	xAI	50.6
52	Llama 4 Scout	Meta	50.3
53	Qwen2 7B Instruct	Alibaba	49.6
54	Phi-3.5-mini-instruct	Microsoft	48.5
55	Pixtral-12B	Mistral AI	48.1
56	Gemma 3 1B	Google	48
57	Llama 3.2 3B Instruct	Meta	48
58	Qwen2.5-Coder 7B Instruct	Alibaba	46.6
59	Mistral Small 3 24B Base	Mistral AI	46
60	Claude 3 Sonnet	Anthropic	43.1
61	GPT-3.5 Turbo	OpenAI	43.1
62	Gemma 2 27B	Google	42.3
63	GPT-4	OpenAI	42
64	Mixtral 8x22B	Mistral AI	41.8
65	Claude 3 Haiku	Anthropic	38.9
66	Gemma 2 9B	Google	36.6
67	Gemini 1.0 Pro	Google	32.6
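The summary figures at the top of the page (67 models, top score 97.9, median 70.6) can be re-derived from the ranking. A small sketch, assuming the scores are copied by hand from the ranking in rank order; the variable names are illustrative only:

```python
import statistics

# Scores copied from the ranking above, rank 1 through rank 67.
scores = [
    97.9, 96.4, 92, 89.7, 89.1, 89, 87, 86.8, 86.5, 85.5,
    85, 84.7, 83.8, 83.1, 83.1, 82.2, 82, 80.4, 80, 78.3,
    77.9, 77, 76.6, 76.1, 75.6, 75.5, 74.7, 73.8, 73.3, 73,
    72.6, 71.8, 71.5, 70.6, 70.2, 70.2, 69.4, 69.4, 69.3, 69.3,
    68, 64, 61.2, 60.1, 59.7, 59.5, 58.7, 57.2, 54.5, 51.9,
    50.6, 50.3, 49.6, 48.5, 48.1, 48, 48, 46.6, 46, 43.1,
    43.1, 42.3, 42, 41.8, 38.9, 36.6, 32.6,
]

model_count = len(scores)                 # 67
top_score = max(scores)                   # 97.9
median_score = statistics.median(scores)  # 70.6, the rank-34 entry

print(model_count, top_score, median_score)
```

With an odd number of entries the median is the middle row of the ranking, here rank 34 (Mistral Small 3 24B Instruct).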

Related Math benchmarks

AIME 2025 (221), MATH-500 (169), AIME 2024 (46), GSM8K (45), MGSM (29), HMMT 2025 (11)