General

MMLU-Pro

A harder, more robust MMLU with ten-way multiple choice and reasoning-heavy questions.

Models: 292

Top score: 89.8

Median: 73.3

State of the art over time

Each point is a model at its release date; the line traces the best score to date.
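
The "best score to date" line described above is a running maximum over models ordered by release date. A minimal sketch follows, assuming each model is a `(release_date, score)` pair. The function name `best_to_date` and the sample dates and scores are hypothetical, chosen for illustration, and are not taken from the leaderboard.

```python
from datetime import date


def best_to_date(points):
    """Return (release_date, best_score_so_far) pairs in chronological order."""
    frontier = []
    best = float("-inf")
    # Sort by release date so the running maximum follows the time axis.
    for released, score in sorted(points):
        best = max(best, score)
        frontier.append((released, best))
    return frontier


# Hypothetical release dates and scores, for illustration only.
points = [
    (date(2024, 1, 10), 60.0),
    (date(2024, 9, 15), 70.0),
    (date(2024, 6, 1), 72.5),
    (date(2025, 3, 2), 80.1),
]

frontier = best_to_date(points)
for released, best in frontier:
    print(released.isoformat(), best)
```

A model that scores below the current best, like the third chronological entry here, leaves the line flat; only a new record moves it up.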

Ranking

Rank	Model	Organization	Score
1	Gemini 3 Pro	Google	89.8
2	Claude Opus 4.5	Anthropic	89.5
3	Gemini 3 Flash	Google	89
4	Claude Opus 4.1	Anthropic	88
5	MiniMax M2.1	MiniMax	87.5
6	DeepSeek-V4-Pro	DeepSeek	87.5
7	Claude Sonnet 4.5	Anthropic	87.5
8	GPT-5.2	OpenAI	87.4
9	Claude Opus 4	Anthropic	87.3
10	GPT-5	OpenAI	87.1
11	GPT-5.1	OpenAI	87
12	Grok 4	xAI	86.6
13	GPT-5 Codex	OpenAI	86.5
14	DeepSeek V3.2 Speciale	DeepSeek	86.3
15	DeepSeek-V3.2	DeepSeek	86.2
16	GPT-5.1-Codex	OpenAI	86
17	Gemini 2.5 Pro	Google	86
18	GLM 4.7	Zhipu AI	85.6
19	Doubao Seed Code	ByteDance	85.4
20	Grok 4.1 Fast	xAI	85.4
21	o3	OpenAI	85.3
22	DeepSeek V3.1 Terminus	DeepSeek	85.1
23	DeepSeek-R1-0528	DeepSeek	85
24	DeepSeek V3.2 Exp	DeepSeek	85
25	Grok 4 Fast	xAI	85
26	Cogito v2.1	Deep Cogito	84.9
27	Kimi K2 Thinking	Moonshot AI	84.8
28	GLM-4.5	Zhipu AI	84.6
29	Qwen3-235B-A22B-Thinking-2507	Alibaba	84.4
30	DeepSeek-R1	DeepSeek	84.4
31	Qwen3 235B A22B 2507	Alibaba	84.3
32	MiMo-V2-Flash	Xiaomi	84.3
33	Gemini 2.5 Flash	Google	84.2
34	Claude Sonnet 4	Anthropic	84.2
35	Qwen3 Max	Alibaba	84.1
36	o1	OpenAI	84.1
37	K-EXAONE	LG AI Research	83.8
38	DeepSeek-V3.1	DeepSeek	83.7
39	GPT-5 mini	OpenAI	83.7
40	Claude 3.7 Sonnet	Anthropic	83.7
41	Qwen3 VL 235B A22B	Alibaba	83.6
42	Gemini 2.5 Flash	Google	83.2
43	o4-mini	OpenAI	83.2
44	ERNIE 5.0 Thinking	Baidu	83
45	Nova 2.0 Pro	Amazon	83
46	Qwen3-235B-A22B-Instruct-2507	Alibaba	83
47	Hermes 4 - Llama-3.1 405B	Nous Research	82.9
48	GLM-4.6	Zhipu AI	82.9
49	Grok 3 mini Reasoning	xAI	82.8
50	Qwen3 Next 80B A3B Thinking	Alibaba	82.7
51	Llama 3.1 Nemotron Ultra 253B v1	NVIDIA	82.5
52	Kimi K2 0905	Moonshot AI	82.5
53	Qwen3 Max Thinking	Alibaba	82.4
54	Qwen3-Next-80B-A3B	Alibaba	82.4
55	Kimi K2	Moonshot AI	82.4
56	Qwen3 VL 235B A22B Instruct	Alibaba	82.3
57	Ling-1T	InclusionAI	82.2
58	INTELLECT-3	Prime Intellect	82.2
59	GPT-5.1-Codex-Mini	OpenAI	82
60	MiniMax-M2	MiniMax	82
61	Qwen3 VL 32B	Alibaba	81.8
62	EXAONE 4.0 32B	LG AI Research	81.8
63	Nova 2 Lite	Amazon	81.8
64	MiniMax M1 80k	MiniMax	81.6
65	Seed-OSS-36B-Instruct	ByteDance	81.5
66	Magistral Medium 1.2	Mistral AI	81.5
67	Llama Nemotron Super 49B v1.5	NVIDIA	81.4
68	GLM 4.5 Air	Zhipu AI	81.4
69	Mi:dm K 2.5 Pro	Korea Telecom	81.3
70	KAT-Coder-Pro V1	Kuaishou	81.3
71	DeepSeek-V3 0324	DeepSeek	81.2
72	Hermes 4 - Llama-3.1 70B	Nous Research	81.1
73	Kimi K2-Instruct-0905	Moonshot AI	81.1
74	Kimi K2 Instruct	Moonshot AI	81.1
75	Nova 2.0 Omni	Amazon	80.9
76	Gemini 2.5 Flash-Lite	Google	80.8
77	MiniMax M1 40k	MiniMax	80.8
78	gpt-oss-120b	OpenAI	80.8
79	Qwen3 VL 30B A3B	Alibaba	80.7
80	Mistral Large 3	Mistral AI	80.7
81	Ring-1T	InclusionAI	80.6
82	Qwen3 Next 80B A3B Instruct	Alibaba	80.6
83	GPT-4.1	OpenAI	80.6
84	Qwen3 30B A3B 2507	Alibaba	80.5
85	Gemini 2.0 Pro	Google	80.5
86	Solar Pro 2	Upstage	80.5
87	Llama 4 Maverick	Meta	80.5
88	o3-mini	OpenAI	80.2
89	Claude Haiku 4.5	Anthropic	80
90	Qwen3	Alibaba	80
91	Grok-3	xAI	80
92	GLM 4.6V	Zhipu AI	79.9
93	Gemini 2.0 Flash Thinking	Google	79.8
94	Qwen3 32B	Alibaba	79.8
95	Motif-2-12.7B-Reasoning	Motif Technologies	79.6
96	DeepSeek R1 Distill Llama 70B	DeepSeek	79.5
97	NVIDIA Nemotron 3 Nano 30B A3B	NVIDIA	79.4
98	Ring-flash-2.0	InclusionAI	79.3
99	Grok Code Fast 1	xAI	79.3
100	Qwen3 Omni 30B A3B	Alibaba	79.2
101	Qwen3 VL 32B Instruct	Alibaba	79.1
102	Apriel-v1.6-15B-Thinker	ServiceNow	79
103	Qwen3 Coder 480B A35B Instruct	Alibaba	78.8
104	GLM 4.5V	Zhipu AI	78.8
105	K2-V2	MBZUAI Institute of Foundation Models	78.6
106	HyperCLOVA X SEED Think	Naver	78.5
107	Llama-3.3 Nemotron Super 49B v1	NVIDIA	78.5
108	GPT-4.1 Mini	OpenAI	78.1
109	GPT-5 nano	OpenAI	78
110	Qwen3 30B A3B 2507 Instruct	Alibaba	77.7
111	Ling-flash-2.0	InclusionAI	77.7
112	Qwen3 30B A3B	Alibaba	77.7
113	ERNIE 4.5 300B A47B	Baidu	77.6
114	Claude 3.5 Sonnet	Anthropic	77.6
115	Qwen3 14B	Alibaba	77.4
116	Apriel-v1.5-15B-Thinker	ServiceNow	77.3
117	Magistral Small 1.2	Mistral AI	76.8
118	QwQ-32B	Alibaba	76.4
119	Qwen3 VL 30B A3B Instruct	Alibaba	76.4
120	Gemini 2.0 Flash	Google	76.4
121	Olmo 3.1 32B Think	Allen Institute for AI	76.3
122	Qwen2.5 Max	Alibaba	76.2
123	Devstral 2	Mistral AI	76.2
124	Phi 4 Reasoning Plus	Microsoft	76
125	Mistral Medium 3	Mistral AI	76
126	NVIDIA Nemotron Nano 12B v2 VL	NVIDIA	75.9
127	Gemini 2.5 Flash Lite	Google	75.9
128	Olmo 3 32B Think	Allen Institute for AI	75.9
129	DeepSeek-V3	DeepSeek	75.9
130	Gemini 1.5 Pro	Google	75.8
131	Sonar Pro	Perplexity	75.5
132	Grok-2	xAI	75.5
133	Magistral Medium 1	Mistral AI	75.3
134	Qwen3 VL 8B	Alibaba	74.9
135	gpt-oss-20b	OpenAI	74.8
136	GPT-4o	OpenAI	74.7
137	Magistral Small 1	Mistral AI	74.6
138	Qwen3 4B 2507	Alibaba	74.3
139	Phi 4 Reasoning	Microsoft	74.3
140	Qwen3 8B	Alibaba	74.3
141	Llama 4 Scout	Meta	74.3
142	NVIDIA Nemotron Nano 9B V2	NVIDIA	74.2
143	o1-mini	OpenAI	74.2
144	DeepSeek R1 Distill Qwen 14B	DeepSeek	74
145	DeepSeek R1 0528 Qwen3 8B	DeepSeek	73.9
146	DeepSeek R1 Distill Qwen 32B	DeepSeek	73.9
147	Nova Premier	Amazon	73.3
148	Llama 3.1 405B Instruct	Meta	73.3
149	Qwen3 Omni 30B A3B Instruct	Alibaba	72.5
150	Falcon-H1R-7B	TII UAE	72.5
151	Grok-2 mini	xAI	72
152	Llama 3.1 Tulu3 405B	Allen Institute for AI	71.6
153	Gemini 2.0 Flash Lite	Google	71.6
154	Command A	Cohere	71.2
155	Qwen2.5 72B Instruct	Alibaba	71.1
156	Devstral Medium	Mistral AI	70.8
157	Qwen3 Coder 30B A3B Instruct	Alibaba	70.6
158	Phi 4	Microsoft	70.4
159	Grok	xAI	70.3
160	Pixtral Large	Mistral AI	70.1
161	Qwen3 VL 4B	Alibaba	70
162	Mistral Large 2	Mistral AI	69.7
163	Qwen3 4B	Alibaba	69.6
164	Sarvam M	Sarvam	69.6
165	GPT-4 Turbo	OpenAI	69.4
166	Ministral 3 14B	Mistral AI	69.3
167	Kimi K2 Base	Moonshot AI	69.2
168	Nova Pro	Amazon	69.1
169	Mistral Small 3.2 24B Instruct	Mistral AI	69.1
170	Qwen2.5 32B Instruct	Alibaba	69
171	Llama 3.1 Nemotron 70B Instruct	NVIDIA	69
172	Sonar	Perplexity	68.9
173	Llama 3.3 70B Instruct	Meta	68.9
174	Qwen2.5 VL 32B Instruct	Alibaba	68.8
175	Qwen3 VL 8B Instruct	Alibaba	68.6
176	Claude 3 Opus	Anthropic	68.5
177	Mistral Medium 3.1	Mistral AI	68.3
178	Qwen3 235B A22B	Alibaba	68.2
179	Mistral Small 3.2	Mistral AI	68.1
180	Devstral Small 2	Mistral AI	67.8
181	Gemma 3 27B	Google	67.5
182	Gemini 1.5 Flash	Google	67.3
183	Qwen3 4B 2507 Instruct	Alibaba	67.2
184	Ling-mini-2.0	InclusionAI	67.1
185	Llama 3.2 90B Instruct	Meta	67.1
186	Gemma 3 27B Instruct	Google	66.9
187	Reka Flash 3	Reka AI	66.9
188	Mistral Small 3.1 24B Instruct	Mistral AI	66.8
189	Llama 3.1 70B Instruct	Meta	66.4
190	Mistral Small 3 24B Instruct	Mistral AI	66.3
191	Mistral Small 3.1	Mistral AI	65.9
192	GPT-4.1 Nano	OpenAI	65.7
193	Olmo 3 7B Think	Allen Institute for AI	65.5
194	Mistral Small 3	Mistral AI	65.2
195	Claude 3.5 Haiku	Anthropic	65
196	QwQ-32B-Preview	Alibaba	64.8
197	GPT-4o-mini	OpenAI	64.8
198	Qwen2 72B Instruct	Alibaba	64.4
199	Ministral 3 8B	Mistral AI	64.2
200	Qwen2.5 14B Instruct	Alibaba	63.7
201	Qwen3 VL 4B Instruct	Alibaba	63.4
202	Qwen2.5 Turbo	Alibaba	63.3
203	Devstral Small	Mistral AI	63.2
204	Granite 4.0 H Small	IBM	62.4
205	Mistral Saba	Mistral AI	61.1
206	Gemma 3 12B	Google	60.6
207	Gemma 3 12B Instruct	Google	59.5
208	Nova Lite	Amazon	59
209	Exaone 4.0 1.2B	LG AI Research	58.8
210	Gemini 1.5 Flash 8B	Google	58.7
211	Kimi Linear 48B A3B Instruct	Moonshot AI	58.5
212	DeepHermes 3 - Mistral 24B	Nous Research	58
213	Jamba Reasoning 3B	AI21 Labs	57.7
214	Jamba Large 1.7	AI21 Labs	57.7
215	Llama 3 70B Instruct	Meta	57.4
216	Hermes 3 - Llama-3.1 70B	Nous Research	57.1
217	Qwen3 1.7B	Alibaba	57
218	Claude 3 Sonnet	Anthropic	56.8
219	Jamba 1.6 Large	AI21 Labs	56.5
220	Qwen2.5 7B Instruct	Alibaba	56.3
221	Mistral Small 3.1 24B Base	Mistral AI	56
222	Llama 3.1 Nemotron Nano 4B v1.1	NVIDIA	55.6
223	Mistral Small 3 24B Base	Mistral AI	54.4
224	DeepSeek R1 Distill Llama 8B	DeepSeek	54.3
225	Mixtral 8x22B Instruct	Mistral AI	53.7
226	Jamba 1.5 Large	AI21 Labs	53.5
227	Nova Micro	Amazon	53.1
228	Mistral Small	Mistral AI	52.9
229	Phi 4 Mini	Microsoft	52.8
230	Ministral 3 3B	Mistral AI	52.4
231	Olmo 3 7B Instruct	Allen Institute for AI	52.2
232	Mistral Large	Mistral AI	51.5
233	OLMo 2 32B	Allen Institute for AI	51.1
234	Grok-1.5	xAI	51
235	Gemma 3n E4B Instructed LiteRT Preview	Google	50.6
236	Gemma 3n E4B Instructed	Google	50.6
237	LFM2 8B A1B	Liquid AI	50.5
238	Qwen2.5 Coder 32B Instruct	Alibaba	50.4
239	Claude 2.1	Anthropic	49.5
240	Mistral Medium	Mistral AI	49.1
241	Gemma 3n E4B Instruct	Google	48.8
242	Claude 2	Anthropic	48.6
243	Phi-4-multimodal-instruct	Microsoft	48.5
244	Llama 3.1 8B Instruct	Meta	48.3
245	Phi-3.5-mini-instruct	Microsoft	47.4
246	Qwen2.5-Omni-7B	Alibaba	47
247	Granite 3.3 8B	IBM	46.8
248	Phi 4 Mini Instruct	Microsoft	46.5
249	Llama 3.2 11B Instruct	Meta	46.4
250	GPT-3.5 Turbo	OpenAI	46.2
251	Phi-3.5-MoE-instruct	Microsoft	45.3
252	Granite 4.0 Micro	IBM	44.7
253	Qwen2 7B Instruct	Alibaba	44.1
254	Gemma 3 4B	Google	43.6
255	Phi-3 Mini Instruct 3.8B	Microsoft	43.5
256	Claude Instant	Anthropic	43.4
257	Command R+	Cohere	43.2
258	Gemini 1.0 Pro	Google	43.1
259	DeepSeek Coder V2 Lite Instruct	DeepSeek	42.9
260	LFM 40B	Liquid AI	42.5
261	Jamba 1.5 Mini	AI21 Labs	42.5
262	Gemma 3 4B Instruct	Google	41.7
263	Llama 2 Chat 13B	Meta	40.6
264	Llama 2 Chat 70B	Meta	40.6
265	Gemma 3n E2B Instructed LiteRT (Preview)	Google	40.5
266	Gemma 3n E2B Instructed	Google	40.5
267	Llama 3 8B Instruct	Meta	40.5
268	Qwen2.5-Coder 7B Instruct	Alibaba	40.1
269	DBRX Instruct	Databricks	39.7
270	Jamba 1.7 Mini	AI21 Labs	38.8
271	Mixtral 8x7B Instruct	Mistral AI	38.7
272	Gemma 3n E2B Instruct	Google	37.8
273	Molmo 7B-D	Allen Institute for AI	37.1
274	Jamba 1.6 Mini	AI21 Labs	36.7
275	DeepHermes 3 - Llama-3.1 8B	Nous Research	36.5
276	Qwen3 0.6B	Alibaba	34.7
277	Llama 3.2 3B Instruct	Meta	34.7
278	Granite 4.0 1B	IBM	32.5
279	OpenChat 3.5	OpenChat	31
280	LFM2 2.6B	Liquid AI	29.8
281	OLMo 2 7B	Allen Institute for AI	28.2
282	Granite 4.0 H 1B	IBM	27.7
283	DeepSeek R1 Distill Qwen 1.5B	DeepSeek	26.9
284	LFM2 1.2B	Liquid AI	25.7
285	Mistral 7B Instruct	Mistral AI	24.5
286	Llama 3.2 1B Instruct	Meta	20
287	Llama 2 Chat 7B	Meta	16.4
288	Gemma 3 1B	Google	14.7
289	Gemma 3 1B Instruct	Google	13.5
290	Granite 4.0 H 350M	IBM	12.7
291	Granite 4.0 350M	IBM	12.4
292	Gemma 3 270M	Google	5.5

Related General benchmarks

Humanity’s Last Exam (360)
MMLU (92)
IFEval (41)
SimpleQA (26)
Arena Hard (21)
LiveBench (13)