MMLU-Pro

A harder, more robust version of MMLU, with ten answer options per question and reasoning-heavy questions.
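Widening the choice from MMLU's four options to ten lowers the score a model would get by guessing uniformly at random. A minimal sketch of that arithmetic, assuming every question has exactly `num_options` choices (the helper name `chance_accuracy` is illustrative):

```python
def chance_accuracy(num_options: int) -> float:
    """Expected accuracy, in percent, of picking one of num_options answers uniformly at random."""
    if num_options < 1:
        raise ValueError("num_options must be positive")
    return 100.0 / num_options


print(chance_accuracy(4))   # four-option MMLU: 25.0
print(chance_accuracy(10))  # ten-option MMLU-Pro: 10.0
```

Under that assumption the guessing baseline is 10 points, so the lowest score in the ranking, Gemma 3 270M at 5.5, sits below it.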

Models: 292
Top score: 89.8
Median score: 73.3
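With 292 scores there is no single middle value: in the ranking, positions 146 and 147 hold 73.9 and 73.3. The displayed 73.3 equals the lower of the two, which is what Python's `statistics.median_low` returns, whereas `statistics.median` averages the pair to 73.6. The page does not state which convention it uses, so treat this as an observation about the numbers rather than a description of its method. A quick check on ranks 145 to 148, a window centred on the two middle positions:

```python
import statistics

# Scores at ranks 145 to 148. With 292 models the middle pair is ranks 146 and 147,
# and this window is centred on them, so its median values match the full list's.
middle_window = [73.9, 73.9, 73.3, 73.3]

print(statistics.median(middle_window))      # mean of the two middle values
print(statistics.median_low(middle_window))  # lower of the two middle values
```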

State of the art over time

Each point is a model at its release date; the line traces the best score to date.

[Chart: MMLU-Pro score (0 to 100) by model release date, 2023 to 2026.]

Points on the best-score-to-date line:
GPT-3.5 Turbo: 46.2 (2023-03-01)
Claude 2: 48.6 (2023-07-11)
GPT-4 Turbo: 69.4 (2023-11-06)
Gemini 1.5 Pro: 75.8 (2024-02-15)
Claude 3.5 Sonnet: 77.6 (2024-06-20)
o1: 84.1 (2024-12-05)
DeepSeek-R1: 84.4 (2025-01-20)
Gemini 2.5 Pro: 86 (2025-03-25)
Claude Opus 4: 87.3 (2025-05-22)
Claude Opus 4.1: 88 (2025-08-05)
Gemini 3 Pro: 89.8 (2025-11-18)
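The best-score-to-date line is a running maximum over points sorted by release date. A minimal sketch, assuming each point is a (release date, model, score) record with ISO-format dates; the `Point` and `sota_frontier` names are illustrative, and the sample points use the release dates and scores reported for these models:

```python
from typing import NamedTuple


class Point(NamedTuple):
    released: str  # ISO date "YYYY-MM-DD": string order matches date order
    model: str
    score: float


def sota_frontier(points: list[Point]) -> list[Point]:
    """Return the points that raised the best score so far, in release order."""
    frontier: list[Point] = []
    best = float("-inf")
    # sorted() is stable, so points sharing a release date keep their input order.
    for point in sorted(points, key=lambda point: point.released):
        if point.score > best:  # strictly above every point processed so far
            frontier.append(point)
            best = point.score
    return frontier


# Sample points, deliberately out of date order.
points = [
    Point("2023-07-11", "Claude 2", 48.6),
    Point("2023-03-01", "GPT-3.5 Turbo", 46.2),
    Point("2023-11-06", "GPT-4 Turbo", 69.4),
    Point("2023-11-21", "Claude 2.1", 49.5),
    Point("2024-02-15", "Gemini 1.5 Pro", 75.8),
    Point("2024-05-13", "GPT-4o", 74.7),
    Point("2024-06-20", "Claude 3.5 Sonnet", 77.6),
]

for point in sota_frontier(points):
    print(f"{point.released}  {point.model}: {point.score}")
```

Only Claude 2.1 and GPT-4o drop out of this sample: each was released after a model with a higher score. Using a strict `>` means a model that merely ties the current best does not add a new point to the line.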

Ranking

1. Gemini 3 Pro: 89.8
2. Claude Opus 4.5: 89.5
3. Gemini 3 Flash: 89
4. Claude Opus 4.1: 88
5. MiniMax M2.1: 87.5
6. DeepSeek-V4-Pro: 87.5
7. Claude Sonnet 4.5: 87.5
8. GPT-5.2: 87.4
9. Claude Opus 4: 87.3
10. GPT-5: 87.1
11. GPT-5.1: 87
12. Grok 4: 86.6
13. GPT-5 Codex: 86.5
14. DeepSeek V3.2 Speciale: 86.3
15. DeepSeek-V3.2: 86.2
16. GPT-5.1-Codex: 86
17. Gemini 2.5 Pro: 86
18. GLM 4.7: 85.6
19. Doubao Seed Code: 85.4
20. Grok 4.1 Fast: 85.4
21. o3: 85.3
22. DeepSeek V3.1 Terminus: 85.1
23. DeepSeek-R1-0528: 85
24. DeepSeek V3.2 Exp: 85
25. Grok 4 Fast: 85
26. Cogito v2.1: 84.9
27. Kimi K2 Thinking: 84.8
28. GLM-4.5: 84.6
29. Qwen3-235B-A22B-Thinking-2507: 84.4
30. DeepSeek-R1: 84.4
31. Qwen3 235B A22B 2507: 84.3
32. MiMo-V2-Flash: 84.3
33. Gemini 2.5 Flash (2025-09-25): 84.2
34. Claude Sonnet 4: 84.2
35. Qwen3 Max: 84.1
36. o1: 84.1
37. K-EXAONE: 83.8
38. DeepSeek-V3.1: 83.7
39. GPT-5 mini: 83.7
40. Claude 3.7 Sonnet: 83.7
41. Qwen3 VL 235B A22B: 83.6
42. Gemini 2.5 Flash (2025-04-17): 83.2
43. o4-mini: 83.2
44. ERNIE 5.0 Thinking: 83
45. Nova 2.0 Pro: 83
46. Qwen3-235B-A22B-Instruct-2507: 83
47. Hermes 4 - Llama-3.1 405B: 82.9
48. GLM-4.6: 82.9
49. Grok 3 mini Reasoning: 82.8
50. Qwen3 Next 80B A3B Thinking: 82.7
51. Llama 3.1 Nemotron Ultra 253B v1: 82.5
52. Kimi K2 0905: 82.5
53. Qwen3 Max Thinking: 82.4
54. Qwen3-Next-80B-A3B: 82.4
55. Kimi K2: 82.4
56. Qwen3 VL 235B A22B Instruct: 82.3
57. Ling-1T: 82.2
58. INTELLECT-3: 82.2
59. GPT-5.1-Codex-Mini: 82
60. MiniMax-M2: 82
61. Qwen3 VL 32B: 81.8
62. EXAONE 4.0 32B: 81.8
63. Nova 2 Lite: 81.8
64. MiniMax M1 80k: 81.6
65. Seed-OSS-36B-Instruct: 81.5
66. Magistral Medium 1.2: 81.5
67. Llama Nemotron Super 49B v1.5: 81.4
68. GLM 4.5 Air: 81.4
69. Mi:dm K 2.5 Pro: 81.3
70. KAT-Coder-Pro V1: 81.3
71. DeepSeek-V3 0324: 81.2
72. Hermes 4 - Llama-3.1 70B: 81.1
73. Kimi K2-Instruct-0905: 81.1
74. Kimi K2 Instruct: 81.1
75. Nova 2.0 Omni: 80.9
76. Gemini 2.5 Flash-Lite: 80.8
77. MiniMax M1 40k: 80.8
78. gpt-oss-120b: 80.8
79. Qwen3 VL 30B A3B: 80.7
80. Mistral Large 3: 80.7
81. Ring-1T: 80.6
82. Qwen3 Next 80B A3B Instruct: 80.6
83. GPT-4.1: 80.6
84. Qwen3 30B A3B 2507: 80.5
85. Gemini 2.0 Pro: 80.5
86. Solar Pro 2: 80.5
87. Llama 4 Maverick: 80.5
88. o3-mini: 80.2
89. Claude Haiku 4.5: 80
90. Qwen3: 80
91. Grok-3: 80
92. GLM 4.6V: 79.9
93. Gemini 2.0 Flash Thinking: 79.8
94. Qwen3 32B: 79.8
95. Motif-2-12.7B-Reasoning: 79.6
96. DeepSeek R1 Distill Llama 70B: 79.5
97. NVIDIA Nemotron 3 Nano 30B A3B: 79.4
98. Ring-flash-2.0: 79.3
99. Grok Code Fast 1: 79.3
100. Qwen3 Omni 30B A3B: 79.2
101. Qwen3 VL 32B Instruct: 79.1
102. Apriel-v1.6-15B-Thinker: 79
103. Qwen3 Coder 480B A35B Instruct: 78.8
104. GLM 4.5V: 78.8
105. K2-V2: 78.6
106. HyperCLOVA X SEED Think: 78.5
107. Llama-3.3 Nemotron Super 49B v1: 78.5
108. GPT-4.1 Mini: 78.1
109. GPT-5 nano: 78
110. Qwen3 30B A3B 2507 Instruct: 77.7
111. Ling-flash-2.0: 77.7
112. Qwen3 30B A3B: 77.7
113. ERNIE 4.5 300B A47B: 77.6
114. Claude 3.5 Sonnet: 77.6
115. Qwen3 14B: 77.4
116. Apriel-v1.5-15B-Thinker: 77.3
117. Magistral Small 1.2: 76.8
118. QwQ-32B: 76.4
119. Qwen3 VL 30B A3B Instruct: 76.4
120. Gemini 2.0 Flash: 76.4
121. Olmo 3.1 32B Think: 76.3
122. Qwen2.5 Max: 76.2
123. Devstral 2: 76.2
124. Phi 4 Reasoning Plus: 76
125. Mistral Medium 3: 76
126. NVIDIA Nemotron Nano 12B v2 VL: 75.9
127. Gemini 2.5 Flash Lite: 75.9
128. Olmo 3 32B Think: 75.9
129. DeepSeek-V3: 75.9
130. Gemini 1.5 Pro: 75.8
131. Sonar Pro: 75.5
132. Grok-2: 75.5
133. Magistral Medium 1: 75.3
134. Qwen3 VL 8B: 74.9
135. gpt-oss-20b: 74.8
136. GPT-4o: 74.7
137. Magistral Small 1: 74.6
138. Qwen3 4B 2507: 74.3
139. Phi 4 Reasoning: 74.3
140. Qwen3 8B: 74.3
141. Llama 4 Scout: 74.3
142. NVIDIA Nemotron Nano 9B V2: 74.2
143. o1-mini: 74.2
144. DeepSeek R1 Distill Qwen 14B: 74
145. DeepSeek R1 0528 Qwen3 8B: 73.9
146. DeepSeek R1 Distill Qwen 32B: 73.9
147. Nova Premier: 73.3
148. Llama 3.1 405B Instruct: 73.3
149. Qwen3 Omni 30B A3B Instruct: 72.5
150. Falcon-H1R-7B: 72.5
151. Grok-2 mini: 72
152. Llama 3.1 Tulu3 405B: 71.6
153. Gemini 2.0 Flash Lite: 71.6
154. Command A: 71.2
155. Qwen2.5 72B Instruct: 71.1
156. Devstral Medium: 70.8
157. Qwen3 Coder 30B A3B Instruct: 70.6
158. Phi 4: 70.4
159. Grok: 70.3
160. Pixtral Large: 70.1
161. Qwen3 VL 4B: 70
162. Mistral Large 2: 69.7
163. Qwen3 4B: 69.6
164. Sarvam M: 69.6
165. GPT-4 Turbo: 69.4
166. Ministral 3 14B: 69.3
167. Kimi K2 Base: 69.2
168. Nova Pro: 69.1
169. Mistral Small 3.2 24B Instruct: 69.1
170. Qwen2.5 32B Instruct: 69
171. Llama 3.1 Nemotron 70B Instruct: 69
172. Sonar: 68.9
173. Llama 3.3 70B Instruct: 68.9
174. Qwen2.5 VL 32B Instruct: 68.8
175. Qwen3 VL 8B Instruct: 68.6
176. Claude 3 Opus: 68.5
177. Mistral Medium 3.1: 68.3
178. Qwen3 235B A22B: 68.2
179. Mistral Small 3.2: 68.1
180. Devstral Small 2: 67.8
181. Gemma 3 27B: 67.5
182. Gemini 1.5 Flash: 67.3
183. Qwen3 4B 2507 Instruct: 67.2
184. Ling-mini-2.0: 67.1
185. Llama 3.2 90B Instruct: 67.1
186. Gemma 3 27B Instruct: 66.9
187. Reka Flash 3: 66.9
188. Mistral Small 3.1 24B Instruct: 66.8
189. Llama 3.1 70B Instruct: 66.4
190. Mistral Small 3 24B Instruct: 66.3
191. Mistral Small 3.1: 65.9
192. GPT-4.1 Nano: 65.7
193. Olmo 3 7B Think: 65.5
194. Mistral Small 3: 65.2
195. Claude 3.5 Haiku: 65
196. QwQ-32B-Preview: 64.8
197. GPT-4o-mini: 64.8
198. Qwen2 72B Instruct: 64.4
199. Ministral 3 8B: 64.2
200. Qwen2.5 14B Instruct: 63.7
201. Qwen3 VL 4B Instruct: 63.4
202. Qwen2.5 Turbo: 63.3
203. Devstral Small: 63.2
204. Granite 4.0 H Small: 62.4
205. Mistral Saba: 61.1
206. Gemma 3 12B: 60.6
207. Gemma 3 12B Instruct: 59.5
208. Nova Lite: 59
209. Exaone 4.0 1.2B: 58.8
210. Gemini 1.5 Flash 8B: 58.7
211. Kimi Linear 48B A3B Instruct: 58.5
212. DeepHermes 3 - Mistral 24B: 58
213. Jamba Reasoning 3B: 57.7
214. Jamba Large 1.7: 57.7
215. Llama 3 70B Instruct: 57.4
216. Hermes 3 - Llama-3.1 70B: 57.1
217. Qwen3 1.7B: 57
218. Claude 3 Sonnet: 56.8
219. Jamba 1.6 Large: 56.5
220. Qwen2.5 7B Instruct: 56.3
221. Mistral Small 3.1 24B Base: 56
222. Llama 3.1 Nemotron Nano 4B v1.1: 55.6
223. Mistral Small 3 24B Base: 54.4
224. DeepSeek R1 Distill Llama 8B: 54.3
225. Mixtral 8x22B Instruct: 53.7
226. Jamba 1.5 Large: 53.5
227. Nova Micro: 53.1
228. Mistral Small: 52.9
229. Phi 4 Mini: 52.8
230. Ministral 3 3B: 52.4
231. Olmo 3 7B Instruct: 52.2
232. Mistral Large: 51.5
233. OLMo 2 32B: 51.1
234. Grok-1.5: 51
235. Gemma 3n E4B Instructed LiteRT Preview: 50.6
236. Gemma 3n E4B Instructed: 50.6
237. LFM2 8B A1B: 50.5
238. Qwen2.5 Coder 32B Instruct: 50.4
239. Claude 2.1: 49.5
240. Mistral Medium: 49.1
241. Gemma 3n E4B Instruct: 48.8
242. Claude 2: 48.6
243. Phi-4-multimodal-instruct: 48.5
244. Llama 3.1 8B Instruct: 48.3
245. Phi-3.5-mini-instruct: 47.4
246. Qwen2.5-Omni-7B: 47
247. Granite 3.3 8B: 46.8
248. Phi 4 Mini Instruct: 46.5
249. Llama 3.2 11B Instruct: 46.4
250. GPT-3.5 Turbo: 46.2
251. Phi-3.5-MoE-instruct: 45.3
252. Granite 4.0 Micro: 44.7
253. Qwen2 7B Instruct: 44.1
254. Gemma 3 4B: 43.6
255. Phi-3 Mini Instruct 3.8B: 43.5
256. Claude Instant: 43.4
257. Command R+: 43.2
258. Gemini 1.0 Pro: 43.1
259. DeepSeek Coder V2 Lite Instruct: 42.9
260. LFM 40B: 42.5
261. Jamba 1.5 Mini: 42.5
262. Gemma 3 4B Instruct: 41.7
263. Llama 2 Chat 13B: 40.6
264. Llama 2 Chat 70B: 40.6
265. Gemma 3n E2B Instructed LiteRT (Preview): 40.5
266. Gemma 3n E2B Instructed: 40.5
267. Llama 3 8B Instruct: 40.5
268. Qwen2.5-Coder 7B Instruct: 40.1
269. DBRX Instruct: 39.7
270. Jamba 1.7 Mini: 38.8
271. Mixtral 8x7B Instruct: 38.7
272. Gemma 3n E2B Instruct: 37.8
273. Molmo 7B-D: 37.1
274. Jamba 1.6 Mini: 36.7
275. DeepHermes 3 - Llama-3.1 8B: 36.5
276. Qwen3 0.6B: 34.7
277. Llama 3.2 3B Instruct: 34.7
278. Granite 4.0 1B: 32.5
279. OpenChat 3.5: 31
280. LFM2 2.6B: 29.8
281. OLMo 2 7B: 28.2
282. Granite 4.0 H 1B: 27.7
283. DeepSeek R1 Distill Qwen 1.5B: 26.9
284. LFM2 1.2B: 25.7
285. Mistral 7B Instruct: 24.5
286. Llama 3.2 1B Instruct: 20
287. Llama 2 Chat 7B: 16.4
288. Gemma 3 1B: 14.7
289. Gemma 3 1B Instruct: 13.5
290. Granite 4.0 H 350M: 12.7
291. Granite 4.0 350M: 12.4
292. Gemma 3 270M: 5.5
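The ranking gives tied scores consecutive positions: MiniMax M2.1, DeepSeek-V4-Pro and Claude Sonnet 4.5 all score 87.5 yet sit at 5, 6 and 7, and the page does not say how it breaks ties. A sketch of the common alternative, standard competition ranking ("1224" style), in which tied models share the best position they jointly occupy; the function name `competition_ranks` is illustrative:

```python
def competition_ranks(scores: dict[str, float]) -> list[tuple[int, str, float]]:
    """Rank models by score, highest first; tied scores share a rank ("1224" style)."""
    # Secondary key: model name, only so the order within a tie is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    ranked: list[tuple[int, str, float]] = []
    for position, (model, score) in enumerate(ordered, start=1):
        if ranked and score == ranked[-1][2]:
            rank = ranked[-1][0]  # same score as the previous model: reuse its rank
        else:
            rank = position       # otherwise the rank is the 1-based position
        ranked.append((rank, model, score))
    return ranked


top = {
    "Gemini 3 Pro": 89.8, "Claude Opus 4.5": 89.5, "Gemini 3 Flash": 89.0,
    "Claude Opus 4.1": 88.0, "MiniMax M2.1": 87.5, "DeepSeek-V4-Pro": 87.5,
    "Claude Sonnet 4.5": 87.5, "GPT-5.2": 87.4,
}
for rank, model, score in competition_ranks(top):
    print(rank, model, score)
```

With this scheme the three 87.5 models all share rank 5 and GPT-5.2 moves to 8, so the position numbers still reflect how many models scored strictly higher.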