GPQA Diamond

Graduate-level, Google-proof multiple-choice questions in biology, physics, and chemistry, written by domain experts.

Models: 405
Top score: 94.3
Median: 65
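The summary figures above can be reproduced from the scores in the Ranking. This assumes the median is taken over all 405 listed scores, which the page does not state. With an odd count, the median is the 203rd score in sorted order, and in the Ranking that is GPT-4.1 Mini at 65. Below is a minimal Python sketch using the standard-library `statistics` module. The function name `summarize` is illustrative only.

```python
import statistics


def summarize(scores: list[float]) -> dict[str, float]:
    """Return the model count, top score, and median of a list of scores."""
    if not scores:
        raise ValueError("need at least one score")
    return {
        "models": len(scores),
        "top": max(scores),
        # For an odd count this is the middle value of the sorted scores;
        # for an even count, statistics.median averages the two middle values.
        "median": statistics.median(scores),
    }


# Usage with the five highest scores from the Ranking.
print(summarize([94.3, 94.2, 93.8, 93.5, 92.4]))
# prints {'models': 5, 'top': 94.3, 'median': 93.8}
```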

State of the art over time

Each point is a model at its release date; the line traces the best score to date.

[Chart: GPQA Diamond score (y-axis, 10 to 100) plotted against model release date (x-axis, 2023 to 2026).]

Models on the best-score-to-date line:
GPT-3.5 Turbo: 30.8 (2023-03-01)
GPT-4: 35.7 (2023-03-14)
GPT-4 Turbo: 48 (2023-11-06)
Gemini 1.5 Pro: 59.1 (2024-02-15)
GPT-4o: 70.1 (2024-05-13)
o1-preview: 73.3 (2024-09-12)
o1: 78 (2024-12-05)
Grok-3: 84.6 (2025-02-17)
Claude 3.7 Sonnet: 84.8 (2025-02-24)
o3: 87.7 (2025-04-16)
Grok-4 Heavy: 88.4 (2025-07-09)
Gemini 3 Deep Think: 93.8 (2025-11-18)
Gemini 3.1 Pro: 94.3 (2026-02-19)
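The best-score-to-date line is a running maximum over models taken in release order. Below is a minimal Python sketch that uses a handful of points copied from the chart. The function name `sota_frontier` is hypothetical. The same-day rule is an assumption inferred from the plotted line, not a documented method: only that day's highest score can extend the line, and a tie with the current best does not count.

```python
from datetime import date

# (model, score, release date) points copied from the chart on this page.
POINTS = [
    ("GPT-4", 35.7, date(2023, 3, 14)),
    ("Gemini 1.5 Pro", 59.1, date(2024, 2, 15)),
    ("GPT-4o", 70.1, date(2024, 5, 13)),
    ("Claude 3.5 Sonnet", 67.2, date(2024, 6, 20)),
    ("o1", 78.0, date(2024, 12, 5)),
    ("Gemini 3 Pro", 91.9, date(2025, 11, 18)),
    ("Gemini 3 Deep Think", 93.8, date(2025, 11, 18)),
    ("Claude Opus 4.7", 94.2, date(2026, 4, 16)),
    ("Gemini 3.1 Pro", 94.3, date(2026, 2, 19)),
]


def sota_frontier(points):
    """Return the points that raise the best score to date, in release order."""
    frontier = []
    best = float("-inf")
    # Visit points by release date; on the same date, higher scores come first,
    # so only that day's top score can extend the line (assumed rule).
    for name, score, released in sorted(points, key=lambda p: (p[2], -p[1])):
        # Strictly greater: a tie with the current best is not a new record.
        if score > best:
            best = score
            frontier.append((name, score, released))
    return frontier


for name, score, released in sota_frontier(POINTS):
    print(f"{released}  {name}: {score}")
```

With these points the sketch keeps GPT-4, Gemini 1.5 Pro, GPT-4o, o1, Gemini 3 Deep Think, and Gemini 3.1 Pro. Claude 3.5 Sonnet, Gemini 3 Pro, and Claude Opus 4.7 each fall below the running best at their release dates.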

Ranking

1. Gemini 3.1 Pro: 94.3
2. Claude Opus 4.7: 94.2
3. Gemini 3 Deep Think: 93.8
4. GPT-5.5: 93.5
5. GPT-5.2: 92.4
6. Qwen3.7 Max: 92.3
7. Gemini 3.5 Flash: 92.2
8. GPT-5.4: 92
9. Gemini 3 Pro: 91.9
10. GPT-5.3-Codex: 91.5
11. Claude Opus 4.6: 91.3
12. Grok 4.20 0309 v2: 91.1
13. Kimi K2.6: 91.1
14. Gemini 3 Flash: 90.4
15. DeepSeek-V4-Pro: 90.1
16. Grok 4.3: 90.1
17. GPT-5.2-Codex: 89.9
18. DeepSeek-V4-Flash: 89.4
19. Qwen3.5 397B A17B: 89.3
20. Qwen3.6 Max: 88.8
21. Grok 4.20 0309: 88.5
22. Muse Spark: 88.4
23. Grok-4 Heavy: 88.4
24. GPT-5 Pro: 88.4
25. Qwen3.6 Plus: 88.2
26. GPT-5.1: 88.1
27. Kimi K2.5: 87.9
28. o3: 87.7
29. Grok 4: 87.5
30. Claude Sonnet 4.6: 87.5
31. GPT-5.4 mini: 87.5
32. MiniMax M2.7: 87.4
33. GPT-5: 87.3
34. DeepSeek V3.2 Speciale: 87.1
35. MiMo-V2-Pro: 87
36. Claude Opus 4.5: 87
37. GLM 5.1: 86.8
38. Hy3: 86.7
39. MiMo-V2.5-Pro: 86.6
40. Gemini 2.5 Pro Preview 06-05: 86.4
41. Qwen3 Max Thinking: 86.1
42. GPT-5.1-Codex: 86
43. GLM-5: 86
44. GLM 4.7: 85.9
45. Qwen3.5-27B: 85.8
46. Qwen3.5-122B-A10B: 85.7
47. Gemma 4 31B: 85.7
48. Ring-2.6-1T: 85.7
49. Grok 4 Fast: 85.7
50. MiMo-V2-Omni-0327: 85.5
51. KAT-Coder-Pro V2: 85.5
52. Grok 4.1 Fast: 85.3
53. Nanbeige4.1-3B: 84.9
54. MiMo-V2.5: 84.9
55. MiniMax M2.5: 84.8
56. Claude 3.7 Sonnet: 84.8
57. GLM 5 Turbo: 84.7
58. MiMo-V2-Flash: 84.6
59. Grok-3: 84.6
60. o3 Pro: 84.5
61. Qwen3.5-35B-A3B: 84.5
62. Kimi K2 Thinking: 84.5
63. Qwen3.6 27B: 84.2
64. Qwen3.6 35B A3B: 84.1
65. Grok-3 Mini: 84
66. DeepSeek-V3.2: 84
67. Gemini 2.5 Pro: 84
68. GPT-5 Codex: 83.7
69. Claude Sonnet 4.5: 83.4
70. Step 3.5 Flash: 83.1
71. MiniMax M2.1: 83
72. JT-35B-Flash: 82.9
73. MiMo-V2-Omni: 82.8
74. Gemini 2.5 Flash: 82.8
75. Qwen3.5 Omni Plus: 82.6
76. Step 3.5 Flash 2603: 82.6
77. GPT-5 mini: 82.3
78. Gemini 3.1 Flash Lite: 82.2
79. GPT-5.4 nano: 81.7
80. o4-mini: 81.4
81. GPT-5.1-Codex-Mini: 81.3
82. Qwen3-235B-A22B-Thinking-2507: 81.1
83. ERNIE 4.5 300B A47B: 81.1
84. Nova 2 Lite: 81.1
85. DeepSeek-R1-0528: 81
86. GLM-4.6: 81
87. GLM 5V Turbo: 80.9
88. gpt-oss-120b: 80.9
89. Claude Opus 4.1: 80.9
90. Qwen3.5-9B: 80.6
91. NVIDIA Nemotron 3 Super 120B A12B: 80
92. DeepSeek V3.2 Exp: 79.9
93. Claude Opus 4: 79.6
94. EXAONE 4.5 33B: 79.4
95. Gemini 2.5 Flash: 79.3
96. DeepSeek V3.1 Terminus: 79.2
97. Gemma 4 26B A4B: 79.2
98. Grok 3 mini Reasoning: 79.1
99. GLM-4.5: 79.1
100. Qwen3 235B A22B 2507: 79
101. o1-pro: 79
102. Nova 2.0 Pro: 78.5
103. K-EXAONE: 78.3
104. o1: 78
105. ERNIE 5.0 Thinking: 77.7
106. MiniMax-M2: 77.7
107. Qwen3-235B-A22B-Instruct-2507: 77.5
108. Ring-1T: 77.4
109. Qwen3 VL 235B A22B: 77.2
110. Qwen3 Next 80B A3B Thinking: 77.2
111. o3-mini: 77.2
112. Qwen3.5 4B: 77.1
113. Mercury 2: 77
114. Mistral Small 4: 76.9
115. Cogito v2.1: 76.8
116. Kimi K2: 76.6
117. Doubao Seed Code: 76.4
118. KAT-Coder-Pro V1: 76.4
119. Qwen3 Max: 76.4
120. Command A: 76.1
121. INTELLECT-3: 76.1
122. Nova 2.0 Omni: 76
123. Llama 3.1 Nemotron Ultra 253B v1: 76
124. Qwen3-Next-80B-A3B: 75.9
125. Nemotron Cascade 2 30B A3B: 75.8
126. Kimi K2 0905: 75.8
127. NVIDIA Nemotron 3 Nano 30B A3B: 75.7
128. Claude Sonnet 4: 75.4
129. Trinity Large Thinking: 75.2
130. Ling-2.6-1T: 75.2
131. Kimi K2-Instruct-0905: 75.1
132. Kimi K2 Instruct: 75.1
133. GLM 4.5 Air: 75
134. DeepSeek-V3.1: 74.9
135. Llama Nemotron Super 49B v1.5: 74.8
136. Mistral Medium 3.5: 74.8
137. Qwen3.5 Omni Flash: 74.2
138. Gemini 2.0 Flash Thinking: 74.2
139. EXAONE 4.0 32B: 73.9
140. Magistral Medium 1.2: 73.9
141. Sarvam 105B: 73.8
142. Qwen3 Coder Next: 73.7
143. Qwen3 VL 32B: 73.3
144. Apriel-v1.6-15B-Thinker: 73.3
145. o1-preview: 73.3
146. DeepSeek R1 Zero: 73.3
147. Claude Haiku 4.5: 73
148. Qwen3 Next 80B A3B Instruct: 72.9
149. Hermes 4 - Llama-3.1 405B: 72.7
150. Grok Code Fast 1: 72.7
151. Seed-OSS-36B-Instruct: 72.6
152. Qwen3 Omni 30B A3B: 72.6
153. Ring-flash-2.0: 72.5
154. Solar Pro 3: 72.4
155. Mi:dm K 2.5 Pro: 72.2
156. Qwen3 VL 30B A3B: 72
157. Ling-1T: 71.9
158. GLM 4.6V: 71.9
159. gpt-oss-20b: 71.5
160. DeepSeek-R1: 71.5
161. GPT-4.5: 71.4
162. Apriel-v1.5-15B-Thinker: 71.3
163. K2 Think V2: 71.3
164. Qwen3 VL 235B A22B Instruct: 71.2
165. GPT-5 nano: 71.2
166. Gemini 2.5 Flash-Lite: 70.9
167. Magistral Medium: 70.8
168. Qwen3 30B A3B 2507: 70.7
169. GPT-4o: 70.1
170. Hermes 4 - Llama-3.1 70B: 69.9
171. Llama 4 Maverick: 69.8
172. MiniMax M1 80k: 69.7
173. Motif-2-12.7B-Reasoning: 69.5
174. Qwen3 VL 30B A3B Instruct: 69.5
175. Step3 VL 10B: 69
176. Phi 4 Reasoning Plus: 68.9
177. Solar Pro 2: 68.7
178. DeepSeek-V3 0324: 68.4
179. GLM 4.5V: 68.4
180. MiniMax M1 40k: 68.2
181. Magistral Small 2506: 68.2
182. K2-V2: 68.1
183. Mistral Large 3: 68
184. Magistral Medium 1: 67.9
185. JT-MINI: 67.6
186. Claude 3.5 Sonnet: 67.2
187. Qwen3 VL 32B Instruct: 67.1
188. Qwen3 32B: 66.8
189. Qwen3 4B 2507: 66.7
190. Llama-3.3 Nemotron Super 49B v1: 66.7
191. Magistral Small 1.2: 66.3
192. GPT-4.1: 66.3
193. Falcon-H1R-7B: 66.1
194. Qwen3 30B A3B 2507 Instruct: 65.9
195. Phi 4 Reasoning: 65.8
196. Qwen3 30B A3B: 65.8
197. Qwen3: 65.8
198. Ling-flash-2.0: 65.7
199. Solar Open 100B: 65.7
200. QwQ-32B-Preview: 65.2
201. QwQ-32B: 65.2
202. DeepSeek R1 Distill Llama 70B: 65.2
203. GPT-4.1 Mini: 65
204. Gemini 2.5 Flash Lite: 64.6
205. Magistral Small 1: 64.1
206. Nemotron Nano 9B V2: 64
207. LongCat Flash Lite: 63.6
208. Sarvam 30B: 63.3
209. Sonar Reasoning: 62.3
210. Gemini 2.0 Pro: 62.2
211. DeepSeek R1 Distill Qwen 32B: 62.1
212. Gemini 2.0 Flash: 62.1
213. Qwen3 Omni 30B A3B Instruct: 62
214. Qwen3 Coder 480B A35B Instruct: 61.8
215. HyperCLOVA X SEED Think: 61.5
216. DeepSeek R1 0528 Qwen3 8B: 61.2
217. Olmo 3 32B Think: 61
218. Qwen3 14B: 60.4
219. Tri-21B-Think: 60.1
220. o1-mini: 60
221. Devstral 2: 59.4
222. Ling-2.6-flash: 59.3
223. Olmo 3.1 32B Think: 59.1
224. DeepSeek R1 Distill Qwen 14B: 59.1
225. DeepSeek-V3: 59.1
226. Gemini 1.5 Pro: 59.1
227. Qwen3 8B: 58.9
228. Mistral Medium 3.1: 58.8
229. Qwen2.5 Max: 58.7
230. GLM 4.7 Flash: 58.1
231. Qwen3 VL 8B: 57.9
232. Sonar Pro: 57.8
233. Mistral Medium 3: 57.8
234. Gemma 4 E4B: 57.6
235. NVIDIA Nemotron Nano 12B v2 VL: 57.2
236. Ministral 3 14B: 57.2
237. Llama 4 Scout: 57.2
238. NVIDIA Nemotron Nano 9B V2: 57
239. Nova Premier: 56.9
240. Ling-mini-2.0: 56.2
241. Phi 4: 56.1
242. Grok-2: 56
243. Llama 3.1 Nemotron Nano 8B V1: 54.1
244. Olmo 3.1 32B Instruct: 53.9
245. Devstral Small 2: 53.2
246. Reka Flash 3: 52.9
247. Qwen3 4B: 52.2
248. Phi 4 Mini Reasoning: 52
249. Qwen3 4B 2507 Instruct: 51.7
250. Llama 3.1 Tulu3 405B: 51.6
251. Olmo 3 7B Think: 51.6
252. Qwen3 Coder 30B A3B Instruct: 51.6
253. Exaone 4.0 1.2B: 51.5
254. Gemini 2.0 Flash Lite: 51.5
255. NVIDIA Nemotron 3 Nano 4B: 51.3
256. Grok-2 mini: 51
257. Gemini 1.5 Flash: 51
258. Llama 3.1 405B Instruct: 50.7
259. Mistral Small 3.2: 50.5
260. Pixtral Large: 50.5
261. Llama 3.3 70B Instruct: 50.5
262. Claude 3 Opus: 50.4
263. GPT-4.1 Nano: 50.3
264. Qwen2.5 32B Instruct: 49.5
265. Qwen3 VL 4B: 49.4
266. Devstral Medium: 49.2
267. DeepSeek R1 Distill Qwen 7B: 49.1
268. DeepSeek R1 Distill Llama 8B: 49
269. Qwen2.5 72B Instruct: 49
270. Mistral Large 2: 48.6
271. Granite 4.1 30B: 48.1
272. Kimi K2 Base: 48.1
273. GPT-4 Turbo: 48
274. Qwen3 235B A22B: 47.5
275. LFM2-24B-A2B: 47.4
276. Grok: 47.1
277. Sonar: 47.1
278. Ministral 3 8B: 47.1
279. Nemotron 3 Nano Omni 30B A3B Reasoning: 46.9
280. Nova Pro: 46.9
281. Llama 3.2 90B Instruct: 46.7
282. Llama 3.1 Nemotron 70B Instruct: 46.5
283. Mistral Small 3: 46.2
284. Mistral Small 3.2 24B Instruct: 46.1
285. Qwen2.5 VL 32B Instruct: 46
286. Mistral Small 3.1 24B Instruct: 46
287. Qwen3.5 2B: 45.6
288. Qwen2.5 14B Instruct: 45.5
289. Mistral Small 3.1: 45.4
290. Mistral Small 3 24B Instruct: 45.3
291. Devstral Small: 43.4
292. Gemma 4 E2B: 43.3
293. Granite 4.1 8B: 43.3
294. Gemma 3 27B Instruct: 42.8
295. Qwen3 VL 8B Instruct: 42.7
296. Molmo2-8B: 42.5
297. Mistral Saba: 42.4
298. Qwen2 72B Instruct: 42.4
299. Gemma 3 27B: 42.4
300. Nova Lite: 42
301. Llama 3.1 70B Instruct: 41.7
302. Qwen2.5 Coder 32B Instruct: 41.7
303. Sarvam M: 41.6
304. Granite 4.0 H Small: 41.6
305. Claude 3.5 Haiku: 41.6
306. Kimi Linear 48B A3B Instruct: 41.2
307. Qwen2.5 Turbo: 41
308. Gemma 3 12B: 40.9
309. Llama 3.1 Nemotron Nano 4B v1.1: 40.8
310. Gemini Diffusion: 40.4
311. Claude 3 Sonnet: 40.4
312. GPT-4o-mini: 40.2
313. Hermes 3 - Llama-3.1 70B: 40.1
314. Olmo 3 7B Instruct: 40
315. Nova Micro: 40
316. Jamba Large 1.7: 39
317. Jamba 1.6 Large: 38.7
318. Gemini 1.5 Flash 8B: 38.4
319. DeepHermes 3 - Mistral 24B: 38.2
320. Mistral Small: 38.1
321. Llama 3 70B Instruct: 37.9
322. Mistral Small 3.1 24B Base: 37.5
323. Qwen3 VL 4B Instruct: 37.1
324. Jamba 1.5 Large: 36.9
325. Phi-3.5-MoE-instruct: 36.8
326. Qwen2.5 7B Instruct: 36.4
327. Grok-1.5: 35.9
328. Ministral 3 3B: 35.8
329. GPT-4: 35.7
330. Qwen3 1.7B: 35.6
331. Mistral Large: 35.1
332. Mistral Medium: 34.9
333. Gemma 3 12B Instruct: 34.9
334. LFM2 8B A1B: 34.4
335. Mistral Small 3 24B Base: 34.4
336. Claude 2: 34.4
337. LFM2.5-1.2B-Thinking: 33.9
338. Qwen2.5-Coder 7B Instruct: 33.9
339. Granite 3.3 8B: 33.8
340. DeepSeek R1 Distill Qwen 1.5B: 33.8
341. Granite 4.0 Micro: 33.6
342. Jamba Reasoning 3B: 33.3
343. Claude 3 Haiku: 33.3
344. Mixtral 8x22B Instruct: 33.2
345. DBRX Instruct: 33.1
346. Phi 4 Mini Instruct: 33.1
347. Claude Instant: 33
348. OLMo 2 32B: 32.8
349. Llama 3.2 11B Instruct: 32.8
350. Llama 3.2 3B Instruct: 32.8
351. LFM 40B: 32.7
352. Llama 2 Chat 70B: 32.7
353. LFM2.5-1.2B-Instruct: 32.6
354. Jamba 1.5 Mini: 32.3
355. Command R+: 32.3
356. Jamba 1.7 Mini: 32.2
357. Llama 2 Chat 13B: 32.1
358. Phi-3 Mini Instruct 3.8B: 31.9
359. DeepSeek Coder V2 Lite Instruct: 31.9
360. Claude 2.1: 31.9
361. Phi-4-multimodal-instruct: 31.5
362. Granite 4.1 3B: 31.4
363. Qwen2.5-Omni-7B: 30.8
364. Gemma 3 4B: 30.8
365. GPT-3.5 Turbo: 30.8
366. LFM2 2.6B: 30.6
367. Tiny Aya Global: 30.5
368. MiniCPM-V 4.6 1.3B: 30.5
369. Phi-3.5-mini-instruct: 30.4
370. Llama 3.1 8B Instruct: 30.4
371. Jamba 1.6 Mini: 30
372. Gemma 3n E4B Instruct: 29.6
373. Llama 3 8B Instruct: 29.6
374. Mixtral 8x7B Instruct: 29.2
375. Gemma 3 4B Instruct: 29.1
376. Qwen1.5 Chat 110B: 28.9
377. LFM2.5-VL-1.6B: 28.9
378. OLMo 2 7B: 28.8
379. Granite 4.0 1B: 28.1
380. Gemini 1.0 Pro: 27.9
381. Apertus 70B Instruct: 27.2
382. DeepHermes 3 - Llama-3.1 8B: 27
383. MiniCPM5-1B: 26.9
384. Granite 4.0 H 1B: 26.3
385. Granite 4.0 350M: 26.1
386. Granite 4.0 H 350M: 25.7
387. Apertus 8B Instruct: 25.6
388. Qwen2 7B Instruct: 25.3
389. Phi 4 Mini: 25.2
390. Gemma 3n E2B Instructed LiteRT (Preview): 24.8
391. Gemma 3n E2B Instructed: 24.8
392. Molmo 7B-D: 24
393. Qwen3 0.6B: 23.9
394. Gemma 3 1B Instruct: 23.7
395. Gemma 3n E4B Instructed LiteRT Preview: 23.7
396. Gemma 3n E4B Instructed: 23.7
397. Qwen3.5 0.8B: 23.6
398. OpenChat 3.5: 23
399. Gemma 3n E2B Instruct: 22.9
400. LFM2 1.2B: 22.8
401. Llama 2 Chat 7B: 22.7
402. Gemma 3 270M: 22.4
403. Llama 3.2 1B Instruct: 19.6
404. Gemma 3 1B: 19.2
405. Mistral 7B Instruct: 17.7
