
Humanity’s Last Exam


Models: 360
Top score: 50.7
Median: 6.4

A multi-modal benchmark at the frontier of human knowledge, with 2,500 questions across dozens of subjects, including mathematics, humanities, and natural sciences. It was created by nearly 1,000 subject-expert contributors from over 500 institutions.

State of the art over time

Each point is a model at its release date; the line traces the best score to date.

[Chart: score (y-axis, 0 to 60) by model release date (x-axis, 2023 to 2026), one point per model, with a line tracing the best score to date.]

Models that set a new best score at release:

Claude Instant: 3.8 (2023-03-14)
Llama 2 Chat 7B: 5.8 (2023-07-18)
DBRX Instruct: 6.6 (2024-03-27)
o1: 7.7 (2024-12-05)
DeepSeek-R1: 9.3 (2025-01-20)
o3-mini: 12.3 (2025-01-31)
Gemini 2.5 Pro: 17.8 (2025-03-25)
o3: 24.3 (2025-04-16)
Grok-4 Heavy: 50.7 (2025-07-09)
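
The best-score line in the chart can be read as a running maximum over release dates. The sketch below shows one way to compute it, using a handful of points from the chart. The function name `best_to_date`, the tuple layout, and the rule for models released on the same day (highest score first) are assumptions made for illustration; the page does not say how the line is actually computed.

```python
def best_to_date(points):
    """Return the points that set a new best score, in release order.

    points: iterable of (model_name, release_date, score), where
    release_date is an ISO "YYYY-MM-DD" string (these sort correctly as text).
    """
    # Assumption: for models released on the same day, consider the
    # highest score first so only one of them counts as the new record.
    ordered = sorted(points, key=lambda p: (p[1], -p[2]))

    records = []
    best = float("-inf")
    for name, released, score in ordered:
        if score > best:  # strictly better than every earlier release
            best = score
            records.append((name, released, score))
    return records


# A small subset of the chart's points, including some that set no record.
SAMPLE_POINTS = [
    ("Claude Instant", "2023-03-14", 3.8),
    ("Llama 2 Chat 70B", "2023-07-18", 5.0),
    ("Llama 2 Chat 7B", "2023-07-18", 5.8),
    ("GPT-4 Turbo", "2023-11-06", 3.3),
    ("DBRX Instruct", "2024-03-27", 6.6),
    ("GPT-4o", "2024-05-13", 5.3),
    ("o1", "2024-12-05", 7.7),
]

if __name__ == "__main__":
    for name, released, score in best_to_date(SAMPLE_POINTS):
        print(f"{name}: {score} ({released})")
```

Sorting on the date string rather than on a parsed date keeps the sketch dependency-free; it relies on every date being in the same zero-padded ISO format.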

Ranking

1. Grok-4 Heavy: 50.7
2. Gemini 3.1 Pro: 44.4
3. GPT-5.5: 44.3
4. GPT-5 Pro: 42
5. GPT-5.4: 41.6
6. Gemini 3.5 Flash: 41
7. Gemini 3 Deep Think: 41
8. Grok 4: 40
9. Muse Spark: 39.9
10. GPT-5.3-Codex: 39.9
11. Claude Opus 4.7: 39.6
12. Qwen3.7 Max: 38.1
13. Gemini 3 Pro: 37.5
14. Claude Opus 4.6: 36.7
15. Kimi K2.6: 35.9
16. DeepSeek-V4-Pro: 35.9
17. GPT-5.2: 35.4
18. Grok 4.3: 35
19. Gemini 3 Flash: 34.7
20. MiMo-V2.5-Pro: 33.8
21. GPT-5.2-Codex: 33.5
22. KAT-Coder-Pro V1: 33.4
23. Grok 4.20 0309 v2: 32.2
24. DeepSeek-V4-Flash: 32.1
25. Grok 4.20 0309: 30
26. Claude Sonnet 4.6: 30
27. Kimi K2.5: 29.4
28. Qwen3.6 Max: 28.9
29. Claude Opus 4.5: 28.4
30. MiMo-V2-Pro: 28.3
31. MiniMax M2.7: 28.1
32. GLM 5.1: 28
33. Qwen3.5 397B A17B: 27.3
34. GLM-5: 27.2
35. GPT-5.4 mini: 26.6
36. GPT-5.4 nano: 26.5
37. GPT-5.1: 26.5
38. Qwen3 Max Thinking: 26.2
39. DeepSeek V3.2 Speciale: 26.1
40. Qwen3.6 Plus: 25.7
41. GPT-5 Codex: 25.6
42. Hy3: 25.5
43. GLM 5 Turbo: 25.4
44. MiMo-V2.5: 25.2
45. GLM 4.7: 25.1
46. GPT-5: 24.8
47. o3: 24.3
48. GPT-5.1-Codex: 23.4
49. Qwen3.5-122B-A10B: 23.4
50. Gemma 4 31B: 22.7
51. Step 3.5 Flash 2603: 22.6
52. Kimi K2 Thinking: 22.3
53. MiniMax M2.1: 22.2
54. Qwen3.5-27B: 22.2
55. DeepSeek-V3.2: 22.2
56. Gemini 2.5 Pro Preview 06-05: 21.6
57. Qwen3.6 27B: 21.6
58. MiMo-V2-Flash: 21.1
59. MiMo-V2-Omni-0327: 20.4
60. Qwen3.6 35B A3B: 20.2
61. Grok 4 Fast: 20
62. MiMo-V2-Omni: 19.9
63. DeepSeek V3.2 Exp: 19.8
64. Qwen3.5-35B-A3B: 19.7
65. NVIDIA Nemotron 3 Super 120B A12B: 19.2
66. Step 3.5 Flash: 19.1
67. MiniMax M2.5: 19.1
68. gpt-oss-120b: 19
69. Gemma 4 26B A4B: 18.3
70. Ring-2.6-1T: 18.3
71. Qwen3-235B-A22B-Thinking-2507: 18.2
72. Gemini 2.5 Pro: 17.8
73. DeepSeek-R1-0528: 17.7
74. Grok 4.1 Fast: 17.6
75. gpt-oss-20b: 17.3
76. Claude Sonnet 4.5: 17.3
77. GLM-4.6: 17.2
78. GPT-5.1-Codex-Mini: 16.9
79. GPT-5 mini: 16.7
80. Gemini 3.1 Flash Lite: 16.2
81. KAT-Coder-Pro V2: 16
82. DeepSeek-V3.1: 15.9
83. GLM 5V Turbo: 15.8
84. Mercury 2: 15.5
85. DeepSeek V3.1 Terminus: 15.2
86. Qwen3 235B A22B 2507: 15
87. Trinity Large Thinking: 14.7
88. o4-mini: 14.7
89. GLM-4.5: 14.4
90. Qwen3.5 Omni Plus: 13.9
91. Doubao Seed Code: 13.3
92. Qwen3.5-9B: 13.3
93. K-EXAONE: 13.1
94. Mistral Medium 3.5: 12.8
95. Gemini 2.5 Flash: 12.7
96. ERNIE 5.0 Thinking: 12.7
97. MiniMax-M2: 12.5
98. o3-mini: 12.3
99. INTELLECT-3: 12.1
100. Apriel-v1.5-15B-Thinker: 12
101. Claude Opus 4.1: 11.9
102. Qwen3 235B A22B: 11.7
103. Qwen3-Next-80B-A3B: 11.7
104. Claude Opus 4: 11.7
105. EXAONE 4.5 33B: 11.6
106. Nemotron Cascade 2 30B A3B: 11.4
107. Command A: 11.4
108. Grok 3 mini Reasoning: 11.1
109. Qwen3 Max: 11.1
110. Cogito v2.1: 11
111. Gemini 2.5 Flash: 11
112. Nova 2 Lite: 10.9
113. Falcon-H1R-7B: 10.8
114. Qwen3-235B-A22B-Instruct-2507: 10.6
115. GLM 4.5 Air: 10.6
116. EXAONE 4.0 32B: 10.5
117. Hermes 4 - Llama-3.1 405B: 10.3
118. Claude 3.7 Sonnet: 10.3
119. Ring-1T: 10.2
120. Step3 VL 10B: 10.2
121. NVIDIA Nemotron 3 Nano 30B A3B: 10.2
122. Qwen3 VL 235B A22B: 10.1
123. Sarvam 105B: 10.1
124. Solar Pro 3: 10.1
125. Nanbeige4.1-3B: 10
126. Qwen3 30B A3B 2507: 9.8
127. Apriel-v1.6-15B-Thinker: 9.8
128. K2-V2: 9.8
129. Claude Haiku 4.5: 9.7
130. Qwen3 VL 32B: 9.6
131. Magistral Medium 1.2: 9.6
132. Claude Sonnet 4: 9.6
133. Magistral Medium 1: 9.5
134. K2 Think V2: 9.5
135. Mistral Small 4: 9.5
136. Qwen3 Coder Next: 9.3
137. DeepSeek-R1: 9.3
138. Solar Open 100B: 9.2
139. Seed-OSS-36B-Instruct: 9.1
140. Magistral Medium: 9
141. Ring-flash-2.0: 8.9
142. Nova 2.0 Pro: 8.9
143. GLM 4.6V: 8.9
144. Mi:dm K 2.5 Pro: 8.8
145. Qwen3 VL 30B A3B: 8.7
146. GPT-5 nano: 8.7
147. Qwen3 32B: 8.3
148. MiniMax M1 80k: 8.2
149. Motif-2-12.7B-Reasoning: 8.2
150. QwQ-32B: 8.2
151. Ling-2.6-1T: 8.2
152. Llama 3.1 Nemotron Ultra 253B v1: 8.1
153. Hermes 4 - Llama-3.1 70B: 7.9
154. Sonar Pro: 7.9
155. Qwen3.5 4B: 7.8
156. o1: 7.7
157. MiniMax M1 40k: 7.5
158. Grok Code Fast 1: 7.5
159. Qwen3 Omni 30B A3B: 7.3
160. Sonar: 7.3
161. Qwen3 Next 80B A3B Instruct: 7.3
162. Magistral Small 1: 7.2
163. Ling-1T: 7.2
164. Qwen3.5 Omni Flash: 7.1
165. Gemini 2.0 Flash Thinking: 7.1
166. GLM 4.7 Flash: 7.1
167. Sarvam 30B: 7
168. Solar Pro 2: 7
169. Kimi K2: 7
170. Qwen3 30B A3B 2507 Instruct: 6.8
171. Gemini 2.0 Pro: 6.8
172. Llama Nemotron Super 49B v1.5: 6.8
173. LFM2.5-1.2B-Instruct: 6.8
174. Nova 2.0 Omni: 6.8
175. Gemini 2.5 Flash-Lite: 6.6
176. DBRX Instruct: 6.6
177. JT-MINI: 6.6
178. Qwen3 30B A3B: 6.6
179. Llama-3.3 Nemotron Super 49B v1: 6.5
180. Granite 4.0 H 350M: 6.4
181. Qwen3 VL 30B A3B Instruct: 6.4
182. Ling-flash-2.0: 6.3
183. Kimi K2 0905: 6.3
184. Qwen3 VL 235B A22B Instruct: 6.3
185. Qwen3 VL 32B Instruct: 6.3
186. Ling-2.6-flash: 6.2
187. JT-35B-Flash: 6.1
188. Tri-21B-Think: 6.1
189. LFM2.5-1.2B-Thinking: 6.1
190. Magistral Small 1.2: 6.1
191. DeepSeek R1 Distill Llama 70B: 6.1
192. LongCat Flash Lite: 6
193. Olmo 3.1 32B Think: 6
194. Qwen3 4B 2507: 5.9
195. GLM 4.5V: 5.9
196. Olmo 3 32B Think: 5.9
197. Llama 2 Chat 7B: 5.8
198. Exaone 4.0 1.2B: 5.8
199. Olmo 3 7B Instruct: 5.8
200. Qwen3 0.6B: 5.7
201. LFM2 1.2B: 5.7
202. Granite 4.0 350M: 5.7
203. Olmo 3 7B Think: 5.7
204. DeepSeek R1 0528 Qwen3 8B: 5.6
205. OLMo 2 7B: 5.5
206. Apertus 70B Instruct: 5.5
207. HyperCLOVA X SEED Think: 5.5
208. DeepSeek R1 Distill Qwen 32B: 5.5
209. GPT-4.1: 5.4
210. DeepSeek Coder V2 Lite Instruct: 5.3
211. Nemotron 3 Nano Omni 30B A3B Reasoning: 5.3
212. NVIDIA Nemotron Nano 12B v2 VL: 5.3
213. Llama 3.2 1B Instruct: 5.3
214. Ministral 3 3B: 5.3
215. Gemini 2.0 Flash: 5.3
216. GPT-4o: 5.3
217. Qwen3 1.7B: 5.2
218. Gemma 3 1B Instruct: 5.2
219. Gemma 3 4B Instruct: 5.2
220. Tiny Aya Global: 5.2
221. LFM2 2.6B: 5.2
222. Llama 3.2 11B Instruct: 5.2
223. DeepSeek-V3 0324: 5.2
224. Llama 3.2 3B Instruct: 5.2
225. Qwen3 4B: 5.1
226. Qwen3 Omni 30B A3B Instruct: 5.1
227. Granite 4.0 1B: 5.1
228. Molmo 7B-D: 5.1
229. Llama 3.1 Nemotron Nano 4B v1.1: 5.1
230. LFM2.5-VL-1.6B: 5.1
231. Reka Flash 3: 5.1
232. Jamba 1.5 Mini: 5.1
233. Llama 3 8B Instruct: 5.1
234. Llama 3.1 8B Instruct: 5.1
235. Gemini 2.5 Flash Lite: 5.1
236. Granite 4.0 Micro: 5.1
237. Grok-3: 5.1
238. Llama 2 Chat 70B: 5
239. Ling-mini-2.0: 5
240. Apertus 8B Instruct: 5
241. Granite 4.0 H 1B: 5
242. LFM 40B: 4.9
243. Gemma 3n E4B Instruct: 4.9
244. Qwen3.5 2B: 4.9
245. Qwen3.5 0.8B: 4.9
246. MiniCPM-V 4.6 1.3B: 4.9
247. Olmo 3.1 32B Instruct: 4.9
248. LFM2 8B A1B: 4.9
249. o1-mini: 4.9
250. Llama 3.2 90B Instruct: 4.9
251. Gemini 1.5 Pro: 4.9
252. OpenChat 3.5: 4.8
253. Mistral Small 3.1: 4.8
254. Gemma 3 12B Instruct: 4.8
255. NVIDIA Nemotron 3 Nano 4B: 4.8
256. Gemma 4 E2B: 4.8
257. QwQ-32B-Preview: 4.8
258. Qwen2.5-Coder 7B Instruct: 4.8
259. Command R+: 4.8
260. Llama 4 Maverick: 4.8
261. Qwen3 4B 2507 Instruct: 4.7
262. Grok: 4.7
263. Gemma 3 27B Instruct: 4.7
264. Llama 2 Chat 13B: 4.7
265. Nova Premier: 4.7
266. Gemma 4 E4B: 4.7
267. Nova Micro: 4.7
268. Kimi K2-Instruct-0905: 4.7
269. Kimi K2 Instruct: 4.7
270. MiniCPM5-1B: 4.6
271. Jamba 1.6 Mini: 4.6
272. Jamba Reasoning 3B: 4.6
273. NVIDIA Nemotron Nano 9B V2: 4.6
274. Nova Lite: 4.6
275. Llama 3.1 Nemotron 70B Instruct: 4.6
276. Gemini 1.0 Pro: 4.6
277. Llama 3.1 70B Instruct: 4.6
278. Ministral 3 14B: 4.6
279. Qwen2.5 Max: 4.5
280. Mixtral 8x7B Instruct: 4.5
281. Jamba 1.7 Mini: 4.5
282. Gemini 1.5 Flash 8B: 4.5
283. Qwen3 VL 4B: 4.4
284. Qwen3 Coder 480B A35B Instruct: 4.4
285. Phi-3 Mini Instruct 3.8B: 4.4
286. Mistral Small: 4.4
287. Molmo2-8B: 4.4
288. Phi-4-multimodal-instruct: 4.4
289. DeepSeek R1 Distill Qwen 14B: 4.4
290. Llama 3 70B Instruct: 4.4
291. Gemini 2.0 Flash Lite: 4.4
292. Mistral Medium 3.1: 4.4
293. LFM2-24B-A2B: 4.4
294. Mistral Small 3.2: 4.3
295. Mistral 7B Instruct: 4.3
296. DeepHermes 3 - Llama-3.1 8B: 4.3
297. Qwen3 14B: 4.3
298. Mistral Medium 3: 4.3
299. Ministral 3 8B: 4.3
300. Llama 4 Scout: 4.3
301. Qwen2.5 Turbo: 4.2
302. Granite 3.3 8B: 4.2
303. Claude 2.1: 4.2
304. Granite 4.1 30B: 4.2
305. Gemma 3 270M: 4.2
306. Llama 3.1 405B Instruct: 4.2
307. Gemini 1.5 Flash: 4.2
308. DeepSeek R1 Distill Llama 8B: 4.2
309. Qwen2.5 72B Instruct: 4.2
310. Qwen3 8B: 4.2
311. Phi 4 Mini Instruct: 4.2
312. Hermes 3 - Llama-3.1 70B: 4.1
313. Mistral Saba: 4.1
314. Mixtral 8x22B Instruct: 4.1
315. Phi 4: 4.1
316. Mistral Small 3: 4.1
317. Mistral Large 3: 4.1
318. Jamba 1.6 Large: 4
319. Devstral Small: 4
320. Gemma 3n E2B Instruct: 4
321. Jamba 1.5 Large: 4
322. GPT-4o-mini: 4
323. Llama 3.3 70B Instruct: 4
324. Qwen3 Coder 30B A3B Instruct: 4
325. Mistral Large 2: 4
326. DeepHermes 3 - Mistral 24B: 3.9
327. Claude 3 Haiku: 3.9
328. GPT-4.1 Nano: 3.9
329. Claude 3.5 Sonnet: 3.9
330. Claude Instant: 3.8
331. Devstral Medium: 3.8
332. Qwen2.5 32B Instruct: 3.8
333. Claude 3 Sonnet: 3.8
334. Qwen2.5 Coder 32B Instruct: 3.8
335. Jamba Large 1.7: 3.8
336. Granite 4.1 8B: 3.8
337. Grok-2: 3.8
338. Qwen3 VL 4B Instruct: 3.7
339. OLMo 2 32B: 3.7
340. Granite 4.0 H Small: 3.7
341. Qwen2 72B Instruct: 3.7
342. GPT-4.1 Mini: 3.7
343. Pixtral Large: 3.6
344. Devstral 2: 3.6
345. DeepSeek-V3: 3.6
346. Llama 3.1 Tulu3 405B: 3.5
347. ERNIE 4.5 300B A47B: 3.5
348. Claude 3.5 Haiku: 3.5
349. Mistral Medium: 3.4
350. Granite 4.1 3B: 3.4
351. Devstral Small 2: 3.4
352. Nova Pro: 3.4
353. Mistral Large: 3.4
354. Qwen3 VL 8B: 3.3
355. Sarvam M: 3.3
356. DeepSeek R1 Distill Qwen 1.5B: 3.3
357. GPT-4 Turbo: 3.3
358. Claude 3 Opus: 3.1
359. Qwen3 VL 8B Instruct: 2.9
360. Kimi Linear 48B A3B Instruct: 2.7
