AI War Tracker
Coding

SciCode

A scientific-coding benchmark of graduate-level coding problems drawn from physics, biology, math, and CS research.

Models: 348
Top score: 58.9
Median: 29.9

State of the art over time

Each point is a model at its release date; the line traces the best score to date.
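
The best-to-date line described above amounts to a running maximum over release dates. Here is a minimal Python sketch of that idea, using a few points copied from the chart data. The function name `best_to_date` and the tie rule (a later model that only equals the current best does not count as a new record) are assumptions for illustration, not something the page states.

```python
from datetime import date

# A handful of (model, score, release date) points copied from the chart data.
points = [
    ("Claude 2", 19.4, date(2023, 7, 11)),
    ("Llama 2 Chat 13B", 11.8, date(2023, 7, 18)),
    ("GPT-4 Turbo", 31.9, date(2023, 11, 6)),
    ("Claude 2.1", 18.4, date(2023, 11, 21)),
    ("Gemini 1.5 Pro", 29.5, date(2024, 2, 15)),
    ("GPT-4o", 36.6, date(2024, 5, 13)),
    ("Claude 3.5 Sonnet", 36.6, date(2024, 6, 20)),
]


def best_to_date(points):
    """Return the points that strictly raise the running maximum score.

    Assumption: ties do not count as a new record.
    """
    frontier = []
    best = float("-inf")
    # Walk the models in release order and keep each new high score.
    for name, score, released in sorted(points, key=lambda p: p[2]):
        if score > best:
            best = score
            frontier.append((name, score, released))
    return frontier


for name, score, released in best_to_date(points):
    print(f"{released.isoformat()}  {name}: {score}")
```

On this sample the sketch keeps Claude 2, GPT-4 Turbo, and GPT-4o. Claude 3.5 Sonnet matches GPT-4o at 36.6 but does not exceed it, so under the assumed tie rule it is not a new record.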

[Chart: one point per model, with release date (2023 to 2026) on the horizontal axis and SciCode score (0 to 70) on the vertical axis.]

Models that set a new best score to date:

Claude 2: 19.4 (2023-07-11)
GPT-4 Turbo: 31.9 (2023-11-06)
GPT-4o: 36.6 (2024-05-13)
DeepSeek R1 Distill Qwen 32B: 37.6 (2025-01-20)
o3-mini: 39.9 (2025-01-31)
Grok 3 mini Reasoning: 40.6 (2025-02-19)
Gemini 2.5 Pro: 42.8 (2025-03-25)
o4-mini: 46.5 (2025-04-16)
Gemini 3 Pro: 56.1 (2025-11-18)
Gemini 3.1 Pro: 58.9 (2026-02-19)

Ranking

#1 Gemini 3.1 Pro: 58.9
#2 GPT-5.4: 56.6
#3 Gemini 3 Pro: 56.1
#4 GPT-5.5: 56.1
#5 GPT-5.2-Codex: 54.6
#6 Claude Opus 4.7: 54.5
#7 Claude Opus 4.8: 53.5
#8 Kimi K2.6: 53.5
#9 GPT-5.3-Codex: 53.2
#10 Gemini 3.5 Flash: 53.1
#11 GPT-5.2: 52.1
#12 Claude Opus 4.6: 51.9
#13 Muse Spark: 51.5
#14 Gemini 3 Flash: 50.6
#15 MiMo-V2.5-Pro: 50.2
#16 DeepSeek-V4-Pro: 50
#17 GPT-5.4 mini: 49.9
#18 Claude Opus 4.5: 49.5
#19 Kimi K2.5: 49
#20 Qwen3.7 Max: 48.8
#21 Grok 4.3: 47.3
#22 MiniMax M2.7: 47
#23 Qwen3.6 Max: 46.9
#24 Claude Sonnet 4.6: 46.9
#25 GPT-5.4 nano: 46.9
#26 o4-mini: 46.5
#27 GLM-5: 46.2
#28 Grok 4: 45.7
#29 Grok 4.20 0309 v2: 45.6
#30 GLM 4.7: 45.1
#31 DeepSeek-V4-Flash: 44.9
#32 Grok 4.20 0309: 44.7
#33 Claude Sonnet 4.5: 44.7
#34 Grok 4.1 Fast: 44.2
#35 Grok 4 Fast: 44.2
#36 DeepSeek V3.2 Speciale: 44
#37 GLM 5.1: 43.8
#38 GLM 5 Turbo: 43.6
#39 GLM 5V Turbo: 43.5
#40 Gemma 4 31B: 43.4
#41 Claude Haiku 4.5: 43.3
#42 GPT-5.1: 43.3
#43 MiMo-V2.5: 43.1
#44 Qwen3 Max Thinking: 43.1
#45 GPT-5: 42.9
#46 Gemini 2.5 Pro: 42.8
#47 Nova 2.0 Pro: 42.7
#48 GPT-5.1-Codex-Mini: 42.6
#49 MiniMax M2.5: 42.6
#50 MiMo-V2-Pro: 42.5
#51 Qwen3 235B A22B 2507: 42.4
#52 Ring-2.6-1T: 42.4
#53 Kimi K2 Thinking: 42.4
#54 Qwen3.5 397B A17B: 42
#55 Qwen3.5-122B-A10B: 42
#56 Gemini 3.1 Flash Lite: 41.9
#57 Hy3: 41.2
#58 Cogito v2.1: 41
#59 GPT-5 mini: 41
#60 o3: 41
#61 GPT-5 Codex: 40.9
#62 Claude Opus 4.1: 40.9
#63 Claude Opus 4: 40.9
#64 Doubao Seed Code: 40.7
#65 MiniMax M2.1: 40.7
#66 Qwen3.6 Plus: 40.7
#67 Grok 3 mini Reasoning: 40.6
#68 DeepSeek V3.1 Terminus: 40.6
#69 Qwen3.5 Omni Plus: 40.5
#70 GPT-4.1 Mini: 40.4
#71 Step 3.5 Flash: 40.4
#72 DeepSeek-R1-0528: 40.3
#73 Claude 3.7 Sonnet: 40.3
#74 GPT-5.1-Codex: 40.2
#75 Gemma 4 26B A4B: 40
#76 Claude Sonnet 4: 40
#77 Qwen3 VL 235B A22B: 39.9
#78 Qwen3 235B A22B: 39.9
#79 DeepSeek V3.2 Exp: 39.9
#80 o3-mini: 39.9
#81 Qwen3.6 27B: 39.8
#82 Mistral Medium 3.5: 39.6
#83 MiMo-V2-Omni-0327: 39.5
#84 Qwen3.5-27B: 39.5
#85 MiMo-V2-Flash: 39.4
#86 Gemini 2.5 Flash: 39.4
#87 Magistral Medium 1.2: 39.2
#88 INTELLECT-3: 39.1
#89 DeepSeek-V3.1: 39.1
#90 gpt-oss-120b: 38.9
#91 DeepSeek-V3.2: 38.9
#92 Qwen3-Next-80B-A3B: 38.8
#93 Mercury 2: 38.7
#94 Step 3.5 Flash 2603: 38.5
#95 GLM-4.6: 38.4
#96 KAT-Coder-Pro V2: 38.3
#97 Qwen3 Max: 38.3
#98 GPT-4.1: 38.1
#99 Mistral Small 4: 38
#100 MiniMax M1 40k: 37.8
#101 Command A: 37.8
#102 Qwen3.5-35B-A3B: 37.7
#103 DeepSeek R1 Distill Qwen 32B: 37.6
#104 ERNIE 5.0 Thinking: 37.5
#105 MiniMax M1 80k: 37.4
#106 Apriel-v1.6-15B-Thinker: 37.3
#107 Ling-2.6-1T: 37
#108 Nova 2 Lite: 36.9
#109 Grok-3: 36.8
#110 Ring-1T: 36.7
#111 MiMo-V2-Omni: 36.7
#112 KAT-Coder-Pro V1: 36.6
#113 GPT-5 nano: 36.6
#114 GPT-4o: 36.6
#115 Claude 3.5 Sonnet: 36.6
#116 Seed-OSS-36B-Instruct: 36.5
#117 Nova 2.0 Omni: 36.2
#118 Grok Code Fast 1: 36.2
#119 Mistral Large 3: 36.2
#120 Trinity Large Thinking: 36.1
#121 MiniMax-M2: 36.1
#122 NVIDIA Nemotron 3 Super 120B A12B: 36
#123 Qwen3-235B-A22B-Instruct-2507: 36
#124 Qwen3 Coder 480B A35B Instruct: 35.9
#125 Qwen3 VL 235B A22B Instruct: 35.9
#126 QwQ-32B: 35.8
#127 DeepSeek-V3 0324: 35.8
#128 Qwen3.6 35B A3B: 35.8
#129 o1: 35.8
#130 DeepSeek-R1: 35.7
#131 K-EXAONE: 35.6
#132 Qwen3 32B: 35.4
#133 DeepSeek-V3: 35.4
#134 Ling-1T: 35.2
#135 Magistral Small 1.2: 35.2
#136 Apriel-v1.5-15B-Thinker: 34.8
#137 Nemotron Cascade 2 30B A3B: 34.8
#138 Llama Nemotron Super 49B v1.5: 34.8
#139 GLM-4.5: 34.8
#140 Llama 3.1 Nemotron Ultra 253B v1: 34.7
#141 Hermes 4 - Llama-3.1 405B: 34.6
#142 Kimi K2: 34.5
#143 EXAONE 4.0 32B: 34.4
#144 gpt-oss-20b: 34.4
#145 Hermes 4 - Llama-3.1 70B: 34.1
#146 Gemini 2.0 Flash: 34
#147 Mistral Medium 3.1: 33.8
#148 Qwen2.5 Max: 33.7
#149 GLM 4.7 Flash: 33.7
#150 Qwen3 30B A3B 2507: 33.3
#151 Mi:dm K 2.5 Pro: 33.2
#152 Mistral Medium 3: 33.1
#153 Devstral 2: 33.1
#154 Llama 4 Maverick: 33.1
#155 K2 Think V2: 33
#156 Gemini 2.0 Flash Thinking: 32.9
#157 o1-mini: 32.3
#158 Qwen3 Coder Next: 32.3
#159 GPT-4 Turbo: 31.9
#160 Qwen3 14B: 31.6
#161 ERNIE 4.5 300B A47B: 31.5
#162 Gemini 2.0 Pro: 31.3
#163 DeepSeek R1 Distill Llama 70B: 31.3
#164 Step3 VL 10B: 31.1
#165 Qwen3 VL 30B A3B Instruct: 30.8
#166 Kimi K2 0905: 30.7
#167 Qwen3 Next 80B A3B Instruct: 30.7
#168 Qwen3 Omni 30B A3B: 30.6
#169 GLM 4.5 Air: 30.6
#170 Qwen3 30B A3B 2507 Instruct: 30.4
#171 GLM 4.6V: 30.4
#172 Llama 3.1 Tulu3 405B: 30.2
#173 Solar Pro 2: 30.2
#174 Qwen3 VL 32B Instruct: 30.1
#175 Llama 3.1 405B Instruct: 29.9
#176 Magistral Medium 1: 29.7
#177 NVIDIA Nemotron 3 Nano 30B A3B: 29.6
#178 Grok: 29.5
#179 Gemini 1.5 Pro: 29.5
#180 Devstral Medium: 29.4
#181 Olmo 3.1 32B Think: 29.3
#182 Pixtral Large: 29.2
#183 Mistral Large 2: 29.2
#184 JT-35B-Flash: 29.1
#185 Ling-flash-2.0: 28.9
#186 Qwen3 VL 30B A3B: 28.8
#187 Devstral Small 2: 28.8
#188 K2-V2: 28.6
#189 Olmo 3 32B Think: 28.6
#190 Qwen3 VL 32B: 28.5
#191 Qwen3 30B A3B: 28.5
#192 Grok-2: 28.5
#193 LongCat Flash Lite: 28.4
#194 HyperCLOVA X SEED Think: 28.4
#195 Motif-2-12.7B-Reasoning: 28.2
#196 Llama-3.3 Nemotron Super 49B v1: 28.2
#197 EXAONE 4.5 33B: 28
#198 Nova Premier: 27.9
#199 Nemotron 3 Nano Omni 30B A3B Reasoning: 27.8
#200 Qwen3 Coder 30B A3B Instruct: 27.8
#201 Qwen3.5-9B: 27.7
#202 Claude 3.5 Haiku: 27.4
#203 JT-MINI: 27.2
#204 Qwen2.5 Coder 32B Instruct: 27.1
#205 Ling-2.6-flash: 27.1
#206 Solar Open 100B: 26.9
#207 Reka Flash 3: 26.7
#208 Gemini 1.5 Flash: 26.7
#209 Llama 3.1 70B Instruct: 26.7
#210 Qwen2.5 72B Instruct: 26.7
#211 Nanbeige4.1-3B: 26.6
#212 Mistral Small 3.1: 26.5
#213 Mistral Small 3.2: 26.4
#214 Sarvam 105B: 26.4
#215 NVIDIA Nemotron Nano 12B v2 VL: 26.2
#216 Llama 3.3 70B Instruct: 26
#217 Phi 4: 26
#218 GPT-4.1 Nano: 25.9
#219 Granite 4.1 30B: 25.8
#220 Qwen3 4B 2507: 25.6
#221 Qwen3.5 Omni Flash: 25.5
#222 Gemini 2.0 Flash Lite: 25
#223 Falcon-H1R-7B: 24.9
#224 Solar Pro 3: 24.7
#225 Devstral Small: 24.5
#226 Gemma 4 E4B: 24.4
#227 Mistral Saba: 24.1
#228 Magistral Small 1: 24.1
#229 Llama 3.2 90B Instruct: 24
#230 DeepSeek R1 Distill Qwen 14B: 23.9
#231 Mistral Small 3: 23.6
#232 Ministral 3 14B: 23.6
#233 Llama 3.1 Nemotron 70B Instruct: 23.3
#234 Claude 3 Opus: 23.3
#235 Hermes 3 - Llama-3.1 70B: 23.1
#236 Sonar: 22.9
#237 Qwen2.5 32B Instruct: 22.9
#238 Qwen2 72B Instruct: 22.9
#239 Gemini 1.5 Flash 8B: 22.9
#240 Claude 3 Sonnet: 22.9
#241 GPT-4o-mini: 22.9
#242 DeepHermes 3 - Mistral 24B: 22.8
#243 Sonar Pro: 22.6
#244 Qwen3 8B: 22.6
#245 GLM 4.5V: 22.1
#246 NVIDIA Nemotron Nano 9B V2: 22
#247 Qwen3 VL 8B: 21.9
#248 Granite 4.1 8B: 21.8
#249 Gemma 3 27B Instruct: 21.2
#250 Olmo 3 7B Think: 21.2
#251 Granite 4.0 H Small: 20.9
#252 Gemma 4 E2B: 20.9
#253 Nova Pro: 20.8
#254 Mistral Large: 20.8
#255 Ministral 3 8B: 20.8
#256 DeepSeek R1 0528 Qwen3 8B: 20.4
#257 Kimi Linear 48B A3B Instruct: 19.9
#258 Claude 2: 19.4
#259 Gemini 2.5 Flash Lite: 19.3
#260 Sarvam 30B: 19.2
#261 Llama 3 70B Instruct: 18.9
#262 Mixtral 8x22B Instruct: 18.8
#263 Jamba Large 1.7: 18.8
#264 Qwen3 Omni 30B A3B Instruct: 18.6
#265 Claude 3 Haiku: 18.6
#266 Jamba 1.6 Large: 18.4
#267 Claude 2.1: 18.4
#268 Qwen3.5 4B: 18.3
#269 Qwen3 4B 2507 Instruct: 18.1
#270 Sarvam M: 17.8
#271 Tri-21B-Think: 17.8
#272 Gemma 3 12B Instruct: 17.4
#273 Qwen3 VL 8B Instruct: 17.4
#274 Qwen3 VL 4B: 17.1
#275 Llama 4 Scout: 17
#276 Ring-flash-2.0: 16.8
#277 Qwen3 4B: 16.7
#278 Olmo 3.1 32B Instruct: 16.7
#279 NVIDIA Nemotron 3 Nano 4B: 16.4
#280 Jamba 1.5 Large: 16.3
#281 Mistral Small: 15.6
#282 Qwen2.5 Turbo: 15.3
#283 Qwen2.5-Coder 7B Instruct: 14.8
#284 Ministral 3 3B: 14.4
#285 DeepSeek Coder V2 Lite Instruct: 13.9
#286 Nova Lite: 13.9
#287 Qwen3 VL 4B Instruct: 13.7
#288 Ling-mini-2.0: 13.5
#289 Molmo2-8B: 13.3
#290 Llama 3.1 8B Instruct: 13.2
#291 Granite 4.1 3B: 11.9
#292 DeepSeek R1 Distill Llama 8B: 11.9
#293 Llama 3 8B Instruct: 11.9
#294 Granite 4.0 Micro: 11.9
#295 DBRX Instruct: 11.8
#296 Mistral Medium: 11.8
#297 Llama 2 Chat 13B: 11.8
#298 Command R+: 11.8
#299 Gemini 1.0 Pro: 11.7
#300 Llama 3.2 11B Instruct: 11.2
#301 Phi-4-multimodal-instruct: 11
#302 LFM2-24B-A2B: 10.9
#303 Phi 4 Mini Instruct: 10.8
#304 Olmo 3 7B Instruct: 10.3
#305 Jamba 1.6 Mini: 10.1
#306 Granite 3.3 8B: 10.1
#307 Llama 3.1 Nemotron Nano 4B v1.1: 10.1
#308 Nova Micro: 9.4
#309 Jamba 1.7 Mini: 9.3
#310 Exaone 4.0 1.2B: 9.3
#311 DeepHermes 3 - Llama-3.1 8B: 9.1
#312 Phi-3 Mini Instruct 3.8B: 9
#313 Granite 4.0 1B: 8.7
#314 Gemma 3n E4B Instruct: 8.6
#315 Granite 4.0 H 1B: 8.2
#316 OLMo 2 32B: 8
#317 Jamba 1.5 Mini: 8
#318 Gemma 3 4B Instruct: 7.3
#319 Qwen3.5 2B: 7.2
#320 LFM 40B: 7.1
#321 Qwen3 1.7B: 6.9
#322 LFM2 8B A1B: 6.8
#323 DeepSeek R1 Distill Qwen 1.5B: 6.6
#324 Jamba Reasoning 3B: 5.9
#325 Apertus 70B Instruct: 5.7
#326 Gemma 3n E2B Instruct: 5.2
#327 Llama 3.2 3B Instruct: 5.2
#328 LFM2.5-1.2B-Thinking: 4.2
#329 Qwen3 0.6B: 4.1
#330 Apertus 8B Instruct: 4.1
#331 OLMo 2 7B: 3.7
#332 Tiny Aya Global: 3.6
#333 Molmo 7B-D: 3.6
#334 LFM2.5-VL-1.6B: 3
#335 Qwen3.5 0.8B: 2.9
#336 Mixtral 8x7B Instruct: 2.8
#337 LFM2 1.2B: 2.5
#338 LFM2 2.6B: 2.5
#339 Mistral 7B Instruct: 2.4
#340 LFM2.5-1.2B-Instruct: 2.3
#341 MiniCPM-V 4.6 1.3B: 2.1
#342 Granite 4.0 H 350M: 1.7
#343 Llama 3.2 1B Instruct: 1.7
#344 MiniCPM5-1B: 1.4
#345 Granite 4.0 350M: 0.9
#346 Gemma 3 1B Instruct: 0.7
#347 Llama 2 Chat 7B: 0
#348 Gemma 3 270M: 0
