LiveCodeBench

Models: 282
Top score: 93.5
Median: 42.9

LiveCodeBench is a holistic and contamination-free evaluation benchmark for large language models for code. It continuously collects new problems from programming contests (LeetCode, AtCoder, CodeForces) and evaluates models in four scenarios: code generation, self-repair, code execution, and test output prediction. Each problem is annotated with its release date, so a model can be evaluated only on problems released after its training cutoff, which it cannot have seen during training.
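The release-date filtering described above can be sketched in a few lines of Python. This is an illustrative sketch only, not LiveCodeBench's actual code: the `Problem` class, the `unseen_problems` helper, the problem titles, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    # Hypothetical record; the real benchmark's schema may differ.
    title: str
    source: str
    released: date


def unseen_problems(problems, training_cutoff):
    """Keep only problems released strictly after the training cutoff."""
    return [p for p in problems if p.released > training_cutoff]


# Made-up problems, one per contest site named in the description.
problems = [
    Problem("two-pointer warmup", "LeetCode", date(2023, 5, 1)),
    Problem("grid paths", "AtCoder", date(2024, 2, 10)),
    Problem("interval merge", "CodeForces", date(2024, 9, 3)),
]

cutoff = date(2024, 1, 1)  # assumed training cutoff for some model
fresh = unseen_problems(problems, cutoff)
print([p.title for p in fresh])
```

With these sample values, only the two problems released in 2024 remain in the evaluation set.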

State of the art over time

Each point is a model at its release date; the line traces the best score to date.

[Chart: score (0 to 100) against release date (2023 to 2026), one point per model.]

Models labeled on the best-score line:

Claude Instant: 10.9 (2023-03-14)
Claude 2: 17.1 (2023-07-11)
GPT-4 Turbo: 29.1 (2023-11-06)
Gemini 1.5 Pro: 31.6 (2024-02-15)
GPT-4o: 42.5 (2024-05-13)
o1-mini: 57.6 (2024-09-12)
o1: 67.9 (2024-12-05)
o3-mini: 73.4 (2025-01-31)
Grok-3 Mini: 80.4 (2025-02-17)
o4-mini: 85.9 (2025-04-16)
gpt-oss-120b: 87.8 (2025-08-05)
Gemini 3 Pro: 91.7 (2025-11-18)
DeepSeek-V4-Pro: 93.5 (2026-04-24)
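A best-score-to-date line like the one in this chart amounts to a running maximum over models ordered by release date. The sketch below is one way to compute it, not the site's actual implementation. The `best_to_date` function is hypothetical, and the rule for models released on the same day (highest score first) is an assumption. The sample points are taken from the chart data.

```python
from datetime import date


def best_to_date(points):
    """Return the (released, name, score) entries that set a new best score.

    Points are processed in release-date order. Among models released on the
    same day, the highest score is considered first (an assumed tie rule).
    """
    frontier = []
    best = float("-inf")
    for released, name, score in sorted(points, key=lambda p: (p[0], -p[2])):
        if score > best:
            best = score
            frontier.append((released, name, score))
    return frontier


# A few points from the chart, deliberately out of order.
points = [
    (date(2023, 11, 21), "Claude 2.1", 19.5),
    (date(2023, 3, 14), "Claude Instant", 10.9),
    (date(2023, 7, 18), "Llama 2 Chat 70B", 9.8),
    (date(2023, 11, 6), "GPT-4 Turbo", 29.1),
    (date(2023, 7, 11), "Claude 2", 17.1),
]

record_setters = [name for _, name, _ in best_to_date(points)]
print(record_setters)
```

Llama 2 Chat 70B and Claude 2.1 drop out because an earlier model had already scored higher by their release dates.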

Ranking

1. DeepSeek-V4-Pro: 93.5
2. Gemini 3 Pro: 91.7
3. Gemini 3 Flash: 90.8
4. DeepSeek V3.2 Speciale: 89.6
5. GLM 4.7: 89.4
6. GPT-5.2: 89.4
7. gpt-oss-120b: 87.8
8. Claude Opus 4.5: 87.1
9. MiMo-V2-Flash: 86.8
10. GPT-5.1: 86.8
11. DeepSeek-V3.2: 86.2
12. o4-mini: 85.9
13. Kimi K2 Thinking: 85.3
14. GPT-5.1-Codex: 84.9
15. GPT-5: 84.6
16. GPT-5 Codex: 84
17. GPT-5 mini: 83.8
18. GPT-5.1-Codex-Mini: 83.6
19. MiniMax-M2: 82.6
20. Grok 4.1 Fast: 82.2
21. ERNIE 5.0 Thinking: 81.2
22. MiniMax M2.1: 81
23. o3: 80.8
24. Apriel-v1.6-15B-Thinker: 80.7
25. Grok-3 Mini: 80.4
26. Gemini 2.5 Pro: 80.1
27. Grok 4 Fast: 80
28. DeepSeek V3.1 Terminus: 79.8
29. Grok-4 Heavy: 79.4
30. Grok-3: 79.4
31. Grok 4: 79
32. GPT-5 nano: 78.9
33. Qwen3 235B A22B 2507: 78.8
34. Qwen3-Next-80B-A3B: 78.4
35. INTELLECT-3: 77.7
36. gpt-oss-20b: 77.7
37. K-EXAONE: 76.8
38. Qwen3 Max: 76.7
39. Doubao Seed Code: 76.6
40. Seed-OSS-36B-Instruct: 76.5
41. Magistral Medium 1.2: 75
42. KAT-Coder-Pro V1: 74.7
43. EXAONE 4.0 32B: 74.7
44. NVIDIA Nemotron 3 Nano 30B A3B: 74.1
45. DeepSeek V3.2 Exp: 74.1
46. Qwen3 VL 32B: 73.8
47. Llama Nemotron Super 49B v1.5: 73.7
48. o3-mini: 73.4
49. DeepSeek-R1-0528: 73.3
50. Nova 2.0 Pro: 73
51. GLM-4.5: 72.9
52. Apriel-v1.5-15B-Thinker: 72.8
53. NVIDIA Nemotron Nano 9B V2: 72.4
54. Falcon-H1R-7B: 72.4
55. Magistral Small 1.2: 72.3
56. Claude Sonnet 4.5: 71.4
57. Gemini 2.5 Flash: 71.3
58. MiniMax M1 80k: 71.1
59. Nemotron Nano 9B V2: 71.1
60. Nova 2 Lite: 71.1
61. Qwen3 30B A3B 2507: 70.7
62. Qwen3 235B A22B: 70.7
63. GLM 4.5 Air: 70.7
64. Qwen3 VL 30B A3B: 69.7
65. Grok 3 mini Reasoning: 69.6
66. Olmo 3.1 32B Think: 69.5
67. GLM-4.6: 69.5
68. Gemini 2.5 Flash: 69.5
69. K2-V2: 69.4
70. NVIDIA Nemotron Nano 12B v2 VL: 69.4
71. Gemini 2.5 Pro Preview 06-05: 69
72. Gemini 2.5 Flash-Lite: 68.8
73. Cogito v2.1: 68.8
74. Hermes 4 - Llama-3.1 405B: 68.6
75. Qwen3 Next 80B A3B Instruct: 68.4
76. Qwen3 Omni 30B A3B: 67.9
77. o1: 67.9
78. Ling-1T: 67.7
79. Olmo 3 32B Think: 67.2
80. Llama 3.1 Nemotron Ultra 253B v1: 66.3
81. Nova 2.0 Omni: 66
82. MiniMax M1 40k: 65.7
83. Grok Code Fast 1: 65.7
84. Qwen3 32B: 65.7
85. Mi:dm K 2.5 Pro: 65.6
86. Claude Sonnet 4: 65.5
87. Claude Opus 4.1: 65.4
88. Hermes 4 - Llama-3.1 70B: 65.3
89. Motif-2-12.7B-Reasoning: 65.1
90. Qwen3 VL 235B A22B: 64.6
91. Ring-1T: 64.3
92. Qwen3 4B 2507: 64.1
93. Claude Opus 4: 63.6
94. QwQ-32B: 63.4
95. HyperCLOVA X SEED Think: 62.9
96. Ring-flash-2.0: 62.8
97. Qwen3 30B A3B: 62.6
98. Olmo 3 7B Think: 61.7
99. DeepSeek-R1: 61.7
100. Solar Pro 2: 61.6
101. Claude Haiku 4.5: 61.5
102. Kimi K2 0905: 61
103. GLM 4.5V: 60.4
104. Qwen3 VL 235B A22B Instruct: 59.4
105. Ling-flash-2.0: 58.9
106. Qwen3 Coder 480B A35B Instruct: 58.5
107. o1-mini: 57.6
108. DeepSeek R1 Distill Llama 70B: 57.5
109. DeepSeek R1 Distill Qwen 32B: 57.2
110. DeepSeek-V3.1: 56.4
111. Kimi K2: 55.6
112. Qwen2.5 72B Instruct: 55.5
113. Phi 4 Reasoning: 53.8
114. Kimi K2-Instruct-0905: 53.7
115. Qwen3 Max Thinking: 53.5
116. Phi 4 Reasoning Plus: 53.1
117. DeepSeek R1 Distill Qwen 14B: 53.1
118. Magistral Medium 1: 52.7
119. Qwen3-235B-A22B-Instruct-2507: 52.4
120. Qwen3 14B: 52.3
121. Exaone 4.0 1.2B: 51.6
122. Qwen3 30B A3B 2507 Instruct: 51.5
123. Magistral Small 1: 51.4
124. Qwen3 VL 32B Instruct: 51.4
125. DeepSeek R1 0528 Qwen3 8B: 51.3
126. Magistral Small 2506: 51.3
127. Magistral Medium: 50.3
128. QwQ-32B-Preview: 50
129. DeepSeek R1 Zero: 50
130. Llama 3.1 Nemotron Nano 4B v1.1: 49.3
131. DeepSeek-V3 0324: 49.2
132. GPT-4.1 Mini: 48.3
133. Qwen3 VL 30B A3B Instruct: 47.6
134. Claude 3.7 Sonnet: 47.3
135. ERNIE 4.5 300B A47B: 46.7
136. Qwen3 4B: 46.5
137. Mistral Large 3: 46.5
138. GPT-4.1: 45.7
139. Devstral 2: 44.8
140. Reka Flash 3: 43.5
141. Llama 4 Maverick: 43.4
142. Ling-mini-2.0: 42.9
143. GPT-4o: 42.5
144. Qwen3 Omni 30B A3B Instruct: 42.2
145. GLM 4.6V: 41.1
146. Qwen3 8B: 40.6
147. Mistral Medium 3.1: 40.6
148. Qwen3 Coder 30B A3B Instruct: 40.3
149. Mistral Medium 3: 40
150. DeepSeek R1 Distill Llama 8B: 39.6
151. Claude 3.5 Sonnet: 38.1
152. Kimi Linear 48B A3B Instruct: 37.8
153. Qwen3 4B 2507 Instruct: 37.7
154. DeepSeek R1 Distill Qwen 7B: 37.6
155. DeepSeek-V3: 37.6
156. Qwen2.5 Max: 35.9
157. Qwen3 VL 8B: 35.3
158. Ministral 3 14B: 35.1
159. Gemini 2.0 Flash: 35.1
160. Devstral Small 2: 34.8
161. Gemini 2.0 Pro: 34.7
162. Devstral Medium: 33.7
163. Gemini 2.5 Flash Lite: 33.7
164. Qwen3 VL 8B Instruct: 33.2
165. Llama 4 Scout: 32.8
166. GPT-4.1 Nano: 32.6
167. Gemini 2.0 Flash Thinking: 32.1
168. Qwen3 VL 4B: 32
169. Nova Premier: 31.7
170. Gemini 1.5 Pro: 31.6
171. Qwen2.5 Coder 32B Instruct: 31.4
172. Claude 3.5 Haiku: 31.4
173. Gemini Diffusion: 30.9
174. Qwen3 1.7B: 30.8
175. Llama 3.1 405B Instruct: 30.5
176. Ministral 3 8B: 30.3
177. Gemma 3 27B: 29.7
178. Sarvam M: 29.5
179. Sonar: 29.5
180. Mistral Large 2: 29.3
181. Llama 3.1 Tulu3 405B: 29.1
182. GPT-4 Turbo: 29.1
183. Qwen3 VL 4B Instruct: 29
184. Llama 3.3 70B Instruct: 28.8
185. Command A: 28.7
186. Qwen2.5 7B Instruct: 28.7
187. Llama-3.3 Nemotron Super 49B v1: 28
188. Claude 3 Opus: 27.9
189. Mistral Small 3.2: 27.5
190. Sonar Pro: 27.5
191. Gemini 1.5 Flash: 27.3
192. Grok-2: 26.7
193. Olmo 3 7B Instruct: 26.6
194. Qwen2 7B Instruct: 26.6
195. Pixtral Large: 26.1
196. Devstral Small: 25.8
197. Mistral Small 3: 25.2
198. Granite 4.0 H Small: 25.1
199. Qwen2.5 32B Instruct: 24.8
200. Ministral 3 3B: 24.7
201. Gemma 3 12B: 24.6
202. Grok: 24.1
203. GPT-4o-mini: 23.4
204. Nova Pro: 23.3
205. Llama 3.1 70B Instruct: 23.2
206. Phi 4: 23.1
207. Gemini 1.5 Flash 8B: 21.7
208. Llama 3.2 90B Instruct: 21.4
209. Mistral Small 3.1: 21.2
210. Jamba Reasoning 3B: 21
211. Llama 3 70B Instruct: 19.8
212. Claude 2.1: 19.5
213. DeepHermes 3 - Mistral 24B: 19.5
214. Hermes 3 - Llama-3.1 70B: 18.8
215. Gemini 2.0 Flash Lite: 18.5
216. Qwen2.5-Coder 7B Instruct: 18.2
217. Jamba Large 1.7: 18.1
218. Granite 4.0 Micro: 18
219. Mistral Large: 17.8
220. Claude 3 Sonnet: 17.5
221. Jamba 1.6 Large: 17.2
222. Claude 2: 17.1
223. Llama 3.1 Nemotron 70B Instruct: 16.9
224. DeepSeek R1 Distill Qwen 1.5B: 16.9
225. Nova Lite: 16.7
226. Qwen2.5 Turbo: 16.3
227. Qwen2 72B Instruct: 15.9
228. DeepSeek Coder V2 Lite Instruct: 15.8
229. Claude 3 Haiku: 15.4
230. LFM2 8B A1B: 15.1
231. Mixtral 8x22B Instruct: 14.8
232. Gemma 3n E4B Instruct: 14.6
233. Jamba 1.5 Large: 14.3
234. Mistral Small: 14.1
235. Nova Micro: 14
236. Gemma 3 12B Instruct: 13.7
237. Gemma 3 27B Instruct: 13.7
238. Gemma 3n E4B Instructed LiteRT Preview: 13.2
239. Gemma 3n E4B Instructed: 13.2
240. Gemma 3n E2B Instructed LiteRT (Preview): 13.2
241. Gemma 3n E2B Instructed: 13.2
242. Phi-4-multimodal-instruct: 13.1
243. Granite 3.3 8B: 12.7
244. Gemma 3 4B: 12.6
245. Phi 4 Mini Instruct: 12.6
246. Command R+: 12.2
247. Qwen3 0.6B: 12.1
248. Phi-3 Mini Instruct 3.8B: 11.6
249. Gemini 1.0 Pro: 11.6
250. Llama 3.1 8B Instruct: 11.6
251. OpenChat 3.5: 11.5
252. Granite 4.0 H 1B: 11.5
253. Gemma 3 4B Instruct: 11.2
254. Llama 3.2 11B Instruct: 11
255. Claude Instant: 10.9
256. Mistral Medium: 9.9
257. Llama 2 Chat 13B: 9.8
258. Llama 2 Chat 70B: 9.8
259. LFM 40B: 9.6
260. Llama 3 8B Instruct: 9.6
261. Gemma 3n E2B Instruct: 9.5
262. DBRX Instruct: 9.3
263. DeepHermes 3 - Llama-3.1 8B: 8.5
264. Llama 3.2 3B Instruct: 8.3
265. LFM2 2.6B: 8.1
266. Jamba 1.6 Mini: 7.1
267. OLMo 2 32B: 6.8
268. Mixtral 8x7B Instruct: 6.6
269. Jamba 1.5 Mini: 6.2
270. Jamba 1.7 Mini: 6.1
271. Granite 4.0 1B: 4.7
272. Mistral 7B Instruct: 4.6
273. OLMo 2 7B: 4.1
274. Molmo 7B-D: 3.9
275. Granite 4.0 350M: 2.4
276. LFM2 1.2B: 2
277. Granite 4.0 H 350M: 1.9
278. Gemma 3 1B: 1.9
279. Llama 3.2 1B Instruct: 1.9
280. Gemma 3 1B Instruct: 1.7
281. Gemma 3 270M: 0.3
282. Llama 2 Chat 7B: 0.2
