
LiveCodeBench

LiveCodeBench is a holistic and contamination-free evaluation benchmark for large language models for code.

Models: 282

Top score: 93.5

Median: 42.9

LiveCodeBench continuously collects new problems from programming contests (LeetCode, AtCoder, Codeforces) and evaluates four scenarios: code generation, self-repair, code execution, and test output prediction. Each problem is annotated with its release date, so a model can be evaluated only on problems released after its training cutoff.
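The release-date annotation can be pictured as a simple date filter. The sketch below is only an illustration under assumed names: the `Problem` record, its fields, and the sample data are hypothetical and do not reflect LiveCodeBench's actual schema or scoring code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    # Hypothetical record; field names are illustrative only.
    problem_id: str
    source: str          # e.g. "LeetCode", "AtCoder", "Codeforces"
    release_date: date
    solved: bool         # whether the model's attempt passed all tests


def unseen_problems(problems, training_cutoff):
    """Keep only problems released strictly after the training cutoff."""
    return [p for p in problems if p.release_date > training_cutoff]


def pass_rate(problems):
    """Percentage of problems solved; None if the window is empty."""
    if not problems:
        return None
    return 100.0 * sum(p.solved for p in problems) / len(problems)


# Made-up sample data, not benchmark results.
problems = [
    Problem("a", "LeetCode", date(2024, 1, 10), True),
    Problem("b", "AtCoder", date(2024, 6, 2), False),
    Problem("c", "Codeforces", date(2024, 9, 15), True),
    Problem("d", "LeetCode", date(2024, 11, 30), True),
]

cutoff = date(2024, 5, 1)  # assumed training cutoff for some model
window = unseen_problems(problems, cutoff)
print(len(window), pass_rate(window))
```

Scoring only the post-cutoff window is what keeps a result from being inflated by problems the model may have seen during training.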

State of the art over time

Each point is a model at its release date; the line traces the best score to date.
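The "best score to date" line is a running maximum over models ordered by release date. A minimal sketch, assuming each point is a `(release_date, score)` pair; the function name and the sample values are made up for illustration and are not taken from the ranking.

```python
from datetime import date


def best_to_date(points):
    """Return (release_date, best_score_so_far) for each point, in date order."""
    frontier = []
    best = float("-inf")
    for released, score in sorted(points):
        best = max(best, score)  # the line never drops below an earlier best
        frontier.append((released, best))
    return frontier


# Made-up (release_date, score) pairs.
points = [
    (date(2024, 3, 1), 40.0),
    (date(2024, 1, 1), 30.0),
    (date(2024, 6, 1), 35.0),
    (date(2024, 9, 1), 55.0),
]
line = best_to_date(points)
```

A model that scores below the current best (the June point here) still appears as a point, but the line stays flat until a later model exceeds it.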

Ranking

Rank	Model	Organization	Score
1	DeepSeek-V4-Pro	DeepSeek	93.5
2	Gemini 3 Pro	Google	91.7
3	Gemini 3 Flash	Google	90.8
4	DeepSeek V3.2 Speciale	DeepSeek	89.6
5	GLM 4.7	Zhipu AI	89.4
6	GPT-5.2	OpenAI	89.4
7	gpt-oss-120b	OpenAI	87.8
8	Claude Opus 4.5	Anthropic	87.1
9	MiMo-V2-Flash	Xiaomi	86.8
10	GPT-5.1	OpenAI	86.8
11	DeepSeek-V3.2	DeepSeek	86.2
12	o4-mini	OpenAI	85.9
13	Kimi K2 Thinking	Moonshot AI	85.3
14	GPT-5.1-Codex	OpenAI	84.9
15	GPT-5	OpenAI	84.6
16	GPT-5 Codex	OpenAI	84
17	GPT-5 mini	OpenAI	83.8
18	GPT-5.1-Codex-Mini	OpenAI	83.6
19	MiniMax-M2	MiniMax	82.6
20	Grok 4.1 Fast	xAI	82.2
21	ERNIE 5.0 Thinking	Baidu	81.2
22	MiniMax M2.1	MiniMax	81
23	o3	OpenAI	80.8
24	Apriel-v1.6-15B-Thinker	ServiceNow	80.7
25	Grok-3 Mini	xAI	80.4
26	Gemini 2.5 Pro	Google	80.1
27	Grok 4 Fast	xAI	80
28	DeepSeek V3.1 Terminus	DeepSeek	79.8
29	Grok-4 Heavy	xAI	79.4
30	Grok-3	xAI	79.4
31	Grok 4	xAI	79
32	GPT-5 nano	OpenAI	78.9
33	Qwen3 235B A22B 2507	Alibaba	78.8
34	Qwen3-Next-80B-A3B	Alibaba	78.4
35	INTELLECT-3	Prime Intellect	77.7
36	gpt-oss-20b	OpenAI	77.7
37	K-EXAONE	LG AI Research	76.8
38	Qwen3 Max	Alibaba	76.7
39	Doubao Seed Code	ByteDance	76.6
40	Seed-OSS-36B-Instruct	ByteDance	76.5
41	Magistral Medium 1.2	Mistral AI	75
42	KAT-Coder-Pro V1	Kuaishou	74.7
43	EXAONE 4.0 32B	LG AI Research	74.7
44	NVIDIA Nemotron 3 Nano 30B A3B	NVIDIA	74.1
45	DeepSeek V3.2 Exp	DeepSeek	74.1
46	Qwen3 VL 32B	Alibaba	73.8
47	Llama Nemotron Super 49B v1.5	NVIDIA	73.7
48	o3-mini	OpenAI	73.4
49	DeepSeek-R1-0528	DeepSeek	73.3
50	Nova 2.0 Pro	Amazon	73
51	GLM-4.5	Zhipu AI	72.9
52	Apriel-v1.5-15B-Thinker	ServiceNow	72.8
53	NVIDIA Nemotron Nano 9B V2	NVIDIA	72.4
54	Falcon-H1R-7B	TII UAE	72.4
55	Magistral Small 1.2	Mistral AI	72.3
56	Claude Sonnet 4.5	Anthropic	71.4
57	Gemini 2.5 Flash	Google	71.3
58	MiniMax M1 80k	MiniMax	71.1
59	Nemotron Nano 9B V2	NVIDIA	71.1
60	Nova 2 Lite	Amazon	71.1
61	Qwen3 30B A3B 2507	Alibaba	70.7
62	Qwen3 235B A22B	Alibaba	70.7
63	GLM 4.5 Air	Zhipu AI	70.7
64	Qwen3 VL 30B A3B	Alibaba	69.7
65	Grok 3 mini Reasoning	xAI	69.6
66	Olmo 3.1 32B Think	Allen Institute for AI	69.5
67	GLM-4.6	Zhipu AI	69.5
68	Gemini 2.5 Flash	Google	69.5
69	K2-V2	MBZUAI Institute of Foundation Models	69.4
70	NVIDIA Nemotron Nano 12B v2 VL	NVIDIA	69.4
71	Gemini 2.5 Pro Preview 06-05	Google	69
72	Gemini 2.5 Flash-Lite	Google	68.8
73	Cogito v2.1	Deep Cogito	68.8
74	Hermes 4 - Llama-3.1 405B	Nous Research	68.6
75	Qwen3 Next 80B A3B Instruct	Alibaba	68.4
76	Qwen3 Omni 30B A3B	Alibaba	67.9
77	o1	OpenAI	67.9
78	Ling-1T	InclusionAI	67.7
79	Olmo 3 32B Think	Allen Institute for AI	67.2
80	Llama 3.1 Nemotron Ultra 253B v1	NVIDIA	66.3
81	Nova 2.0 Omni	Amazon	66
82	MiniMax M1 40k	MiniMax	65.7
83	Grok Code Fast 1	xAI	65.7
84	Qwen3 32B	Alibaba	65.7
85	Mi:dm K 2.5 Pro	Korea Telecom	65.6
86	Claude Sonnet 4	Anthropic	65.5
87	Claude Opus 4.1	Anthropic	65.4
88	Hermes 4 - Llama-3.1 70B	Nous Research	65.3
89	Motif-2-12.7B-Reasoning	Motif Technologies	65.1
90	Qwen3 VL 235B A22B	Alibaba	64.6
91	Ring-1T	InclusionAI	64.3
92	Qwen3 4B 2507	Alibaba	64.1
93	Claude Opus 4	Anthropic	63.6
94	QwQ-32B	Alibaba	63.4
95	HyperCLOVA X SEED Think	Naver	62.9
96	Ring-flash-2.0	InclusionAI	62.8
97	Qwen3 30B A3B	Alibaba	62.6
98	Olmo 3 7B Think	Allen Institute for AI	61.7
99	DeepSeek-R1	DeepSeek	61.7
100	Solar Pro 2	Upstage	61.6
101	Claude Haiku 4.5	Anthropic	61.5
102	Kimi K2 0905	Moonshot AI	61
103	GLM 4.5V	Zhipu AI	60.4
104	Qwen3 VL 235B A22B Instruct	Alibaba	59.4
105	Ling-flash-2.0	InclusionAI	58.9
106	Qwen3 Coder 480B A35B Instruct	Alibaba	58.5
107	o1-mini	OpenAI	57.6
108	DeepSeek R1 Distill Llama 70B	DeepSeek	57.5
109	DeepSeek R1 Distill Qwen 32B	DeepSeek	57.2
110	DeepSeek-V3.1	DeepSeek	56.4
111	Kimi K2	Moonshot AI	55.6
112	Qwen2.5 72B Instruct	Alibaba	55.5
113	Phi 4 Reasoning	Microsoft	53.8
114	Kimi K2-Instruct-0905	Moonshot AI	53.7
115	Qwen3 Max Thinking	Alibaba	53.5
116	Phi 4 Reasoning Plus	Microsoft	53.1
117	DeepSeek R1 Distill Qwen 14B	DeepSeek	53.1
118	Magistral Medium 1	Mistral AI	52.7
119	Qwen3-235B-A22B-Instruct-2507	Alibaba	52.4
120	Qwen3 14B	Alibaba	52.3
121	Exaone 4.0 1.2B	LG AI Research	51.6
122	Qwen3 30B A3B 2507 Instruct	Alibaba	51.5
123	Magistral Small 1	Mistral AI	51.4
124	Qwen3 VL 32B Instruct	Alibaba	51.4
125	DeepSeek R1 0528 Qwen3 8B	DeepSeek	51.3
126	Magistral Small 2506	Mistral AI	51.3
127	Magistral Medium	Mistral AI	50.3
128	QwQ-32B-Preview	Alibaba	50
129	DeepSeek R1 Zero	DeepSeek	50
130	Llama 3.1 Nemotron Nano 4B v1.1	NVIDIA	49.3
131	DeepSeek-V3 0324	DeepSeek	49.2
132	GPT-4.1 Mini	OpenAI	48.3
133	Qwen3 VL 30B A3B Instruct	Alibaba	47.6
134	Claude 3.7 Sonnet	Anthropic	47.3
135	ERNIE 4.5 300B A47B	Baidu	46.7
136	Qwen3 4B	Alibaba	46.5
137	Mistral Large 3	Mistral AI	46.5
138	GPT-4.1	OpenAI	45.7
139	Devstral 2	Mistral AI	44.8
140	Reka Flash 3	Reka AI	43.5
141	Llama 4 Maverick	Meta	43.4
142	Ling-mini-2.0	InclusionAI	42.9
143	GPT-4o	OpenAI	42.5
144	Qwen3 Omni 30B A3B Instruct	Alibaba	42.2
145	GLM 4.6V	Zhipu AI	41.1
146	Qwen3 8B	Alibaba	40.6
147	Mistral Medium 3.1	Mistral AI	40.6
148	Qwen3 Coder 30B A3B Instruct	Alibaba	40.3
149	Mistral Medium 3	Mistral AI	40
150	DeepSeek R1 Distill Llama 8B	DeepSeek	39.6
151	Claude 3.5 Sonnet	Anthropic	38.1
152	Kimi Linear 48B A3B Instruct	Moonshot AI	37.8
153	Qwen3 4B 2507 Instruct	Alibaba	37.7
154	DeepSeek R1 Distill Qwen 7B	DeepSeek	37.6
155	DeepSeek-V3	DeepSeek	37.6
156	Qwen2.5 Max	Alibaba	35.9
157	Qwen3 VL 8B	Alibaba	35.3
158	Ministral 3 14B	Mistral AI	35.1
159	Gemini 2.0 Flash	Google	35.1
160	Devstral Small 2	Mistral AI	34.8
161	Gemini 2.0 Pro	Google	34.7
162	Devstral Medium	Mistral AI	33.7
163	Gemini 2.5 Flash Lite	Google	33.7
164	Qwen3 VL 8B Instruct	Alibaba	33.2
165	Llama 4 Scout	Meta	32.8
166	GPT-4.1 Nano	OpenAI	32.6
167	Gemini 2.0 Flash Thinking	Google	32.1
168	Qwen3 VL 4B	Alibaba	32
169	Nova Premier	Amazon	31.7
170	Gemini 1.5 Pro	Google	31.6
171	Qwen2.5 Coder 32B Instruct	Alibaba	31.4
172	Claude 3.5 Haiku	Anthropic	31.4
173	Gemini Diffusion	Google	30.9
174	Qwen3 1.7B	Alibaba	30.8
175	Llama 3.1 405B Instruct	Meta	30.5
176	Ministral 3 8B	Mistral AI	30.3
177	Gemma 3 27B	Google	29.7
178	Sarvam M	Sarvam	29.5
179	Sonar	Perplexity	29.5
180	Mistral Large 2	Mistral AI	29.3
181	Llama 3.1 Tulu3 405B	Allen Institute for AI	29.1
182	GPT-4 Turbo	OpenAI	29.1
183	Qwen3 VL 4B Instruct	Alibaba	29
184	Llama 3.3 70B Instruct	Meta	28.8
185	Command A	Cohere	28.7
186	Qwen2.5 7B Instruct	Alibaba	28.7
187	Llama-3.3 Nemotron Super 49B v1	NVIDIA	28
188	Claude 3 Opus	Anthropic	27.9
189	Mistral Small 3.2	Mistral AI	27.5
190	Sonar Pro	Perplexity	27.5
191	Gemini 1.5 Flash	Google	27.3
192	Grok-2	xAI	26.7
193	Olmo 3 7B Instruct	Allen Institute for AI	26.6
194	Qwen2 7B Instruct	Alibaba	26.6
195	Pixtral Large	Mistral AI	26.1
196	Devstral Small	Mistral AI	25.8
197	Mistral Small 3	Mistral AI	25.2
198	Granite 4.0 H Small	IBM	25.1
199	Qwen2.5 32B Instruct	Alibaba	24.8
200	Ministral 3 3B	Mistral AI	24.7
201	Gemma 3 12B	Google	24.6
202	Grok	xAI	24.1
203	GPT-4o-mini	OpenAI	23.4
204	Nova Pro	Amazon	23.3
205	Llama 3.1 70B Instruct	Meta	23.2
206	Phi 4	Microsoft	23.1
207	Gemini 1.5 Flash 8B	Google	21.7
208	Llama 3.2 90B Instruct	Meta	21.4
209	Mistral Small 3.1	Mistral AI	21.2
210	Jamba Reasoning 3B	AI21 Labs	21
211	Llama 3 70B Instruct	Meta	19.8
212	Claude 2.1	Anthropic	19.5
213	DeepHermes 3 - Mistral 24B	Nous Research	19.5
214	Hermes 3 - Llama-3.1 70B	Nous Research	18.8
215	Gemini 2.0 Flash Lite	Google	18.5
216	Qwen2.5-Coder 7B Instruct	Alibaba	18.2
217	Jamba Large 1.7	AI21 Labs	18.1
218	Granite 4.0 Micro	IBM	18
219	Mistral Large	Mistral AI	17.8
220	Claude 3 Sonnet	Anthropic	17.5
221	Jamba 1.6 Large	AI21 Labs	17.2
222	Claude 2	Anthropic	17.1
223	Llama 3.1 Nemotron 70B Instruct	NVIDIA	16.9
224	DeepSeek R1 Distill Qwen 1.5B	DeepSeek	16.9
225	Nova Lite	Amazon	16.7
226	Qwen2.5 Turbo	Alibaba	16.3
227	Qwen2 72B Instruct	Alibaba	15.9
228	DeepSeek Coder V2 Lite Instruct	DeepSeek	15.8
229	Claude 3 Haiku	Anthropic	15.4
230	LFM2 8B A1B	Liquid AI	15.1
231	Mixtral 8x22B Instruct	Mistral AI	14.8
232	Gemma 3n E4B Instruct	Google	14.6
233	Jamba 1.5 Large	AI21 Labs	14.3
234	Mistral Small	Mistral AI	14.1
235	Nova Micro	Amazon	14
236	Gemma 3 12B Instruct	Google	13.7
237	Gemma 3 27B Instruct	Google	13.7
238	Gemma 3n E4B Instructed LiteRT Preview	Google	13.2
239	Gemma 3n E4B Instructed	Google	13.2
240	Gemma 3n E2B Instructed LiteRT (Preview)	Google	13.2
241	Gemma 3n E2B Instructed	Google	13.2
242	Phi-4-multimodal-instruct	Microsoft	13.1
243	Granite 3.3 8B	IBM	12.7
244	Gemma 3 4B	Google	12.6
245	Phi 4 Mini Instruct	Microsoft	12.6
246	Command R+	Cohere	12.2
247	Qwen3 0.6B	Alibaba	12.1
248	Phi-3 Mini Instruct 3.8B	Microsoft	11.6
249	Gemini 1.0 Pro	Google	11.6
250	Llama 3.1 8B Instruct	Meta	11.6
251	OpenChat 3.5	OpenChat	11.5
252	Granite 4.0 H 1B	IBM	11.5
253	Gemma 3 4B Instruct	Google	11.2
254	Llama 3.2 11B Instruct	Meta	11
255	Claude Instant	Anthropic	10.9
256	Mistral Medium	Mistral AI	9.9
257	Llama 2 Chat 13B	Meta	9.8
258	Llama 2 Chat 70B	Meta	9.8
259	LFM 40B	Liquid AI	9.6
260	Llama 3 8B Instruct	Meta	9.6
261	Gemma 3n E2B Instruct	Google	9.5
262	DBRX Instruct	Databricks	9.3
263	DeepHermes 3 - Llama-3.1 8B	Nous Research	8.5
264	Llama 3.2 3B Instruct	Meta	8.3
265	LFM2 2.6B	Liquid AI	8.1
266	Jamba 1.6 Mini	AI21 Labs	7.1
267	OLMo 2 32B	Allen Institute for AI	6.8
268	Mixtral 8x7B Instruct	Mistral AI	6.6
269	Jamba 1.5 Mini	AI21 Labs	6.2
270	Jamba 1.7 Mini	AI21 Labs	6.1
271	Granite 4.0 1B	IBM	4.7
272	Mistral 7B Instruct	Mistral AI	4.6
273	OLMo 2 7B	Allen Institute for AI	4.1
274	Molmo 7B-D	Allen Institute for AI	3.9
275	Granite 4.0 350M	IBM	2.4
276	LFM2 1.2B	Liquid AI	2
277	Granite 4.0 H 350M	IBM	1.9
278	Gemma 3 1B	Google	1.9
279	Llama 3.2 1B Instruct	Meta	1.9
280	Gemma 3 1B Instruct	Google	1.7
281	Gemma 3 270M	Google	0.3
282	Llama 2 Chat 7B	Meta	0.2

Related Coding benchmarks

HumanEval (68), SWE-bench Verified (51), MBPP (31), Aider Polyglot (21), Terminal-Bench (15), MultiPL-E (12)