MultiPL-E

Models: 12

Top score: 87.9

Median: 72.8

MultiPL-E is a scalable and extensible system for translating unit test-driven code generation benchmarks to multiple programming languages. It extends the HumanEval and MBPP Python benchmarks to 18 additional programming languages, enabling evaluation of neural code generation models across diverse programming paradigms and language features.

State of the art over time

Each point is a model at its release date; the line traces the best score to date.
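
The "best score to date" line is a running maximum over scores ordered by release date. A minimal sketch of that idea follows; the scores are hypothetical placeholders, not leaderboard data, since this page does not list release dates.

```python
from itertools import accumulate

# Hypothetical scores in release order; placeholders, not leaderboard data.
scores_by_release = [60.0, 58.0, 71.5, 70.0, 80.0]

# accumulate with max yields the best score seen up to each release,
# which is the value the state-of-the-art line would show at that point.
best_to_date = list(accumulate(scores_by_release, max))
print(best_to_date)  # [60.0, 60.0, 71.5, 71.5, 80.0]
```

A release that scores below the current best leaves the line flat; only a new high moves it up.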

Ranking

Rank	Model	Organization	Score
1	Qwen3-235B-A22B-Instruct-2507	Alibaba	87.9
2	Qwen3 Next 80B A3B Instruct	Alibaba	87.8
3	Kimi K2-Instruct-0905	Moonshot AI	85.7
4	Kimi K2 Instruct	Moonshot AI	85.7
5	Qwen2.5 32B Instruct	Alibaba	75.4
6	Qwen2.5 72B Instruct	Alibaba	75.1
7	Qwen2.5 14B Instruct	Alibaba	72.8
8	Qwen2.5 7B Instruct	Alibaba	70.4
9	Qwen2 72B Instruct	Alibaba	69.2
10	Qwen3 235B A22B	Alibaba	65.9
11	Qwen2.5-Omni-7B	Alibaba	65.8
12	Qwen2 7B Instruct	Alibaba	59.1
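
The summary figures (model count, top score, median) can be checked against the ranking. With 12 entries, the conventional median would average the two middle scores, 72.8 and 75.1; the reported 72.8 matches the lower of the two. The sketch below assumes the page reports that lower median, which is an inference from the numbers rather than a documented method. The function name `summarize` is illustrative.

```python
from statistics import median, median_low

# Scores copied from the ranking above, in rank order.
scores = [87.9, 87.8, 85.7, 85.7, 75.4, 75.1, 72.8, 70.4, 69.2, 65.9, 65.8, 59.1]

def summarize(values):
    """Return count, top score, lower median, and interpolated median."""
    return {
        "models": len(values),
        "top_score": max(values),
        # Assumption: the page uses the lower of the two middle values.
        "median_low": median_low(values),
        # Conventional median for an even count: mean of the middle pair.
        "median_interpolated": median(values),
    }

summary = summarize(scores)
print(summary)
```

Here `median_low` returns 72.8, agreeing with the reported figure, while `median` gives roughly 73.95.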

Related Coding benchmarks

LiveCodeBench (282), HumanEval (68), SWE-bench Verified (51), MBPP (31), Aider Polyglot (21), Terminal-Bench (15)