General

LiveBench

13 Models

84.6 Top score

73.1 Median

LiveBench is a challenging LLM benchmark designed to limit test-set contamination: it releases new questions monthly, drawn from recently released datasets, arXiv papers, news articles, and IMDb movie synopses. It comprises tasks across math, coding, reasoning, language, instruction following, and data analysis, each with a verifiable, objective ground-truth answer.

State of the art over time

Each point is a model at its release date; the line traces the best score to date.
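
The "best score to date" line described here is a running maximum over models ordered by release date. A minimal sketch of that computation follows; the function name `best_to_date` is hypothetical, and the sample points are placeholders for illustration only, not real release dates or scores from this page.

```python
from datetime import date
from itertools import accumulate


def best_to_date(points):
    """Return (release_date, best_score_so_far) pairs in chronological order.

    `points` is an iterable of (release_date, score) pairs in any order.
    """
    ordered = sorted(points, key=lambda p: p[0])
    # accumulate with max yields the running maximum of the scores.
    running = accumulate((score for _, score in ordered), max)
    return [(d, best) for (d, _), best in zip(ordered, running)]


# Placeholder points for illustration only (not data from the leaderboard).
points = [
    (date(2024, 3, 1), 50.0),
    (date(2024, 1, 1), 40.0),
    (date(2024, 6, 1), 45.0),
]

line = best_to_date(points)
for release_date, best in line:
    print(release_date.isoformat(), best)
```

A model that scores below the current best leaves the line flat at its release date, which is why the line never decreases.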

Ranking

Rank	Model	Organization	Score
1	o3-mini	OpenAI	84.6
2	Qwen3 235B A22B	Alibaba	77.1
3	Kimi K2-Instruct-0905	Moonshot AI	76.4
4	Kimi K2 Instruct	Moonshot AI	76.4
5	Qwen3 32B	Alibaba	74.9
6	Qwen3 30B A3B	Alibaba	74.3
7	QwQ-32B	Alibaba	73.1
8	o1	OpenAI	67
9	o1-preview	OpenAI	52.3
10	Qwen2.5 72B Instruct	Alibaba	52.3
11	Phi 4	Microsoft	47.6
12	Qwen2.5 7B Instruct	Alibaba	35.9
13	Qwen2.5-Omni-7B	Alibaba	29.6
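
The summary figures at the top of the page (model count, top score, median) can be checked against the ranking. A minimal sketch, assuming the summary is computed over exactly the 13 scores listed above; the variable names are illustrative.

```python
from statistics import median

# Scores copied from the ranking above, in rank order.
scores = [84.6, 77.1, 76.4, 76.4, 74.9, 74.3, 73.1,
          67, 52.3, 52.3, 47.6, 35.9, 29.6]

model_count = len(scores)
top_score = max(scores)
# With an odd number of entries, the median is the middle (7th) score.
median_score = median(scores)

print(model_count, top_score, median_score)
```

With 13 entries the median is the seventh-ranked score, so it coincides with QwQ-32B's 73.1.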

Related General benchmarks

Humanity’s Last Exam	360
MMLU-Pro	292
MMLU	92
IFEval	41
SimpleQA	26
Arena Hard	21