Leaderboards

Model rankings

A balanced intelligence index averages each model's per-category scores. Drill into a category for individual benchmarks, or sort by speed, price, and context.

Updated May 25, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter
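Sorting by context, price, or speed means turning display strings such as "128K", "1M", or "$0.89" back into numbers and deciding where empty cells go. The sketch below is a minimal Python illustration, not the site's code. `parse_number` and `sort_rows` are hypothetical helpers. K and M are assumed to be decimal multipliers, so 131K is read as 131,000 tokens. Rows without a value always sort last. The sample rows are copied from the Agents table.

```python
from typing import Optional

# Rows copied from the Agents table: (model, context, speed in tok/s, input $/M).
# A dash marks a cell with no value on the page.
ROWS = [
    ("Llama 3.1 405B Instruct", "128K", "100", "$0.89"),
    ("Claude Sonnet 4.5", "1M", "42", "$3.00"),
    ("Grok 4 Fast", "2M", "90", "$0.20"),
    ("QwQ-32B", "—", "31", "$0.70"),
    ("Kimi K2-Instruct-0905", "—", "—", "—"),
]

# Assumption: K and M are decimal multipliers (131K is read as 131,000 tokens).
MULTIPLIERS = {"K": 1_000, "M": 1_000_000}


def parse_number(cell: str) -> Optional[float]:
    """Convert '128K', '1M', '$0.89' or '2220' to a float; a dash or blank gives None."""
    text = cell.strip().lstrip("$")
    if text in ("", "—"):
        return None
    if text[-1] in MULTIPLIERS:
        return float(text[:-1]) * MULTIPLIERS[text[-1]]
    return float(text)


def sort_rows(rows, column: int, descending: bool = True):
    """Sort rows by one column; rows with no value always go last."""
    keyed = [(parse_number(row[column]), row) for row in rows]
    present = sorted(
        (kr for kr in keyed if kr[0] is not None),
        key=lambda kr: kr[0],
        reverse=descending,
    )
    missing = [row for value, row in keyed if value is None]
    return [row for _, row in present] + missing


# Longest context first, then cheapest input price first.
print([r[0] for r in sort_rows(ROWS, column=1)])
print([r[0] for r in sort_rows(ROWS, column=3, descending=False)])
```

Keeping missing values out of the numeric sort avoids comparing `None` with floats. It also keeps a model with no listed price from looking like the cheapest one.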

Top intelligence: Sonar Reasoning Pro (95.7 index)
Top reasoning: Claude Opus 4.7 (94.2)
Top coding: DeepSeek V3.2 Speciale (89.6)
Top math: Grok-4 Heavy (100)
Fastest: Llama 3.3 70B Instruct (2220 tok/s)
Cheapest: Ling-2.6-flash ($0.01/M tokens)
Longest context: Llama 4 Scout (10M tokens)
Best open-weights: DeepSeek V3.2 Speciale (89.9 index)

Price vs. intelligence (chart): intelligence index against input price. Points up and to the left offer better value.

Speed vs. intelligence (chart): intelligence index against output speed. Points up and to the right are both fast and smart.
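One common way to read "up and to the left" (or "up and to the right") on these charts is as a Pareto frontier. The frontier is the set of models that no other model matches or beats on both axes at once. The sketch below computes that frontier in Python under two assumptions. First, the charts plot the overall intelligence index, which the Agents table does not list, so the sample uses each model's Agents index as a stand-in. Second, `frontier` is a hypothetical helper, not the site's method.

```python
# Sample points copied from the Agents table. The Agents index stands in for the
# overall intelligence index, which this table does not list.
POINTS = [
    {"model": "Llama 3.1 405B Instruct", "index": 88.5, "price": 0.89, "speed": 100},
    {"model": "Llama 3.1 70B Instruct", "index": 84.8, "price": 0.40, "speed": 1204},
    {"model": "Claude Sonnet 4.5", "index": 78.1, "price": 3.00, "speed": 42},
    {"model": "Qwen3 32B", "index": 70.3, "price": 0.08, "speed": 328},
    {"model": "GPT-5", "index": 66.2, "price": 1.25, "speed": 100},
    {"model": "gpt-oss-20b", "index": 54.8, "price": 0.03, "speed": 1000},
]


def frontier(points, axis, lower_is_better):
    """Models not dominated by any other model: nobody is at least as good on both
    the index and `axis` while strictly better on one of them (a Pareto frontier)."""
    sign = -1 if lower_is_better else 1  # flip price so "bigger is better" on both axes

    def dominates(q, p):
        qs, qx = q["index"], sign * q[axis]
        ps, px = p["index"], sign * p[axis]
        return qs >= ps and qx >= px and (qs > ps or qx > px)

    return [p["model"] for p in points if not any(dominates(q, p) for q in points)]


# Price chart: up and to the left (higher index, lower input price).
print(frontier(POINTS, "price", lower_is_better=True))
# Speed chart: up and to the right (higher index, higher output speed).
print(frontier(POINTS, "speed", lower_is_better=False))
```

With this sample, four models sit on the price frontier: Llama 3.1 405B Instruct, Llama 3.1 70B Instruct, Qwen3 32B, and gpt-oss-20b. Only the two larger Llama models sit on the speed frontier, because Llama 3.1 70B Instruct is both faster and higher-scoring than the rest.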

Category views: Overview, Reasoning, Coding, Math, Agents, Multimodal, General, Long Context. The table below shows the Agents view.

#	Model	Provider	Agents index	BFCL	τ²-bench Airline	τ²-bench Retail	BrowseComp	TAU-bench Airline	TAU-bench Retail	Context	Speed (tok/s)	Input $/M tokens
1	Llama 3.1 405B Instruct	Meta	88.5	88.5	—	—	—	—	—	128K	100	$0.89
2	Llama 3.1 70B Instruct	Meta	84.8	84.8	—	—	—	—	—	131K	1204	$0.40
3	Claude Sonnet 4.5	Anthropic	78.1	—	—	—	—	70	86.2	1M	42	$3.00
4	Llama 3.1 8B Instruct	Meta	76.1	76.1	—	—	—	—	—	131K	2047	$0.02
5	Claude Haiku 4.5	Anthropic	73.4	—	63.6	83.2	—	—	—	200K	100	$1.00
6	Qwen3 235B A22B	Alibaba	70.8	70.8	—	—	—	—	—	131K	68	$0.46
7	Claude Opus 4	Anthropic	70.5	—	—	—	—	59.6	81.4	200K	120	$15.00
8	Claude Sonnet 4	Anthropic	70.3	—	—	—	—	60	80.5	1M	101	$3.00
9	Qwen3 32B	Alibaba	70.3	70.3	—	—	—	—	—	131K	328	$0.08
10	Claude 3.7 Sonnet	Anthropic	69.8	—	—	—	—	58.4	81.2	200K	101	$3.00
11	Claude Opus 4.1	Anthropic	69.2	—	—	—	—	56	82.4	200K	120	$15.00
12	Qwen3 30B A3B	Alibaba	69.1	69.1	—	—	—	—	—	131K	122	$0.09
13	Nova Pro	Amazon	68.4	68.4	—	—	—	—	—	300K	100	$0.80
14	gpt-oss-120b	OpenAI	67.8	—	—	—	—	—	67.8	131K	500	$0.04
15	Nova Lite	Amazon	66.6	66.6	—	—	—	—	—	300K	100	$0.06
16	QwQ-32B	Alibaba	66.4	66.4	—	—	—	—	—	—	31	$0.70
17	GPT-5	OpenAI	66.2	—	62.6	81.1	54.9	—	—	400K	100	$1.25
18	o3	OpenAI	64.9	—	64.8	80.2	49.7	—	—	200K	50	$2.00
19	Kimi K2 Instruct	Moonshot AI	63.6	—	56.5	70.6	—	—	—	131K	45	$0.57
20	Kimi K2-Instruct-0905	Moonshot AI	63.6	—	56.5	70.6	—	—	—	—	—	—
21	Qwen3 Next 80B A3B Thinking	Alibaba	61.7	—	60.5	67.8	—	49	69.6	262K	—	$0.10
22	Qwen3-235B-A22B-Thinking-2507	Alibaba	60.9	—	58	71.9	—	46	67.8	256K	—	$0.30
23	o1	OpenAI	60.4	—	—	—	—	50	70.8	200K	66	$15.00
24	GPT-4.5	OpenAI	59.2	—	—	—	—	50	68.4	128K	50	$75.00
25	GPT-4.1	OpenAI	58.7	—	—	—	—	49.4	68	1M	100	$2.00
26	Qwen3-235B-A22B-Instruct-2507	Alibaba	57.7	—	44	71.3	—	—	—	131K	63	$0.15
27	Claude 3.5 Sonnet	Anthropic	57.6	—	—	—	—	46	69.2	200K	101	$3.00
28	o4-mini	OpenAI	57.5	—	—	—	51.5	49.2	71.8	200K	115	$1.10
29	Nova Micro	Amazon	56.2	56.2	—	—	—	—	—	128K	100	$0.03
30	GLM-4.5	Zhipu AI	55.5	—	—	—	26.4	60.4	79.7	131K	85	$0.60
31	gpt-oss-20b	OpenAI	54.8	—	—	—	—	—	54.8	131K	1000	$0.03
32	GLM 4.5 Air	Zhipu AI	53.3	—	—	—	21.3	60.8	77.9	131K	63	$0.13
33	GPT-4o	OpenAI	53	—	45.5	63.4	—	42.8	60.3	128K	132	$2.50
34	Qwen3 Next 80B A3B Instruct	Alibaba	51.9	—	45.5	57.3	—	44	60.9	262K	161	$0.09
35	GPT-4.1 Mini	OpenAI	45.9	—	—	—	—	36	55.8	1M	150	$0.40
36	GLM-4.6	Zhipu AI	45.1	—	—	—	45.1	—	—	203K	85	$0.43
37	o3-mini	OpenAI	45	—	—	—	—	32.4	57.6	200K	115	$1.10
38	Grok 4 Fast	xAI	44.9	—	—	—	44.9	—	—	2M	90	$0.20
39	DeepSeek V3.2 Exp	DeepSeek	40.1	—	—	—	40.1	—	—	164K	100	$0.27
40	Claude 3.5 Haiku	Anthropic	36.9	—	—	—	—	22.8	51	200K	104	$0.80
41	DeepSeek-V3.1	DeepSeek	30	—	—	—	30	—	—	164K	—	$0.21
42	GPT-4.1 Nano	OpenAI	18.3	—	—	—	—	14	22.6	1M	200	$0.10
43	DeepSeek-R1-0528	DeepSeek	8.9	—	—	—	8.9	—	—	131K	45	$0.55

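43 models ranked on Agents. The intelligence index is a balanced mean of per-category scores. Each category column averages only the benchmarks a model has results for, and empty cells are skipped rather than counted as zero. For example, Claude Sonnet 4.5's Agents index of 78.1 is the mean of its TAU-bench Airline (70) and TAU-bench Retail (86.2) scores. Scores are curated approximations; see each model's page for sources.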
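The two-level averaging described above can be sketched in Python. This is an illustration under assumptions, not the site's implementation. `category_score` and `intelligence_index` are hypothetical names, scores stay unrounded until display, and the per-category numbers in the second example are made up. The first example reproduces Agents index values from the table, and each one matches a plain mean of the benchmark cells that are present.

```python
from statistics import mean
from typing import Optional

Scores = dict[str, Optional[float]]  # benchmark name -> score, None when the cell is empty


def category_score(benchmarks: Scores) -> Optional[float]:
    """Mean of the benchmarks a model has results for; missing ones are skipped."""
    present = [s for s in benchmarks.values() if s is not None]
    return mean(present) if present else None


def intelligence_index(categories: dict[str, Scores]) -> Optional[float]:
    """Balanced index: every category counts once, however many benchmarks it holds."""
    per_category = [category_score(b) for b in categories.values()]
    present = [s for s in per_category if s is not None]
    return mean(present) if present else None


# Agents rows copied from the table (benchmarks without a result are None).
AGENTS = {
    "Llama 3.1 405B Instruct": {"BFCL": 88.5},
    "Claude Sonnet 4.5": {"TAU-bench Airline": 70.0, "TAU-bench Retail": 86.2, "BrowseComp": None},
    "GPT-5": {"τ²-bench Airline": 62.6, "τ²-bench Retail": 81.1, "BrowseComp": 54.9},
    "Qwen3 Next 80B A3B Thinking": {
        "τ²-bench Airline": 60.5, "τ²-bench Retail": 67.8,
        "TAU-bench Airline": 49.0, "TAU-bench Retail": 69.6,
    },
}
for name, benches in AGENTS.items():
    print(f"{name}: {category_score(benches):.1f}")  # 88.5, 78.1, 66.2, 61.7

# Hypothetical model: two categories with different benchmark counts.
hypothetical = {
    "Agents": {"bench_a": 60.0, "bench_b": 80.0},
    "Math": {"bench_c": 90.0},
}
print(intelligence_index(hypothetical))  # 80.0; pooling all three benchmarks gives about 76.7
```

The hypothetical case shows the effect of "balanced". Averaging category scores gives Math the same weight as Agents even though Math has only one benchmark. Pooling every benchmark would let the category with more benchmarks dominate.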