Leaderboards

Model rankings

A balanced intelligence index averages each model's per-category scores. Drill into a category for individual benchmarks, or sort by speed, price, and context.

Updated May 25, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter

Top intelligence: Sonar Reasoning Pro (95.7 index)
Top reasoning: Claude Opus 4.7 (94.2)
Top coding: DeepSeek V3.2 Speciale (89.6)
Top math: Grok-4 Heavy (100)
Fastest: Llama 3.3 70B Instruct (2220 tok/s)
Cheapest: Ling-2.6-flash ($0.01/M)
Longest context: Llama 4 Scout (10M)
Best open-weights: DeepSeek V3.2 Speciale (89.9 index)

Price vs. intelligence

Intelligence index vs. input price — up and to the left is better value.

Speed vs. intelligence

Intelligence index vs. output speed — up and to the right is fast and smart.
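
One way to read "up and to the left is better value" in the price chart is as a dominance test: a model is worth a look if no other model is both cheaper and higher-scoring. The sketch below is only an illustration of that idea, not the site's method. The function name `pareto_frontier` is hypothetical, and because this section does not list per-model intelligence index values, the sample rows use the Multimodal index and input price from the table below as a stand-in.

```python
def pareto_frontier(models):
    """Return names of models that no other model beats on both axes.

    models: list of (name, input_price_per_m, score) tuples.
    Lower price and higher score are better. Exact ties on both
    axes keep only the first model encountered.
    """
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to priciest; at equal price, higher score first.
    for name, price, score in sorted(models, key=lambda m: (m[1], -m[2])):
        # A model stays only if it outscores every cheaper-or-equal model.
        if score > best_score:
            frontier.append(name)
            best_score = score
    return frontier


# Sample rows copied from the Multimodal table (stand-in for the index).
rows = [
    ("Claude 3.5 Sonnet", 3.00, 83.3),
    ("Gemma 3 27B", 0.08, 83.0),
    ("o4-mini", 1.10, 82.9),
    ("Gemma 3 12B", 0.04, 82.3),
    ("GPT-4o", 2.50, 77.7),
]

print(pareto_frontier(rows))
# ['Gemma 3 12B', 'Gemma 3 27B', 'Claude 3.5 Sonnet']
```

Here o4-mini and GPT-4o drop out because Gemma 3 27B is both cheaper and higher-scoring on this column. The same function would apply to the speed chart by passing negated speed in place of price, so that faster counts as better.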

Categories: Reasoning, Coding, Math, Agents, Multimodal, General, Long Context. The table below shows the Multimodal category.

#	Model	Provider	Multimodal index	AI2D	MMMU-Pro	ChartQA	DocVQA	MathVista	MMMU	Context	Speed (tok/s)	Input $/M tokens
1	Claude 3.5 Sonnet	Anthropic	83.3	94.7	—	90.8	95.2	67.7	68.3	200K	101	$3.00
2	Gemma 3 27B	Google	83	84.5	—	78	86.6	—	—	131K	33	$0.08
3	o4-mini	OpenAI	82.9	—	—	—	—	84.3	81.6	200K	115	$1.10
4	Gemma 3 12B	Google	82.3	84.2	—	75.7	87.1	—	—	131K	33	$0.04
5	Gemini 2.5 Pro Preview 06-05	Google	82	—	—	—	—	—	82	1M	85	$1.25
6	o3	OpenAI	82	—	76.4	—	—	86.8	82.9	200K	50	$2.00
7	Pixtral Large	Mistral AI	81.7	93.8	—	88.1	93.3	69.4	64	131K	0	$2.00
8	Nova Pro	Amazon	81.5	—	—	89.2	93.5	—	61.7	300K	100	$0.80
9	GPT-5	OpenAI	81.3	—	78.4	—	—	—	84.2	400K	100	$1.25
10	Mistral Small 3.2 24B Instruct	Mistral AI	81	92.9	—	87.4	94.9	67.1	62.5	—	—	—
11	Llama 4 Scout	Meta	80.8	—	—	88.8	94.4	70.7	69.4	10M	776	$0.08
12	Gemini 2.5 Flash	Google	79.7	—	—	—	—	—	79.7	1M	85	$0.30
13	Gemini 2.5 Pro	Google	79.6	—	—	—	—	—	79.6	1M	85	$1.25
14	Qwen2.5 VL 72B Instruct	Alibaba	79.1	88.4	51.1	89.5	96.4	—	70.2	131K	—	$0.25
15	Nova Lite	Amazon	78.5	—	—	86.8	92.4	—	56.2	300K	100	$0.06
16	Llama 4 Maverick	Meta	78.2	—	59.6	90	94.4	73.7	73.4	1M	639	$0.15
17	Grok-3	xAI	78	—	—	—	—	—	78	128K	100	$3.00
18	GPT-4o	OpenAI	77.7	94.2	59.9	85.7	92.8	61.4	72.2	128K	132	$2.50
19	Claude Opus 4.6	Anthropic	77.3	—	77.3	—	—	—	—	1M	48	$5.00
20	Grok-2	xAI	76.2	—	—	—	93.6	69	66.1	128K	85	$2.00
21	Gemini 2.0 Flash Thinking	Google	75.4	—	—	—	—	—	75.4	—	—	$0.00
22	Claude 3.7 Sonnet	Anthropic	75	—	—	—	—	—	75	200K	101	$3.00
23	DeepSeek VL2	DeepSeek	74.9	81.4	—	86	93.3	62.8	51.1	129K	22	$9.50
24	Grok-2 mini	xAI	74.8	—	—	—	93.2	68.1	63.2	—	—	—
25	o1	OpenAI	74.7	—	—	—	—	71.8	77.6	200K	66	$15.00
26	Claude Sonnet 4	Anthropic	74.4	—	—	—	—	—	74.4	1M	101	$3.00
27	GPT-4.5	OpenAI	73.8	—	—	—	—	72.3	75.2	128K	50	$75.00
28	GPT-4.1	OpenAI	73.5	—	—	—	—	72.2	74.8	1M	100	$2.00
29	DeepSeek VL2 Small	DeepSeek	73.1	80	—	84.5	92.3	60.7	48	—	—	—
30	Gemma 3 4B	Google	73.1	74.8	—	68.8	75.8	—	—	131K	33	$0.04
31	GPT-4.1 Mini	OpenAI	72.9	—	—	—	—	73.1	72.7	1M	150	$0.40
32	Gemini 2.5 Flash Lite	Google	72.9	—	—	—	—	—	72.9	1M	6	$0.10
33	Kimi-k1.5	Moonshot AI	72.5	—	—	—	—	74.9	70	—	—	—
34	Llama 3.2 90B Instruct	Meta	71.8	92.3	45.2	85.5	90.1	57.3	60.3	128K	100	$0.35
35	Qwen2.5 VL 32B Instruct	Alibaba	71.4	—	49.5	—	94.8	—	70	—	—	—
36	Grok-1.5V	xAI	71.3	88.3	—	76.1	85.6	52.8	53.6	—	—	—
37	Qwen2.5-Omni-7B	Alibaba	71.2	83.2	36.6	85.3	95.2	67.9	59.2	—	—	—
38	QvQ-72B-Preview	Alibaba	70.9	—	—	—	—	71.4	70.3	—	—	—
39	Pixtral-12B	Mistral AI	70.8	—	—	81.8	90.7	58	52.5	128K	0	$0.15
40	Gemini 2.0 Flash	Google	70.7	—	—	—	—	—	70.7	1M	183	$0.10
41	Qwen2.5 VL 7B Instruct	Alibaba	70	—	38.3	87.3	95.7	—	58.6	—	—	—
42	Phi-4-multimodal-instruct	Microsoft	68.8	82.3	38.5	81.4	93.2	62.4	55.1	128K	25	$0.05
43	Gemini 2.0 Flash Lite	Google	68	—	—	—	—	—	68	1M	85	$0.08
44	Qwen2-VL-72B-Instruct	Alibaba	67.3	—	46.2	88.3	—	—	—	—	—	—
45	DeepSeek VL2 Tiny	DeepSeek	67.2	71.6	—	81	88.9	53.6	40.7	—	—	—
46	Gemini 1.5 Pro	Google	67	—	—	—	—	68.1	65.9	2M	85	$1.25
47	Llama 3.2 11B Instruct	Meta	66.4	91.1	33	83.4	88.4	51.5	50.7	128K	168	$0.05
48	Gemini 1.5 Flash	Google	64.1	—	—	—	—	65.8	62.3	1M	150	$0.15
49	Grok-1.5	xAI	64	—	—	—	85.6	52.8	53.6	—	—	—
50	Phi-3.5-vision-instruct	Microsoft	61.7	78.1	—	81.8	—	43.9	43	—	—	—
51	Mistral Small 3.1 24B Base	Mistral AI	59.3	—	—	—	—	—	59.3	128K	137	$0.10
52	Mistral Small 3.1 24B Instruct	Mistral AI	59.3	—	—	—	—	—	59.3	—	—	—
53	GPT-4o-mini	OpenAI	58.1	—	—	—	—	56.7	59.4	128K	92	$0.15
54	GPT-4.1 Nano	OpenAI	55.8	—	—	—	—	56.2	55.4	1M	200	$0.10
55	Gemini 1.5 Flash 8B	Google	54.2	—	—	—	—	54.7	53.7	1M	150	$0.07
56	Gemini 1.0 Pro	Google	47.3	—	—	—	—	46.6	47.9	33K	120	$0.50
57	GPT-3.5 Turbo	OpenAI	0	—	—	—	—	0	0	16K	100	$0.50

57 models ranked on Multimodal. The intelligence index is a balanced mean of per-category scores; each category column averages the benchmarks within that category. Scores are curated approximations; see each model for sources.
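
The two averaging steps described above can be sketched as follows. This is a minimal illustration, not the site's actual code. The function names are hypothetical, and skipping missing cells (rather than counting them as zero) is an assumption, although it reproduces the Multimodal column for the rows checked, such as Claude 3.5 Sonnet. The per-category values passed to `intelligence_index` are made up for the example.

```python
from statistics import mean


def category_score(benchmarks):
    """Average the benchmarks that have a score; None marks a missing cell."""
    present = [v for v in benchmarks.values() if v is not None]
    return round(mean(present), 1) if present else None


def intelligence_index(category_scores):
    """Balanced mean: every category counts once, however many benchmarks it has.

    Assumption: categories with no score are skipped rather than treated as zero.
    """
    present = [v for v in category_scores.values() if v is not None]
    return round(mean(present), 1) if present else None


# Benchmark cells for Claude 3.5 Sonnet, copied from the Multimodal table.
claude_35_sonnet = {
    "AI2D": 94.7,
    "MMMU-Pro": None,  # shown as a dash in the table
    "ChartQA": 90.8,
    "DocVQA": 95.2,
    "MathVista": 67.7,
    "MMMU": 68.3,
}
multimodal = category_score(claude_35_sonnet)
print(multimodal)  # 83.3, matching the table's Multimodal column

# Hypothetical scores for two other categories, for illustration only.
example_categories = {"Reasoning": 80.0, "Coding": 70.0, "Multimodal": multimodal}
print(intelligence_index(example_categories))  # 77.8
```

Averaging category scores rather than pooling all benchmarks keeps a category with many benchmarks from outweighing one with few, which is presumably what "balanced" refers to here.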