298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. The catalog is sorted newest first by default; head to the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter


Rank	Model	Coding idx ↓	SciCode	Aider Polyglot Edit	MultiPL-E	MBPP	SWE-bench Pro	Aider Polyglot	LiveCodeBench	Terminal-Bench	SWE-bench Verified	HumanEval	Released	Country	Type	Access	Params	Cutoff	Context	Speed	Latency	In $/M	Out $/M
#1	DeepSeek-V4-Pro DeepSeek	67.6	50	—	—	—	—	—	93.5	46.2	80.6	—	2026	—	llm	Open weights	1.6T (49B active)	—	1M	30	1.16	$0.44	$0.87
#2	GPT-5.2 OpenAI	67.1	52.1	—	—	—	—	—	89.4	47	80	—	2025	—	llm	API only	—	—	400K	73	0.69	$1.75	$14.00
#3	Gemini 3 Pro Google	66.4	56.1	—	—	—	—	—	91.7	41.7	76.2	—	2025	—	multimodal	API only	—	—	1M	141	27.49	$2.00	$12.00
#4	Claude Opus 4.5 Anthropic	66.1	49.5	—	—	—	—	—	87.1	47	80.9	—	2025	—	llm	API only	—	—	200K	58	1.50	$5.00	$25.00
#5	GPT-5 OpenAI	65.7	42.9	—	—	—	—	88	84.6	37.9	74.9	93.4	2025	—	llm	API only	—	2024	400K	100	2.00	$1.25	$10.00
#6	Claude Opus 4.7 Anthropic	65.5	54.5	—	—	—	—	—	—	54.5	87.6	—	2026	—	llm	API only	—	—	1M	49	1.42	$5.00	$25.00
#7	Gemini 3 Flash Google	64.5	50.6	—	—	—	—	—	90.8	38.6	78	—	2025	—	multimodal	API only	—	—	1M	191	1.05	$0.50	$3.00
#8	Gemini 3.1 Pro Google	64.4	58.9	—	—	—	—	—	—	53.8	80.6	—	2026	—	multimodal	API only	—	—	1M	142	26.02	$2.00	$12.00
#9	o3 OpenAI	61.9	41	—	—	—	—	81.3	80.8	37.1	69.1	—	2025	—	llm	API only	—	2024	200K	50	20.00	$2.00	$8.00
o3 overall index: 56.3 = (69.3 + 33.6 + 57.1 + 65.2) / 4, the equal-weighted mean of 4 components.
General (25%): 69.3. SimpleQA —, AA-LCR 69.3, LongBench-v2 —, IFBench —
Reasoning (25%): 33.6. GPQA Diamond 87.7, Humanity’s Last Exam 24.3, FrontierMath 15.8, ARC-AGI-2 6.5
Coding (25%): 57.1. SWE-bench Verified 69.1, Terminal-Bench 37.1, Aider Polyglot 81.3, SciCode 41
Tool use & agents (25%): 65.2. TAU-bench Retail —, τ²-bench 80.7, BFCL —, BrowseComp 49.7
#10	Claude Sonnet 4.5 Anthropic	60.8	44.7	—	—	—	—	—	71.4	50	77.2	—	2025	—	llm	API only	—	2025	1M	42	0.40	$3.00	$15.00
#11	Grok 4 xAI	60.6	45.7	—	—	—	—	79.6	79	37.9	—	—	2025	—	llm	API only	—	2024	256K	100	0.70	$3.00	$15.00
#12	Claude Opus 4.6 Anthropic	60.4	51.9	—	—	—	—	—	—	48.5	80.8	95	2026	—	llm	API only	—	—	1M	48	1.65	$5.00	$25.00
#13	Gemini 2.5 Pro Google	60.4	42.8	72.7	—	—	—	76.5	80.1	26.5	63.8	—	2025	—	multimodal	API only	—	2025	1M	85	0.70	$1.25	$10.00
#14	Claude Sonnet 4.6 Anthropic	59.8	46.9	—	—	—	—	—	—	53	79.6	—	2026	—	llm	API only	—	—	1M	75	1.13	$3.00	$15.00
#15	GPT-5 Codex OpenAI	59.3	40.9	—	—	—	—	—	84	37.9	74.5	—	2025	—	multimodal	API only	—	2024	400K	180	6.64	$1.25	$10.00
#16	DeepSeek V3.2 Exp DeepSeek	58.8	39.9	—	—	—	—	74.5	74.1	37.7	67.8	—	2025	—	llm	Open weights	—	2025	164K	100	0.70	$0.27	$0.41
#17	MiniMax-M2 MiniMax	58.6	36.1	—	—	—	—	—	82.6	46.3	69.4	—	2025	—	llm	Open weights	230B (10B active)	—	205K	91	1.19	$0.26	$1.00
#18	GPT-5.1 OpenAI	58.5	43.3	—	—	—	—	—	86.8	45.5	—	—	2025	—	llm	API only	—	—	400K	115	0.77	$1.25	$10.00
GPT-5.1 overall index: 64.7 = (75.0 + 57.3 + 44.4 + 81.9) / 4, the equal-weighted mean of 4 components.
General (25%): 75. SimpleQA —, AA-LCR 75, LongBench-v2 —, IFBench —
Reasoning (25%): 57.3. GPQA Diamond 88.1, Humanity’s Last Exam 26.5, FrontierMath —, ARC-AGI-2 —
Coding (25%): 44.4. SWE-bench Verified —, Terminal-Bench 45.5, Aider Polyglot —, SciCode 43.3
Tool use & agents (25%): 81.9. TAU-bench Retail —, τ²-bench 81.9, BFCL —, BrowseComp —
#19	GPT-5.5 OpenAI	58.4	56.1	—	—	—	58.6	—	—	60.6	—	—	2026	—	llm	API only	—	2025	1.1M	67	0.97	$5.00	$30.00
#20	DeepSeek-V3.2 DeepSeek	57.7	38.9	—	—	—	—	70.2	86.2	35.6	—	—	2025	—	llm	Open weights	671B (37B active)	—	131K	—	—	$0.25	$0.38
#21	Claude Opus 4 Anthropic	57.6	40.9	—	—	—	—	72	63.6	39.2	72.5	—	2025	—	llm	API only	—	2025	200K	120	0.40	$15.00	$75.00
#22	Kimi K2 Thinking Moonshot AI	57.5	42.4	—	—	—	—	—	85.3	31.1	71.3	—	2025	—	llm	Open weights	1T (32B active)	—	262K	100	1.00	$0.60	$2.50
#23	GPT-5.4 OpenAI	57.3	56.6	—	—	—	57.7	—	—	57.6	—	—	2026	—	llm	API only	—	—	1.1M	84	0.63	$2.50	$15.00
#24	o4-mini OpenAI	57.1	46.5	58.2	—	—	—	68.9	85.9	15.2	68.1	—	2025	—	multimodal	API only	—	2024	200K	115	5.20	$1.10	$4.40
#25	DeepSeek V3.2 Speciale DeepSeek	56.1	44	—	—	—	—	—	89.6	34.8	—	—	2025	—	llm	Open weights	—	—	164K	—	—	$0.29	$0.43

Ranked on Coding. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations — see each model for sources.
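
The index breakdown shown for o3 in the table can be reproduced with a few lines of Python. This is a minimal sketch, not the site's actual code: the function names `component_score` and `overall_index` are hypothetical, and the rule that a component is the plain mean of whichever benchmarks are reported (skipping missing "—" entries) is an assumption inferred from the published numbers, not something the page states.

```python
from statistics import mean
from typing import Optional

# Benchmark name -> score; None stands for a missing ("—") entry.
Scores = dict[str, Optional[float]]


def component_score(benchmarks: Scores) -> Optional[float]:
    """Mean of the reported benchmarks (assumption: missing ones are skipped)."""
    reported = [v for v in benchmarks.values() if v is not None]
    return mean(reported) if reported else None


def overall_index(components: dict[str, Scores]) -> float:
    """Equal-weighted mean of the component scores (25% each for 4 components)."""
    scores = {name: component_score(b) for name, b in components.items()}
    missing = [name for name, s in scores.items() if s is None]
    if missing:
        # The page does not say how an empty component is handled, so refuse.
        raise ValueError(f"no reported benchmarks for: {missing}")
    return mean(scores.values())


# Values copied from the o3 breakdown in the table.
O3 = {
    "General": {"SimpleQA": None, "AA-LCR": 69.3,
                "LongBench-v2": None, "IFBench": None},
    "Reasoning": {"GPQA Diamond": 87.7, "Humanity's Last Exam": 24.3,
                  "FrontierMath": 15.8, "ARC-AGI-2": 6.5},
    "Coding": {"SWE-bench Verified": 69.1, "Terminal-Bench": 37.1,
               "Aider Polyglot": 81.3, "SciCode": 41.0},
    "Tool use & agents": {"TAU-bench Retail": None, "τ²-bench": 80.7,
                          "BFCL": None, "BrowseComp": 49.7},
}

for name, benchmarks in O3.items():
    print(name, round(component_score(benchmarks), 1))
print("Index", round(overall_index(O3), 1))
```

Under that assumption the component values come out as 69.3, 33.6, 57.1, and 65.2, and the index as 56.3, matching the o3 breakdown. Note that the Coding component here (57.1) differs from the "Coding idx" column (61.9 for o3), so the column evidently draws on a different set of benchmarks than this four-benchmark component.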