298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Models are sorted newest first by default; head to the leaderboard for the ranked view.


Updated May 29, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter


Rank	Model	Coding idx ↓	SciCode	Aider Polyglot Edit	MultiPL-E	MBPP	SWE-bench Pro	Aider Polyglot	LiveCodeBench	Terminal-Bench	SWE-bench Verified	HumanEval	Released	Country	Type	Access	Params	Cutoff	Context	Speed	Latency	In $/M	Out $/M
#1	DeepSeek-V4-Pro DeepSeek	67.6	50	—	—	—	—	—	93.5	46.2	80.6	—	2026	—	llm	Open weights	1.6T (49B active)	—	1M	30	1.16	$0.44	$0.87
#2	GPT-5.2 OpenAI	67.1	52.1	—	—	—	—	—	89.4	47	80	—	2025	—	llm	API only	—	—	400K	73	0.69	$1.75	$14.00
#3	Gemini 3 Pro Google	66.4	56.1	—	—	—	—	—	91.7	41.7	76.2	—	2025	—	multimodal	API only	—	—	1M	141	27.49	$2.00	$12.00
#4	Claude Opus 4.5 Anthropic	66.1	49.5	—	—	—	—	—	87.1	47	80.9	—	2025	—	llm	API only	—	—	200K	58	1.50	$5.00	$25.00
#5	GPT-5 OpenAI	65.7	42.9	—	—	—	—	88	84.6	37.9	74.9	93.4	2025	—	llm	API only	—	2024	400K	100	2.00	$1.25	$10.00
GPT-5 index breakdown: 63.3 = (75.6 + 46.1 + 60.9 + 70.7) / 4, the equal-weighted mean of 4 components (25% each).
General 75.6: SimpleQA —, AA-LCR 75.6, LongBench-v2 —, IFBench —
Reasoning 46.1: GPQA Diamond 87.3, Humanity’s Last Exam 24.8, FrontierMath 26.3, ARC-AGI-2 —
Coding 60.9: SWE-bench Verified 74.9, Terminal-Bench 37.9, Aider Polyglot 88, SciCode 42.9
Tool use & agents 70.7: TAU-bench Retail —, τ²-bench 86.5, BFCL —, BrowseComp 54.9
#6	Claude Opus 4.7 Anthropic	65.5	54.5	—	—	—	—	—	—	54.5	87.6	—	2026	—	llm	API only	—	—	1M	49	1.42	$5.00	$25.00
#7	Gemini 3 Flash Google	64.5	50.6	—	—	—	—	—	90.8	38.6	78	—	2025	—	multimodal	API only	—	—	1M	191	1.05	$0.50	$3.00
#8	Gemini 3.1 Pro Google	64.4	58.9	—	—	—	—	—	—	53.8	80.6	—	2026	—	multimodal	API only	—	—	1M	142	26.02	$2.00	$12.00
#9	o3 OpenAI	61.9	41	—	—	—	—	81.3	80.8	37.1	69.1	—	2025	—	llm	API only	—	2024	200K	50	20.00	$2.00	$8.00
#10	Claude Sonnet 4.5 Anthropic	60.8	44.7	—	—	—	—	—	71.4	50	77.2	—	2025	—	llm	API only	—	2025	1M	42	0.40	$3.00	$15.00
#11	Claude Opus 4.6 Anthropic	60.4	51.9	—	—	—	—	—	—	48.5	80.8	95	2026	—	llm	API only	—	—	1M	48	1.65	$5.00	$25.00
#12	Gemini 2.5 Pro Google	60.4	42.8	72.7	—	—	—	76.5	80.1	26.5	63.8	—	2025	—	multimodal	API only	—	2025	1M	85	0.70	$1.25	$10.00
#13	Claude Sonnet 4.6 Anthropic	59.8	46.9	—	—	—	—	—	—	53	79.6	—	2026	—	llm	API only	—	—	1M	75	1.13	$3.00	$15.00
#14	GPT-5 Codex OpenAI	59.3	40.9	—	—	—	—	—	84	37.9	74.5	—	2025	—	multimodal	API only	—	2024	400K	180	6.64	$1.25	$10.00
#15	DeepSeek V3.2 Exp DeepSeek	58.8	39.9	—	—	—	—	74.5	74.1	37.7	67.8	—	2025	—	llm	Open weights	—	2025	164K	100	0.70	$0.27	$0.41
#16	MiniMax-M2 MiniMax	58.6	36.1	—	—	—	—	—	82.6	46.3	69.4	—	2025	—	llm	Open weights	230B (10B active)	—	205K	91	1.19	$0.26	$1.00
#17	GPT-5.1 OpenAI	58.5	43.3	—	—	—	—	—	86.8	45.5	—	—	2025	—	llm	API only	—	—	400K	115	0.77	$1.25	$10.00
#18	GPT-5.5 OpenAI	58.4	56.1	—	—	—	58.6	—	—	60.6	—	—	2026	—	llm	API only	—	2025	1.1M	67	0.97	$5.00	$30.00
#19	Kimi K2 Thinking Moonshot AI	57.5	42.4	—	—	—	—	—	85.3	31.1	71.3	—	2025	—	llm	Open weights	1T (32B active)	—	262K	100	1.00	$0.60	$2.50
#20	GPT-5.4 OpenAI	57.3	56.6	—	—	—	57.7	—	—	57.6	—	—	2026	—	llm	API only	—	—	1.1M	84	0.63	$2.50	$15.00
#21	o4-mini OpenAI	57.1	46.5	58.2	—	—	—	68.9	85.9	15.2	68.1	—	2025	—	multimodal	API only	—	2024	200K	115	5.20	$1.10	$4.40
#22	DeepSeek V3.2 Speciale DeepSeek	56.1	44	—	—	—	—	—	89.6	34.8	—	—	2025	—	llm	Open weights	—	—	164K	—	—	$0.29	$0.43
#23	Claude Opus 4.1 Anthropic	56	40.9	—	—	—	—	—	65.4	43.3	74.5	—	2025	—	llm	API only	—	2025	200K	120	0.40	$15.00	$75.00
#24	Claude Opus 4.8 Anthropic	55.9	53.5	—	—	—	—	—	—	58.3	—	—	2026	—	llm	API only	—	—	1M	66	6.54	$5.00	$25.00
#25	GLM-5 Zhipu AI	55.7	46.2	—	—	—	—	—	—	43.2	77.8	—	2026	—	llm	Open weights	744B (44B active)	—	203K	67	0.77	$0.60	$1.92
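
The index breakdown shown for GPT-5 in the table can be checked by hand. The sketch below recomputes it in Python from the listed benchmark scores. The page states only that the index is an equal-weighted mean of 4 components. The rule that each component is an unweighted mean over the benchmarks that have a score is an assumption inferred from the GPT-5 numbers, not documented methodology. The names `gpt5_breakdown`, `component_score`, and `overall_index` are hypothetical.

```python
# Benchmark scores from the GPT-5 breakdown; None marks a missing score ("—").
gpt5_breakdown = {
    "General": {"SimpleQA": None, "AA-LCR": 75.6, "LongBench-v2": None, "IFBench": None},
    "Reasoning": {"GPQA Diamond": 87.3, "Humanity's Last Exam": 24.8,
                  "FrontierMath": 26.3, "ARC-AGI-2": None},
    "Coding": {"SWE-bench Verified": 74.9, "Terminal-Bench": 37.9,
               "Aider Polyglot": 88, "SciCode": 42.9},
    "Tool use & agents": {"TAU-bench Retail": None, "τ²-bench": 86.5,
                          "BFCL": None, "BrowseComp": 54.9},
}


def component_score(benchmarks):
    """Assumed rule: unweighted mean over the benchmarks that have a score."""
    present = [v for v in benchmarks.values() if v is not None]
    if not present:
        return None
    return round(sum(present) / len(present), 1)


def overall_index(breakdown):
    """Equal-weighted mean of the component scores (25% each for 4 components)."""
    scores = [component_score(b) for b in breakdown.values()]
    scores = [s for s in scores if s is not None]
    return round(sum(scores) / len(scores), 1)


for name, benchmarks in gpt5_breakdown.items():
    print(name, component_score(benchmarks))
print("Index", overall_index(gpt5_breakdown))
```

Under these assumptions the recomputed components and index agree with the values listed in the breakdown. Other models may be aggregated differently, so treat the missing-score handling as illustrative only.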

Ranked on Coding. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations; see each model for sources.
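
One way the per-column coloring described above could work is min-max scaling followed by a two-segment color gradient. The sketch below is an assumption for illustration; the page does not say how colors are computed. The function names `relative_standing` and `to_rgb` are hypothetical.

```python
def relative_standing(values):
    """Min-max scale one column to [0, 1]; None marks a missing score."""
    present = [v for v in values if v is not None]
    if not present:
        return [None for _ in values]
    lo, hi = min(present), max(present)
    span = hi - lo
    # If every score is equal, place them all at the midpoint (yellow).
    return [None if v is None else (0.5 if span == 0 else (v - lo) / span)
            for v in values]


def to_rgb(t):
    """Map t in [0, 1] onto a red -> yellow -> green gradient."""
    if t <= 0.5:
        return (255, round(510 * t), 0)        # red to yellow: green ramps up
    return (round(510 * (1 - t)), 255, 0)      # yellow to green: red ramps down


# Terminal-Bench scores for the top five rows of the table.
terminal_bench = [46.2, 47, 41.7, 47, 37.9]
for score, t in zip(terminal_bench, relative_standing(terminal_bench)):
    print(score, to_rgb(t))
```

With this scheme the lowest score in a column is pure red and the highest is pure green. A rank-based or percentile-based scale would be an equally plausible reading of "relative standing".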