299 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. The catalog sorts newest first by default; head to the leaderboard for the ranked view.


Updated May 31, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter


Rank	Model	Coding idx ↓	SciCode	Aider Polyglot Edit	MultiPL-E	MBPP	SWE-bench Pro	Aider Polyglot	LiveCodeBench	Terminal-Bench	SWE-bench Verified	HumanEval	Released	Country	Type	Access	Params	Cutoff	Context	Speed	Latency	In $/M	Out $/M
#1	DeepSeek-V4-Pro DeepSeek	67.6	50	—	—	—	—	—	93.5	46.2	80.6	—	2026	—	llm	Open weights	1.6T (49B active)	—	1M	30	1.16	$0.44	$0.87
#2	GPT-5.2 OpenAI	67.1	52.1	—	—	—	—	—	89.4	47	80	—	2025	—	llm	API only	—	—	400K	73	0.69	$1.75	$14.00
GPT-5.2 index breakdown: Index 65.8 = (72.7 + 60.2 + 59.7 + 70.4) / 4, the equal-weighted mean of 4 components (25% each). General 72.7: AA-LCR 72.7; SimpleQA, LongBench-v2, and IFBench not reported. Reasoning 60.2: GPQA Diamond 92.4, Humanity’s Last Exam 35.4, ARC-AGI-2 52.9; FrontierMath not reported. Coding 59.7: SWE-bench Verified 80, Terminal-Bench 47, SciCode 52.1; Aider Polyglot not reported. Tool use & agents 70.4: τ²-bench 84.8, BFCL 55.9; TAU-bench Retail and BrowseComp not reported.
#3	Gemini 3 Pro Google	66.4	56.1	—	—	—	—	—	91.7	41.7	76.2	—	2025	—	multimodal	API only	—	—	1M	141	27.49	$2.00	$12.00
#4	Claude Opus 4.5 Anthropic	66.1	49.5	—	—	—	—	—	87.1	47	80.9	—	2025	—	llm	API only	—	—	200K	58	1.50	$5.00	$25.00
#5	GPT-5 OpenAI	65.7	42.9	—	—	—	—	88	84.6	37.9	74.9	93.4	2025	—	llm	API only	—	2024	400K	100	2.00	$1.25	$10.00
#6	Claude Opus 4.7 Anthropic	65.5	54.5	—	—	—	—	—	—	54.5	87.6	—	2026	—	llm	API only	—	—	1M	49	1.42	$5.00	$25.00
#7	Gemini 3 Flash Google	64.5	50.6	—	—	—	—	—	90.8	38.6	78	—	2025	—	multimodal	API only	—	—	1M	191	1.05	$0.50	$3.00
#8	Gemini 3.1 Pro Google	64.4	58.9	—	—	—	—	—	—	53.8	80.6	—	2026	—	multimodal	API only	—	—	1M	142	26.02	$2.00	$12.00
#9	GPT-5 mini OpenAI	64.2	41	—	—	—	—	88	83.8	33.3	74.9	—	2025	—	llm	API only	—	2024	400K	200	1.00	$0.25	$2.00
GPT-5 mini index breakdown: Index 57.8 = (68.0 + 40.4 + 59.3 + 63.3) / 4, the equal-weighted mean of 4 components (25% each). General 68: AA-LCR 68; SimpleQA, LongBench-v2, and IFBench not reported. Reasoning 40.4: GPQA Diamond 82.3, Humanity’s Last Exam 16.7, FrontierMath 22.1; ARC-AGI-2 not reported. Coding 59.3: SWE-bench Verified 74.9, Terminal-Bench 33.3, Aider Polyglot 88, SciCode 41. Tool use & agents 63.3: τ²-bench 71.1, BFCL 55.5; TAU-bench Retail and BrowseComp not reported.
#10	o3 OpenAI	61.9	41	—	—	—	—	81.3	80.8	37.1	69.1	—	2025	—	llm	API only	—	2024	200K	50	20.00	$2.00	$8.00
#11	Claude Sonnet 4.5 Anthropic	60.8	44.7	—	—	—	—	—	71.4	50	77.2	—	2025	—	llm	API only	—	2025	1M	42	0.40	$3.00	$15.00
#12	Grok 4 xAI	60.6	45.7	—	—	—	—	79.6	79	37.9	—	—	2025	—	llm	API only	—	2024	256K	100	0.70	$3.00	$15.00
#13	Claude Opus 4.6 Anthropic	60.4	51.9	—	—	—	—	—	—	48.5	80.8	95	2026	—	llm	API only	—	—	1M	48	1.65	$5.00	$25.00
#14	Gemini 2.5 Pro Google	60.4	42.8	72.7	—	—	—	76.5	80.1	26.5	63.8	—	2025	—	multimodal	API only	—	2025	1M	85	0.70	$1.25	$10.00
#15	Claude Sonnet 4.6 Anthropic	59.8	46.9	—	—	—	—	—	—	53	79.6	—	2026	—	llm	API only	—	—	1M	75	1.13	$3.00	$15.00
#16	GPT-5 Codex OpenAI	59.3	40.9	—	—	—	—	—	84	37.9	74.5	—	2025	—	multimodal	API only	—	2024	400K	180	6.64	$1.25	$10.00
#17	GPT-5 nano OpenAI	59.2	36.6	—	—	—	—	88	78.9	17.4	74.9	—	2025	—	llm	API only	—	2024	400K	500	0.30	$0.05	$0.40
#18	Kimi K2.6 Moonshot AI	59.1	53.5	—	—	—	58.6	—	—	43.9	80.2	—	2026	—	llm	Open weights	1T (32B active)	—	262K	57	1.20	$0.68	$3.42
#19	DeepSeek V3.2 Exp DeepSeek	58.8	39.9	—	—	—	—	74.5	74.1	37.7	67.8	—	2025	—	llm	Open weights	—	2025	164K	100	0.70	$0.27	$0.41
#20	MiniMax-M2 MiniMax	58.6	36.1	—	—	—	—	—	82.6	46.3	69.4	—	2025	—	llm	Open weights	230B (10B active)	—	205K	91	1.19	$0.26	$1.00
#21	GPT-5.1 OpenAI	58.5	43.3	—	—	—	—	—	86.8	45.5	—	—	2025	—	llm	API only	—	—	400K	115	0.77	$1.25	$10.00
#22	GPT-5.5 OpenAI	58.4	56.1	—	—	—	58.6	—	—	60.6	—	—	2026	—	llm	API only	—	2025	1.1M	67	0.97	$5.00	$30.00
#23	DeepSeek-V3.2 DeepSeek	57.7	38.9	—	—	—	—	70.2	86.2	35.6	—	—	2025	—	llm	Open weights	671B (37B active)	—	131K	—	—	$0.25	$0.38
#24	Claude Opus 4 Anthropic	57.6	40.9	—	—	—	—	72	63.6	39.2	72.5	—	2025	—	llm	API only	—	2025	200K	120	0.40	$15.00	$75.00
#25	Kimi K2 Thinking Moonshot AI	57.5	42.4	—	—	—	—	—	85.3	31.1	71.3	—	2025	—	llm	Open weights	1T (32B active)	—	262K	100	1.00	$0.60	$2.50

Ranked on Coding. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations — see each model for sources.
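
The index breakdowns shown for GPT-5.2 and GPT-5 mini describe the index as an equal-weighted mean of four components. The sketch below reproduces that arithmetic from the component values listed in those rows. The function name `equal_weighted_index` and the half-up rounding to one decimal place are assumptions made for illustration; the page does not state how it rounds.

```python
from decimal import Decimal, ROUND_HALF_UP

# Component scores copied from the two breakdown rows in the table above.
# Strings keep the decimal values exact (no binary floating-point drift).
gpt_5_2 = {
    "General": "72.7",
    "Reasoning": "60.2",
    "Coding": "59.7",
    "Tool use & agents": "70.4",
}
gpt_5_mini = {
    "General": "68.0",
    "Reasoning": "40.4",
    "Coding": "59.3",
    "Tool use & agents": "63.3",
}


def equal_weighted_index(components):
    """Equal-weighted mean of the component scores, to one decimal place.

    Assumption: ties round half up; the source does not specify a rounding rule.
    """
    values = [Decimal(v) for v in components.values()]
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print("GPT-5.2:", equal_weighted_index(gpt_5_2))
    print("GPT-5 mini:", equal_weighted_index(gpt_5_mini))
```

Under these assumptions the two results match the displayed index values, 65.8 and 57.8. The component scores themselves look consistent with a plain mean of whichever benchmarks are reported (for example, Coding for GPT-5 mini: (74.9 + 33.3 + 88 + 41) / 4 = 59.3), though the page does not say so explicitly.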