298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Models are sorted newest-first by default; head to the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter


Rank	Model	General idx ↓	Multi-IF	LiveBench	Arena Hard	Humanity’s Last Exam	IFEval	SimpleQA	MMLU-Pro	MMLU	Released	Country	Type	Access	Params	Cutoff	Context	Speed	Latency	In $/M	Out $/M
#1	DeepSeek V3.2 Exp (DeepSeek)	91.1	—	—	—	19.8	—	97.1	85	—	2025	—	llm	Open weights	—	2025	164K	100	0.70	$0.27	$0.41
#2	Grok 4 Fast (xAI)	90	—	—	—	20	—	95	85	—	2025	—	llm	API only	—	—	2M	90	—	$0.20	$0.50
#3	Gemini 3 Pro (Google)	89.8	—	—	—	37.5	—	—	89.8	—	2025	—	multimodal	API only	—	—	1M	141	27.49	$2.00	$12.00
#4	Claude Opus 4.5 (Anthropic)	89.5	—	—	—	28.4	—	—	89.5	—	2025	—	llm	API only	—	—	200K	58	1.50	$5.00	$25.00
#5	Gemini 3 Flash (Google)	89	—	—	—	34.7	—	—	89	—	2025	—	multimodal	API only	—	—	1M	191	1.05	$0.50	$3.00
#6	DeepSeek-R1-0528 (DeepSeek)	88.7	—	—	—	17.7	—	92.3	85	—	2025	—	llm	Open weights	671B	—	131K	45	0.30	$0.55	$2.19
#7	DeepSeek-V3.1 (DeepSeek)	88.6	—	—	—	15.9	—	93.4	83.7	—	2025	—	llm	Open weights	671B (37B active)	2025	164K	—	—	$0.21	$0.79
#8	Claude 3.7 Sonnet (Anthropic)	88.5	—	—	—	10.3	93.2	—	83.7	86.1	2025	—	llm	API only	—	—	200K	101	0.40	$3.00	$15.00
#9	Claude Opus 4.1 (Anthropic)	88	—	—	—	11.9	—	—	88	—	2025	—	llm	API only	—	2025	200K	120	0.40	$15.00	$75.00
#10	DeepSeek-V4-Pro (DeepSeek)	87.5	—	—	—	35.9	—	—	87.5	—	2026	—	llm	Open weights	1.6T (49B active)	—	1M	30	1.16	$0.44	$0.87
#11	Claude Sonnet 4.5 (Anthropic)	87.5	—	—	—	17.3	—	—	87.5	—	2025	—	llm	API only	—	2025	1M	42	0.40	$3.00	$15.00
#12	MiniMax M2.1 (MiniMax)	87.5	—	—	—	22.2	—	—	87.5	—	2025	—	llm	Open weights	—	—	205K	92	1.14	$0.29	$0.95
#13	GPT-5.2 (OpenAI)	87.4	—	—	—	35.4	—	—	87.4	—	2025	—	llm	API only	—	—	400K	73	0.69	$1.75	$14.00
#14	Claude Opus 4 (Anthropic)	87.3	—	—	—	11.7	—	—	87.3	88.8	2025	—	llm	API only	—	2025	200K	120	0.40	$15.00	$75.00
Claude Opus 4 index breakdown: Index 50.7 = (36.0 + 33.3 + 56.2 + 77.4) / 4, the equal-weighted mean of 4 components. General (25%) 36.0: SimpleQA —, AA-LCR 36, LongBench-v2 —, IFBench —. Reasoning (25%) 33.3: GPQA Diamond 79.6, Humanity’s Last Exam 11.7, FrontierMath —, ARC-AGI-2 8.6. Coding (25%) 56.2: SWE-bench Verified 72.5, Terminal-Bench 39.2, Aider Polyglot 72, SciCode 40.9. Tool use & agents (25%) 77.4: TAU-bench Retail 81.4, τ²-bench 73.4, BFCL —, BrowseComp —.
#15	GPT-5 (OpenAI)	87.1	—	—	—	24.8	—	—	87.1	92.5	2025	—	llm	API only	—	2024	400K	100	2.00	$1.25	$10.00
#16	GPT-5.1 (OpenAI)	87	—	—	—	26.5	—	—	87	—	2025	—	llm	API only	—	—	400K	115	0.77	$1.25	$10.00
#17	Grok 4 (xAI)	86.6	—	—	—	40	—	—	86.6	—	2025	—	llm	API only	—	2024	256K	100	0.70	$3.00	$15.00
#18	GPT-5 Codex (OpenAI)	86.5	—	—	—	25.6	—	—	86.5	—	2025	—	multimodal	API only	—	2024	400K	180	6.64	$1.25	$10.00
#19	DeepSeek V3.2 Speciale (DeepSeek)	86.3	—	—	—	26.1	—	—	86.3	—	2025	—	llm	Open weights	—	—	164K	—	—	$0.29	$0.43
#20	DeepSeek-V3.2 (DeepSeek)	86.2	—	—	—	22.2	—	—	86.2	—	2025	—	llm	Open weights	671B (37B active)	—	131K	—	—	$0.25	$0.38
#21	GPT-5.1-Codex (OpenAI)	86	—	—	—	23.4	—	—	86	—	2025	—	multimodal	API only	—	—	400K	188	4.16	$1.25	$10.00
#22	Llama 3.1 Nemotron Ultra 253B v1 (NVIDIA)	86	—	—	—	8.1	89.5	—	82.5	—	2025	—	llm	Open weights	253B	2023	—	42	0.72	$0.60	$1.80
#23	GLM 4.7 (Zhipu AI)	85.6	—	—	—	25.1	—	—	85.6	—	2025	—	llm	Open weights	—	—	203K	98	0.83	$0.40	$1.75
#24	Grok 4.1 Fast (xAI)	85.4	—	—	—	17.6	—	—	85.4	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#25	Doubao Seed Code (ByteDance)	85.4	—	—	—	13.3	—	—	85.4	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
Doubao Seed Code index breakdown: Index 50.5 = (65.3 + 44.9 + 33.6 + 58.2) / 4, the equal-weighted mean of 4 components. General (25%) 65.3: SimpleQA —, AA-LCR 65.3, LongBench-v2 —, IFBench —. Reasoning (25%) 44.9: GPQA Diamond 76.4, Humanity’s Last Exam 13.3, FrontierMath —, ARC-AGI-2 —. Coding (25%) 33.6: SWE-bench Verified —, Terminal-Bench 26.5, Aider Polyglot —, SciCode 40.7. Tool use & agents (25%) 58.2: TAU-bench Retail —, τ²-bench 58.2, BFCL —, BrowseComp —.
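
The index arithmetic in the two breakdowns can be reproduced with a short Python sketch. The page only states that the index is the equal-weighted mean of four components. The rule used here for each component (the plain mean of whichever benchmarks have a value, skipping "—" entries) is an assumption inferred from the displayed numbers, not a documented method. The names `category_score`, `overall_index`, and `OPUS_4` are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional

Breakdown = Dict[str, Dict[str, Optional[float]]]

# Values copied from the Claude Opus 4 breakdown; None stands for a "—" cell.
OPUS_4: Breakdown = {
    "General": {"SimpleQA": None, "AA-LCR": 36.0, "LongBench-v2": None, "IFBench": None},
    "Reasoning": {"GPQA Diamond": 79.6, "Humanity's Last Exam": 11.7,
                  "FrontierMath": None, "ARC-AGI-2": 8.6},
    "Coding": {"SWE-bench Verified": 72.5, "Terminal-Bench": 39.2,
               "Aider Polyglot": 72.0, "SciCode": 40.9},
    "Tool use & agents": {"TAU-bench Retail": 81.4, "τ²-bench": 73.4,
                          "BFCL": None, "BrowseComp": None},
}


def category_score(benchmarks: Dict[str, Optional[float]]) -> Optional[float]:
    """Mean of the benchmarks that have a value (assumed rule); None if all are missing."""
    available = [v for v in benchmarks.values() if v is not None]
    return mean(available) if available else None


def overall_index(breakdown: Breakdown) -> float:
    """Equal-weighted mean of the category scores, as stated in the breakdown."""
    scores = [category_score(b) for b in breakdown.values()]
    return mean(s for s in scores if s is not None)


if __name__ == "__main__":
    for name, benchmarks in OPUS_4.items():
        print(f"{name}: {category_score(benchmarks):.2f}")
    print(f"Index: {overall_index(OPUS_4):.1f}")  # prints "Index: 50.7"
```

Under the same assumption, the Doubao Seed Code figures also come out at the displayed 50.5. Category scores are averaged unrounded here, so intermediate values can differ from the page in the second decimal place.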

Ranked on General. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations — see each model for sources.