298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Models are sorted newest first by default; head to the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter


Rank	Model (lab)	Math idx ↓	MATH-500	FrontierMath	HMMT 2025	GSM8K	MGSM	AIME 2024	AIME 2025	MATH	Released	Country	Type	Access	Params	Cutoff	Context	Speed	Latency	Input $/M tokens	Output $/M tokens
#1	GPT-5.2 OpenAI	100	—	—	—	—	—	—	100	—	2025	—	llm	API only	—	—	400K	73	0.69	$1.75	$14.00
Note (GPT-5.2 overall index): 69.4 = (72.7 + 60.2 + 59.7 + 84.8) / 4, the equal-weighted mean of 4 components. General (25%): 72.7 [SimpleQA —, AA-LCR 72.7, LongBench-v2 —, IFBench —]. Reasoning (25%): 60.2 [GPQA Diamond 92.4, Humanity’s Last Exam 35.4, FrontierMath —, ARC-AGI-2 52.9]. Coding (25%): 59.7 [SWE-bench Verified 80, Terminal-Bench 47, Aider Polyglot —, SciCode 52.1]. Tool use & agents (25%): 84.8 [TAU-bench Retail —, τ²-bench 84.8, BFCL —, BrowseComp —].
#2	GPT-5 Codex OpenAI	98.7	—	—	—	—	—	—	98.7	—	2025	—	multimodal	API only	—	2024	400K	180	6.64	$1.25	$10.00
#3	Gemini 3 Flash Google	97	—	—	—	—	—	—	97	—	2025	—	multimodal	API only	—	—	1M	191	1.05	$0.50	$3.00
#4	DeepSeek V3.2 Speciale DeepSeek	96.7	—	—	—	—	—	—	96.7	—	2025	—	llm	Open weights	—	—	164K	—	—	$0.29	$0.43
#5	MiMo-V2-Flash Xiaomi	96.3	—	—	—	—	—	—	96.3	—	2025	—	llm	Open weights	—	—	262K	145	1.34	$0.10	$0.30
#6	Claude Haiku 4.5 Anthropic	96.3	—	—	—	—	—	—	96.3	—	2025	—	llm	API only	—	2025	200K	100	0.30	$1.00	$5.00
#7	Gemini 3 Pro Google	95.7	—	—	—	—	—	—	95.7	—	2025	—	multimodal	API only	—	—	1M	141	27.49	$2.00	$12.00
#8	GPT-5.1-Codex OpenAI	95.7	—	—	—	—	—	—	95.7	—	2025	—	multimodal	API only	—	—	400K	188	4.16	$1.25	$10.00
#9	Grok 4 xAI	95.4	99	—	—	—	—	—	91.7	—	2025	—	llm	API only	—	2024	256K	100	0.70	$3.00	$15.00
#10	GLM 4.7 Zhipu AI	95	—	—	—	—	—	—	95	—	2025	—	llm	Open weights	—	—	203K	98	0.83	$0.40	$1.75
Note (GLM 4.7 overall index): 63.5 = (64.0 + 55.5 + 38.5 + 95.9) / 4, the equal-weighted mean of 4 components. General (25%): 64 [SimpleQA —, AA-LCR 64, LongBench-v2 —, IFBench —]. Reasoning (25%): 55.5 [GPQA Diamond 85.9, Humanity’s Last Exam 25.1, FrontierMath —, ARC-AGI-2 —]. Coding (25%): 38.5 [SWE-bench Verified —, Terminal-Bench 31.8, Aider Polyglot —, SciCode 45.1]. Tool use & agents (25%): 95.9 [TAU-bench Retail —, τ²-bench 95.9, BFCL —, BrowseComp —].
#11	o4-mini OpenAI	95	98.9	—	—	—	—	93.4	92.7	—	2025	—	multimodal	API only	—	2024	200K	115	5.20	$1.10	$4.40
#12	Kimi K2 Thinking Moonshot AI	94.7	—	—	—	—	—	—	94.7	—	2025	—	llm	Open weights	1T (32B active)	—	262K	100	1.00	$0.60	$2.50
#13	KAT-Coder-Pro V1 Kuaishou	94.7	—	—	—	—	—	—	94.7	—	2025	—	llm	—	—	—	—	108	2.19	$0.30	$1.20
#14	Qwen3 235B A22B 2507 Alibaba	94.7	98.4	—	—	—	—	—	91	—	2025	—	llm	—	—	—	—	59	1.21	$0.40	$2.20
#15	Nova 2 Lite Amazon	94.3	—	—	—	—	—	—	94.3	—	2025	—	multimodal	API only	—	—	1M	229	0.89	$0.30	$2.50
#16	GPT-5.1 OpenAI	94	—	—	—	—	—	—	94	—	2025	—	llm	API only	—	—	400K	115	0.77	$1.25	$10.00
#17	GLM-4.6 Zhipu AI	93.9	—	—	—	—	—	—	93.9	—	2025	—	llm	Open weights	357B (MoE)	2025	203K	85	0.70	$0.43	$1.74
#18	gpt-oss-120b OpenAI	93.4	—	—	—	—	—	—	93.4	—	2025	—	llm	Open weights	117B (5.1B active)	2024	131K	500	0.50	$0.04	$0.18
#19	Grok 4 Fast xAI	92.7	—	—	93.3	—	—	—	92	—	2025	—	llm	API only	—	—	2M	90	—	$0.20	$0.50
#20	Gemini 2.5 Pro Google	92.2	96.7	—	—	—	—	92	88	92	2025	—	multimodal	API only	—	2025	1M	85	0.70	$1.25	$10.00
#21	DeepSeek-V3.2 DeepSeek	92	—	—	—	—	—	—	92	—	2025	—	llm	Open weights	671B (37B active)	—	131K	—	—	$0.25	$0.38
#22	Grok 3 mini Reasoning xAI	92	99.2	—	—	—	—	—	84.7	—	2025	—	llm	—	—	—	—	33	0.52	$0.30	$0.50
#23	GPT-5.1-Codex-Mini OpenAI	91.7	—	—	—	—	—	—	91.7	—	2025	—	multimodal	API only	—	—	400K	175	9.50	$0.25	$2.00
#24	Claude Opus 4.5 Anthropic	91.3	—	—	—	—	—	—	91.3	—	2025	—	llm	API only	—	—	200K	58	1.50	$5.00	$25.00
#25	Grok-3 xAI	91.2	87	—	—	—	—	93.3	93.3	—	2025	—	multimodal	API only	—	2024	128K	100	0.70	$3.00	$15.00
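
The breakdown notes for GPT-5.2 and GLM 4.7 describe the overall index as an equal-weighted mean of 4 components, and the component and Math idx values in the table look consistent with an unweighted mean over whichever benchmark scores are reported. The sketch below reproduces that arithmetic for a few rows. The helper name `mean_of_reported` and the choice to skip missing (—) cells are assumptions made for illustration, not the site's documented methodology.

```python
from statistics import mean
from typing import Optional, Sequence


def mean_of_reported(scores: Sequence[Optional[float]]) -> Optional[float]:
    """Unweighted mean of the scores that are present.

    None stands in for a missing cell in the table. Skipping missing cells
    is an assumption inferred from the published numbers.
    """
    reported = [s for s in scores if s is not None]
    return mean(reported) if reported else None


# Overall index for GPT-5.2 from its four component scores
# (General, Reasoning, Coding, Tool use & agents).
gpt52_index = mean_of_reported([72.7, 60.2, 59.7, 84.8])

# Reasoning component for GPT-5.2: GPQA Diamond, Humanity's Last Exam,
# FrontierMath (not reported), ARC-AGI-2.
gpt52_reasoning = mean_of_reported([92.4, 35.4, None, 52.9])

# Math idx for o4-mini, in table column order: MATH-500, FrontierMath,
# HMMT 2025, GSM8K, MGSM, AIME 2024, AIME 2025, MATH.
o4_mini_math = mean_of_reported([98.9, None, None, None, None, 93.4, 92.7, None])

print(gpt52_index, gpt52_reasoning, o4_mini_math)
```

Each result lands within about 0.05 of the value shown on the page (69.4, 60.2, and 95), which fits a plain mean followed by rounding to one decimal place.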

Ranked by Math index. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations; see each model's page for sources.
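
One way to picture "relative standing within each column" is min-max scaling followed by a red → yellow → green ramp. The page does not say how its colors are computed, so the sketch below is only an assumed scheme; `relative_standing` and `standing_to_rgb` are hypothetical names, and columns where lower is better (latency, price) would need the scale flipped.

```python
from typing import List, Optional, Sequence, Tuple


def relative_standing(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Scale reported values in one column to [0, 1]; missing cells stay None."""
    reported = [v for v in values if v is not None]
    if not reported:
        return [None for _ in values]
    lo, hi = min(reported), max(reported)
    span = hi - lo
    # If every reported value is equal, place them all at the midpoint.
    return [
        None if v is None else (0.5 if span == 0 else (v - lo) / span)
        for v in values
    ]


def standing_to_rgb(t: float) -> Tuple[int, int, int]:
    """Map 0 -> red, 0.5 -> yellow, 1 -> green with linear interpolation."""
    if t <= 0.5:
        return (255, round(510 * t), 0)
    return (round(510 * (1 - t)), 255, 0)


# AIME 2025 scores for the top three rows of the table.
aime_2025 = [100, 98.7, 97]
standing = relative_standing(aime_2025)
colors = [standing_to_rgb(t) for t in standing if t is not None]
print(standing, colors)
```

With only these three rows, the highest score maps to pure green and the lowest to pure red; on the full column the same scores would sit elsewhere on the ramp because the minimum and maximum change.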