298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Sorted by newest by default; head to the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter


Rank	Model	Agents idx ↓	τ²-bench	BFCL	τ²-bench Airline	τ²-bench Retail	BrowseComp	TAU-bench Airline	TAU-bench Retail	Released	Country	Type	Access	Params	Cutoff	Context	Speed	Latency	In $/M	Out $/M
#1	JT-35B-Flash (New) China Mobile	99.1	99.1	—	—	—	—	—	—	2026	—	llm	—	—	—	—	—	—	$0.00	$0.00
#2	GLM 4.7 Flash Zhipu AI	98.8	98.8	—	—	—	—	—	—	2026	—	llm	Open weights	—	—	203K	113	1.00	$0.06	$0.40
#3	GLM 5 Turbo Zhipu AI	98.5	98.5	—	—	—	—	—	—	2026	—	llm	API only	—	—	203K	—	—	$1.20	$4.00
#4	GLM 5V Turbo Zhipu AI	98.5	98.5	—	—	—	—	—	—	2026	—	multimodal	API only	—	—	203K	—	—	$1.20	$4.00
#5	GLM-5 Zhipu AI	98.2	98.2	—	—	—	—	—	—	2026	—	llm	Open weights	744B (44B active)	—	203K	67	0.77	$0.60	$1.92
#6	Grok 4.3 (New) xAI	97.7	97.7	—	—	—	—	—	—	2026	—	llm	API only	—	—	1M	88	0.52	$1.25	$2.50
#7	Qwen3.6 Plus Alibaba	97.7	97.7	—	—	—	—	—	—	2026	—	multimodal	API only	—	—	1M	52	1.73	$0.33	$1.95
#8	GLM 5.1 Zhipu AI	97.7	97.7	—	—	—	—	—	—	2026	—	llm	Open weights	—	—	203K	53	0.78	$0.98	$3.08
#9	Grok 4.20 0309 xAI	96.5	96.5	—	—	—	—	—	—	2026	—	llm	—	—	—	—	97	0.62	$2.00	$6.00
#10	DeepSeek-V4-Pro DeepSeek	96.2	96.2	—	—	—	—	—	—	2026	—	llm	Open weights	1.6T (49B active)	—	1M	30	1.16	$0.44	$0.87
#11	Kimi K2.6 Moonshot AI	95.9	95.9	—	—	—	—	—	—	2026	—	llm	Open weights	1T (32B active)	—	262K	57	1.20	$0.68	$3.42
#12	Qwen3.6 Max Alibaba	95.9	95.9	—	—	—	—	—	—	2026	—	llm	API only	—	—	262K	36	2.79	$1.04	$6.24
#13	Kimi K2.5 Moonshot AI	95.9	95.9	—	—	—	—	—	—	2026	—	multimodal	Open weights	1T (32B active)	—	262K	35	1.33	$0.40	$1.90
#14	GLM 4.7 Zhipu AI	95.9	95.9	—	—	—	—	—	—	2025	—	llm	Open weights	—	—	203K	98	0.83	$0.40	$1.75
#15	Gemini 3.1 Pro Google	95.6	95.6	—	—	—	—	—	—	2026	—	multimodal	API only	—	—	1M	142	26.02	$2.00	$12.00
#16	Gemini 3.5 Flash (New) Google	95.6	95.6	—	—	—	—	—	—	2026	—	multimodal	API only	—	2025	1M	221	9.75	$1.50	$9.00
#17	Qwen3.5 397B A17B Alibaba	95.6	95.6	—	—	—	—	—	—	2026	—	multimodal	Open weights	—	—	262K	53	1.82	$0.39	$2.34
#18	DeepSeek-V4-Flash DeepSeek	95.6	95.6	—	—	—	—	—	—	2026	—	llm	Open weights	284B (13B active)	—	1M	109	0.76	$0.10	$0.20
#19	MiniMax M2.5 MiniMax	95.3	95.3	—	—	—	—	—	—	2026	—	llm	Open weights	—	—	205K	87	1.16	$0.15	$1.15
#20	Qwen3.6 35B A3B Alibaba	95.3	95.3	—	—	—	—	—	—	2026	—	multimodal	Open weights	—	—	262K	169	1.47	$0.14	$1.00
#21	MiMo-V2-Pro Xiaomi	95	95	—	—	—	—	—	—	2026	—	llm	API only	—	—	1M	60	2.01	$1.00	$3.00
#22	MiMo-V2-Flash Xiaomi	95	95	—	—	—	—	—	—	2025	—	llm	Open weights	—	—	262K	145	1.34	$0.10	$0.30
Index breakdown for MiMo-V2-Flash: 61.9 = (64.3 + 52.9 + 35.3 + 95.0) / 4, the equal-weighted mean of 4 components. General (25%): 64.3 (AA-LCR 64.3; SimpleQA, LongBench-v2, and IFBench not reported). Reasoning (25%): 52.9 (GPQA Diamond 84.6, Humanity’s Last Exam 21.1; FrontierMath and ARC-AGI-2 not reported). Coding (25%): 35.3 (Terminal-Bench 31.1, SciCode 39.4; SWE-bench Verified and Aider Polyglot not reported). Tool use & agents (25%): 95.0 (τ²-bench 95; TAU-bench Retail, BFCL, and BrowseComp not reported).
#23	Qwen3.7 Max (New) Alibaba	94.7	94.7	—	—	—	—	—	—	2026	—	llm	API only	—	—	1M	203	1.59	$1.25	$3.75
Index breakdown for Qwen3.7 Max: 69.7 = (69.0 + 65.2 + 49.8 + 94.7) / 4, the equal-weighted mean of 4 components. General (25%): 69.0 (AA-LCR 69; SimpleQA, LongBench-v2, and IFBench not reported). Reasoning (25%): 65.2 (GPQA Diamond 92.3, Humanity’s Last Exam 38.1; FrontierMath and ARC-AGI-2 not reported). Coding (25%): 49.8 (Terminal-Bench 50.8, SciCode 48.8; SWE-bench Verified and Aider Polyglot not reported). Tool use & agents (25%): 94.7 (τ²-bench 94.7; TAU-bench Retail, BFCL, and BrowseComp not reported).
#24	Claude Opus 4.8 (New) Anthropic	94.4	94.4	—	—	—	—	—	—	2026	—	llm	API only	—	—	1M	66	6.54	$5.00	$25.00
#25	Step 3.5 Flash StepFun	94.4	94.4	—	—	—	—	—	—	2026	—	llm	Open weights	—	—	262K	194	0.85	$0.09	$0.30

Ranked by Agents index, highest first. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations; see each model's page for sources.
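
The index breakdown shown for MiMo-V2-Flash can be reproduced with a short calculation. The sketch below is illustrative only: the function names and the input layout are hypothetical, and it assumes each component score is the plain mean of whichever benchmarks are reported for that component, rounded half-up to one decimal. That rule matches the figures shown in the table, but the page does not state it.

```python
from decimal import Decimal, ROUND_HALF_UP

ONE_DP = Decimal("0.1")


def mean_1dp(values):
    """Arithmetic mean of Decimals, rounded half-up to one decimal place."""
    return (sum(values, Decimal(0)) / len(values)).quantize(ONE_DP, rounding=ROUND_HALF_UP)


def composite_index(components):
    """Return (index, per-component scores) for one model.

    `components` maps a component name to {benchmark: score string or None}.
    Assumption: a component score is the mean of its reported benchmarks.
    """
    scores = {}
    for name, benchmarks in components.items():
        reported = [Decimal(s) for s in benchmarks.values() if s is not None]
        if not reported:
            raise ValueError(f"no reported benchmarks for component {name!r}")
        scores[name] = mean_1dp(reported)
    # Equal weights: each component contributes 1 / len(scores) (25% for four).
    return mean_1dp(list(scores.values())), scores


# Benchmark values copied from the MiMo-V2-Flash breakdown; None = not reported.
mimo_v2_flash = {
    "General": {"SimpleQA": None, "AA-LCR": "64.3", "LongBench-v2": None, "IFBench": None},
    "Reasoning": {"GPQA Diamond": "84.6", "Humanity's Last Exam": "21.1",
                  "FrontierMath": None, "ARC-AGI-2": None},
    "Coding": {"SWE-bench Verified": None, "Terminal-Bench": "31.1",
               "Aider Polyglot": None, "SciCode": "39.4"},
    "Tool use & agents": {"TAU-bench Retail": None, "τ²-bench": "95",
                          "BFCL": None, "BrowseComp": None},
}

index, scores = composite_index(mimo_v2_flash)
print(index)                                  # 61.9
print(scores["Reasoning"], scores["Coding"])  # 52.9 35.3
```

`Decimal` is used instead of `float` because several component means land exactly on a half (35.25, 52.85), where binary floating point and Python's default rounding would not reliably give the displayed 35.3 and 52.9.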