298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Models are sorted newest first by default; head to the leaderboard for the ranked view.


Updated May 29, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter


Rank	Model	Index	General	Reason	Coding	Agents	Math	Multi	Long ctx	GPQA Diamond	DROP	ARC-AGI-2	BIG-Bench Hard	SciCode	Terminal-Bench	LiveCodeBench	SWE-bench Verified	Aider Polyglot	HumanEval	Aider Polyglot Edit	MBPP	MultiPL-E	SWE-bench Pro	AIME 2025	MATH-500	AIME 2024	MATH	GSM8K	MGSM	HMMT 2025	FrontierMath	τ²-bench	TAU-bench Retail	TAU-bench Airline	BFCL	BrowseComp	τ²-bench Airline	τ²-bench Retail	MMMU	MathVista	ChartQA	DocVQA	MMMU-Pro	AI2D	Humanity’s Last Exam	MMLU-Pro	MMLU	IFEval	SimpleQA	Multi-IF	LiveBench	Arena Hard	AA-LCR	LongBench-v2	Released ↓	Country	Type	Access	Params	Cutoff	Context	Speed	Latency	In $/M	Out $/M
#1	Claude Opus 4.8 (New) Anthropic	71.7	67.7	68.9	55.9	94.4	—	—	67.7	92	—	—	—	53.5	58.3	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	94.4	—	—	—	—	—	—	—	—	—	—	—	—	45.7	—	—	—	—	—	—	—	67.7	—	2026	—	llm	API only	—	—	1M	66	6.54	$5.00	$25.00
#2	MiniCPM5-1B (New) OpenBMB	25.9	4.7	15.8	0.7	82.5	—	—	4.7	26.9	—	—	—	1.4	0	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	82.5	—	—	—	—	—	—	—	—	—	—	—	—	4.6	—	—	—	—	—	—	—	4.7	—	2026	—	llm	—	—	—	—	—	—	$0.00	$0.00
#3	Qwen3.7 Max (New) Alibaba	69.7	69	65.2	49.8	94.7	—	—	69	92.3	—	—	—	48.8	50.8	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	94.7	—	—	—	—	—	—	—	—	—	—	—	—	38.1	—	—	—	—	—	—	—	69	—	2026	—	llm	API only	—	—	1M	203	1.59	$1.25	$3.75
#4	Gemini 3.5 Flash (New) Google	70.7	71	66.6	49.7	95.6	—	—	71	92.2	—	—	—	53.1	46.2	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	95.6	—	—	—	—	—	—	—	—	—	—	—	—	41	—	—	—	—	—	—	—	71	—	2026	—	multimodal	API only	—	2025	1M	221	9.75	$1.50	$9.00
#5	JT-35B-Flash (New) China Mobile	57	55.3	44.5	29	99.1	—	—	55.3	82.9	—	—	—	29.1	28.8	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	99.1	—	—	—	—	—	—	—	—	—	—	—	—	6.1	—	—	—	—	—	—	—	55.3	—	2026	—	llm	—	—	—	—	—	—	$0.00	$0.00
#6	MiniCPM-V 4.6 1.3B (New) OpenBMB	28.2	6.3	17.7	1.1	87.7	—	—	6.3	30.5	—	—	—	2.1	0	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	87.7	—	—	—	—	—	—	—	—	—	—	—	—	4.9	—	—	—	—	—	—	—	6.3	—	2026	—	llm	—	—	—	—	—	—	$0.00	$0.00
#7	Ring-2.6-1T (New) InclusionAI	61.1	64.3	52	35.6	92.4	—	—	64.3	85.7	—	—	—	42.4	28.8	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	92.4	—	—	—	—	—	—	—	—	—	—	—	—	18.3	—	—	—	—	—	—	—	64.3	—	2026	—	llm	API only	—	—	262K	120	1.88	$0.08	$0.63
#8	Gemini 3.1 Flash Lite (New) Google	44.7	65.3	49.2	33.1	31.3	—	—	65.3	82.2	—	—	—	41.9	24.2	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	31.3	—	—	—	—	—	—	—	—	—	—	—	—	16.2	—	—	—	—	—	—	—	65.3	—	2026	—	multimodal	API only	—	—	1M	342	5.35	$0.25	$1.50
Gemini 3.1 Flash Lite breakdown: Index 44.7 = (65.3 + 49.2 + 33.1 + 31.3) / 4, the equal-weighted mean of 4 components. General (25%): 65.3 [SimpleQA —, AA-LCR 65.3, LongBench-v2 —, IFBench —]. Reasoning (25%): 49.2 [GPQA Diamond 82.2, Humanity’s Last Exam 16.2, FrontierMath —, ARC-AGI-2 —]. Coding (25%): 33.1 [SWE-bench Verified —, Terminal-Bench 24.2, Aider Polyglot —, SciCode 41.9]. Tool use & agents (25%): 31.3 [TAU-bench Retail —, τ²-bench 31.3, BFCL —, BrowseComp —].
#9	Grok 4.3 (New) xAI	67	65	62.6	42.6	97.7	—	—	65	90.1	—	—	—	47.3	37.9	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	97.7	—	—	—	—	—	—	—	—	—	—	—	—	35	—	—	—	—	—	—	—	65	—	2026	—	llm	API only	—	—	1M	88	0.52	$1.25	$2.50
#10	GPT-5.5 Instant (New) OpenAI	51	55.7	52.5	46.3	49.4	—	—	55.7	84.6	—	—	—	50.3	42.4	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	49.4	—	—	—	—	—	—	—	—	—	—	—	—	20.3	—	—	—	—	—	—	—	55.7	—	2026	—	llm	—	—	—	—	—	—	$5.00	$30.00
#11	Mistral Medium 3.5 Mistral AI	58.9	61	43.8	36.5	94.2	—	—	61	74.8	—	—	—	39.6	33.3	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	94.2	—	—	—	—	—	—	—	—	—	—	—	—	12.8	—	—	—	—	—	—	—	61	—	2026	—	multimodal	API only	—	—	262K	140	0.58	$1.50	$7.50
#12	Granite 4.1 8B IBM	18.6	12	23.5	10.9	27.8	—	—	12	43.3	—	—	—	21.8	0	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	27.8	—	—	—	—	—	—	—	—	—	—	—	—	3.8	—	—	—	—	—	—	—	12	—	2026	—	llm	Open weights	—	—	131K	133	0.47	$0.05	$0.10
#13	Nemotron 3 Nano Omni 30B A3B Reasoning NVIDIA	31.3	35.7	26.1	18.1	45.3	—	—	35.7	46.9	—	—	—	27.8	8.3	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	45.3	—	—	—	—	—	—	—	—	—	—	—	—	5.3	—	—	—	—	—	—	—	35.7	—	2026	—	llm	—	—	—	—	301	0.58	$0.10	$0.30
#14	Granite 4.1 30B IBM	25.3	18.7	26.2	14.1	42.1	—	—	18.7	48.1	—	—	—	25.8	2.3	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	42.1	—	—	—	—	—	—	—	—	—	—	—	—	4.2	—	—	—	—	—	—	—	18.7	—	2026	—	llm	—	—	—	—	—	—	$0.00	$0.00
#15	Granite 4.1 3B IBM	11.8	3	17.4	7.1	19.6	—	—	3	31.4	—	—	—	11.9	2.3	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	19.6	—	—	—	—	—	—	—	—	—	—	—	—	3.4	—	—	—	—	—	—	—	3	—	2026	—	llm	—	—	—	—	—	—	$0.00	$0.00
#16	Qwen3.6 Max Alibaba	67.5	69.7	58.9	45.4	95.9	—	—	69.7	88.8	—	—	—	46.9	43.9	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	95.9	—	—	—	—	—	—	—	—	—	—	—	—	28.9	—	—	—	—	—	—	—	69.7	—	2026	—	llm	API only	—	—	262K	36	2.79	$1.04	$6.24
#17	Qwen3.6 27B Alibaba	63.3	68.7	52.9	37.3	94.2	—	—	68.7	84.2	—	—	—	39.8	34.8	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	94.2	—	—	—	—	—	—	—	—	—	—	—	—	21.6	—	—	—	—	—	—	—	68.7	—	2026	—	multimodal	Open weights	—	—	262K	64	1.40	$0.29	$3.20
#18	Qwen3.6 35B A3B Alibaba	61.6	63.7	52.2	35.3	95.3	—	—	63.7	84.1	—	—	—	35.8	34.8	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	95.3	—	—	—	—	—	—	—	—	—	—	—	—	20.2	—	—	—	—	—	—	—	63.7	—	2026	—	multimodal	Open weights	—	—	262K	169	1.47	$0.14	$1.00
#19	DeepSeek-V4-Pro DeepSeek	71.1	66.3	63	58.9	96.2	—	—	66.3	90.1	—	—	—	50	46.2	93.5	80.6	—	—	—	—	—	—	—	—	—	—	—	—	—	—	96.2	—	—	—	—	—	—	—	—	—	—	—	—	35.9	87.5	—	—	—	—	—	—	66.3	—	2026	—	llm	Open weights	1.6T (49B active)	—	1M	30	1.16	$0.44	$0.87
#20	DeepSeek-V4-Flash DeepSeek	65.3	63	60.8	41.8	95.6	—	—	63	89.4	—	—	—	44.9	38.6	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	95.6	—	—	—	—	—	—	—	—	—	—	—	—	32.1	—	—	—	—	—	—	—	63	—	2026	—	llm	Open weights	284B (13B active)	—	1M	109	0.76	$0.10	$0.20
#21	GPT-5.5 OpenAI	73.9	74.3	68.9	58.4	93.9	—	—	74.3	93.5	—	—	—	56.1	60.6	—	—	—	—	—	—	—	58.6	—	—	—	—	—	—	—	—	93.9	—	—	—	—	—	—	—	—	—	—	—	—	44.3	—	—	—	—	—	—	—	74.3	—	2026	—	llm	API only	—	2025	1.1M	67	0.97	$5.00	$30.00
GPT-5.5 breakdown: Index 73.9 = (74.3 + 68.9 + 58.4 + 93.9) / 4, the equal-weighted mean of 4 components. General (25%): 74.3 [SimpleQA —, AA-LCR 74.3, LongBench-v2 —, IFBench —]. Reasoning (25%): 68.9 [GPQA Diamond 93.5, Humanity’s Last Exam 44.3, FrontierMath —, ARC-AGI-2 —]. Coding (25%): 58.4 [SWE-bench Verified —, Terminal-Bench 60.6, Aider Polyglot —, SciCode 56.1]. Tool use & agents (25%): 93.9 [TAU-bench Retail —, τ²-bench 93.9, BFCL —, BrowseComp —].
#22	Ling-2.6-1T InclusionAI	50.1	34.7	41.7	34.1	89.8	—	—	34.7	75.2	—	—	—	37	31.1	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	89.8	—	—	—	—	—	—	—	—	—	—	—	—	8.2	—	—	—	—	—	—	—	34.7	—	2026	—	llm	API only	—	—	262K	—	—	$0.08	$0.63
#23	MiMo-V2.5-Pro Xiaomi	68.6	73.3	60.2	46.7	94.2	—	—	73.3	86.6	—	—	—	50.2	43.2	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	94.2	—	—	—	—	—	—	—	—	—	—	—	—	33.8	—	—	—	—	—	—	—	73.3	—	2026	—	llm	Open weights	—	—	1M	58	2.08	$0.44	$0.87
#24	MiMo-V2.5 Xiaomi	62.7	62.7	55.1	42.4	90.6	—	—	62.7	84.9	—	—	—	43.1	41.7	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	90.6	—	—	—	—	—	—	—	—	—	—	—	—	25.2	—	—	—	—	—	—	—	62.7	—	2026	—	multimodal	Open weights	—	—	1M	92	2.67	$0.14	$0.28
#25	Hy3 Tencent	60.3	54.7	56.1	37.7	92.7	—	—	54.7	86.7	—	—	—	41.2	34.1	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	92.7	—	—	—	—	—	—	—	—	—	—	—	—	25.5	—	—	—	—	—	—	—	54.7	—	2026	—	llm	Open weights	—	—	262K	100	2.53	$0.06	$0.21

The four score columns after Index (General, Reason, Coding, Agents) are the v1.2 components, weighted 25% each, that feed it. The per-category reference averages that follow (Math, Multi, Long ctx) are not part of the index. Every individual benchmark in our catalog is also shown, grouped by category and ordered by coverage. Hover any header for details; click to sort. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations; see each model for sources.
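
As a rough check of the 25%-each weighting, the sketch below recomputes Index for two rows of the table (Gemini 3.1 Flash Lite and GPT-5.5) from their four component scores. The helper name `composite_index` and the `ROWS` dictionary are illustrative assumptions, not part of the site's published methodology. The sketch also assumes all four components are present, since the page does not say how a missing component would be handled.

```python
from statistics import fmean

# Component scores copied from two rows of the table.
# Order: General, Reasoning, Coding, Agents (25% each under v1.2).
ROWS = {
    "Gemini 3.1 Flash Lite": (65.3, 49.2, 33.1, 31.3),
    "GPT-5.5": (74.3, 68.9, 58.4, 93.9),
}


def composite_index(components):
    """Equal-weighted mean of the four component scores (hypothetical helper)."""
    if len(components) != 4:
        raise ValueError("expected exactly four component scores")
    return fmean(components)


for model, scores in ROWS.items():
    # Round to one decimal place, matching the precision shown in the table.
    print(f"{model}: {composite_index(scores):.1f}")
# Gemini 3.1 Flash Lite: 44.7
# GPT-5.5: 73.9
```

Both results match the Index values listed for those rows, which is consistent with an unweighted average of the four components.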