298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Sorted by newest release by default; head to the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis; specs and pricing via OpenRouter.


Rank	Model	Index	General	Reasoning	Coding	Agents	Math	Multimodal	Long context	GPQA Diamond	DROP	ARC-AGI-2	BIG-Bench Hard	SciCode	Terminal-Bench	LiveCodeBench	SWE-bench Verified	Aider Polyglot	HumanEval	Aider Polyglot Edit	MBPP	MultiPL-E	SWE-bench Pro	AIME 2025	MATH-500	AIME 2024	MATH	GSM8K	MGSM	HMMT 2025	FrontierMath	τ²-bench	TAU-bench Retail	TAU-bench Airline	BFCL	BrowseComp	τ²-bench Airline	τ²-bench Retail	MMMU	MathVista	ChartQA	DocVQA	MMMU-Pro	AI2D	Humanity’s Last Exam	MMLU-Pro	MMLU	IFEval	SimpleQA	Multi-IF	LiveBench	Arena Hard	AA-LCR	LongBench-v2	Released	Country	Type	Access	Params	Knowledge cutoff	Context	Speed (tok/s)	Latency (s)	Input $/M tokens	Output $/M tokens
#126	Grok 4.1 Fast xAI	61.8	68	51.5	34.2	93.3	89.3	—	68	85.3	—	—	—	44.2	24.2	82.2	—	—	—	—	—	—	—	89.3	—	—	—	—	—	—	—	93.3	—	—	—	—	—	—	—	—	—	—	—	—	17.6	85.4	—	—	—	—	—	—	68	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#127	Gemini 3 Pro Google	67.3	70.7	53.5	58	87.1	95.7	—	70.7	91.9	—	31.1	—	56.1	41.7	91.7	76.2	—	—	—	—	—	—	95.7	—	—	—	—	—	—	—	87.1	—	—	—	—	—	—	—	—	—	—	—	—	37.5	89.8	—	—	—	—	—	—	70.7	—	2025	—	multimodal	API only	—	—	1M	141	27.49	$2.00	$12.00
#128	GPT-5.1-Codex OpenAI	60.6	67.3	54.7	37.5	83	95.7	—	67.3	86	—	—	—	40.2	34.8	84.9	—	—	—	—	—	—	—	95.7	—	—	—	—	—	—	—	83	—	—	—	—	—	—	—	—	—	—	—	—	23.4	86	—	—	—	—	—	—	67.3	—	2025	—	multimodal	API only	—	—	400K	188	4.16	$1.25	$10.00
#129	GPT-5.1-Codex-Mini OpenAI	53.2	62.7	49.1	38	62.9	91.7	—	62.7	81.3	—	—	—	42.6	33.3	83.6	—	—	—	—	—	—	—	91.7	—	—	—	—	—	—	—	62.9	—	—	—	—	—	—	—	—	—	—	—	—	16.9	82	—	—	—	—	—	—	62.7	—	2025	—	multimodal	API only	—	—	400K	175	9.50	$0.25	$2.00
#130	ERNIE 5.0 Thinking Baidu	41.8	6.7	45.2	31.3	83.9	85	—	6.7	77.7	—	—	—	37.5	25	81.2	—	—	—	—	—	—	—	85	—	—	—	—	—	—	—	83.9	—	—	—	—	—	—	—	—	—	—	—	—	12.7	83	—	—	—	—	—	—	6.7	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#131	GPT-5.1 OpenAI	64.7	75	57.3	44.4	81.9	94	—	75	88.1	—	—	—	43.3	45.5	86.8	—	—	—	—	—	—	—	94	—	—	—	—	—	—	—	81.9	—	—	—	—	—	—	—	—	—	—	—	—	26.5	87	—	—	—	—	—	—	75	—	2025	—	llm	API only	—	—	400K	115	0.77	$1.25	$10.00
#132	KAT-Coder-Pro V1 Kuaishou	60.1	74	54.9	22.9	88.6	94.7	—	74	76.4	—	—	—	36.6	9.1	74.7	—	—	—	—	—	—	—	94.7	—	—	—	—	—	—	—	88.6	—	—	—	—	—	—	—	—	—	—	—	—	33.4	81.3	—	—	—	—	—	—	74	—	2025	—	llm	—	—	—	—	108	2.19	$0.30	$1.20
#133	Doubao Seed Code ByteDance	50.5	65.3	44.9	33.6	58.2	79.3	—	65.3	76.4	—	—	—	40.7	26.5	76.6	—	—	—	—	—	—	—	79.3	—	—	—	—	—	—	—	58.2	—	—	—	—	—	—	—	—	—	—	—	—	13.3	85.4	—	—	—	—	—	—	65.3	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#134	Kimi K2 Thinking Moonshot AI	65.3	66.3	53.4	48.3	93	94.7	—	66.3	84.5	—	—	—	42.4	31.1	85.3	71.3	—	—	—	—	—	—	94.7	—	—	—	—	—	—	—	93	—	—	—	—	—	—	—	—	—	—	—	—	22.3	84.8	—	—	—	—	—	—	66.3	—	2025	—	llm	Open weights	1T (32B active)	—	262K	100	1.00	$0.60	$2.50
Kimi K2 Thinking index breakdown: Index 65.3 is the equal-weighted mean of 4 components, (66.3 + 53.4 + 48.3 + 93.0) / 4 = 65.25, rounded to one decimal.
General (25%): 66.3 · SimpleQA —, AA-LCR 66.3, LongBench-v2 —, IFBench —
Reasoning (25%): 53.4 · GPQA Diamond 84.5, Humanity’s Last Exam 22.3, FrontierMath —, ARC-AGI-2 —
Coding (25%): 48.3 · SWE-bench Verified 71.3, Terminal-Bench 31.1, Aider Polyglot —, SciCode 42.4
Tool use & agents (25%): 93 · TAU-bench Retail —, τ²-bench 93, BFCL —, BrowseComp —
#135	Kimi Linear 48B A3B Instruct Moonshot AI	15.9	25.7	22	15.7	0	36.3	—	25.7	41.2	—	—	—	19.9	11.4	37.8	—	—	—	—	—	—	—	36.3	—	—	—	—	—	—	—	0	—	—	—	—	—	—	—	—	—	—	—	—	2.7	58.5	—	—	—	—	—	—	25.7	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#136	NVIDIA Nemotron Nano 12B v2 VL NVIDIA	27	40	31.3	15.4	21.3	75	—	40	57.2	—	—	—	26.2	4.5	69.4	—	—	—	—	—	—	—	75	—	—	—	—	—	—	—	21.3	—	—	—	—	—	—	—	—	—	—	—	—	5.3	75.9	—	—	—	—	—	—	40	—	2025	—	llm	—	—	—	—	244	0.74	$0.20	$0.60
#137	Granite 4.0 1B IBM	12	4	16.6	4.4	22.8	6.3	—	4	28.1	—	—	—	8.7	0	4.7	—	—	—	—	—	—	—	6.3	—	—	—	—	—	—	—	22.8	—	—	—	—	—	—	—	—	—	—	—	—	5.1	32.5	—	—	—	—	—	—	4	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#138	Granite 4.0 H 1B IBM	11.4	6.3	15.7	4.1	19.6	6.3	—	6.3	26.3	—	—	—	8.2	0	11.5	—	—	—	—	—	—	—	6.3	—	—	—	—	—	—	—	19.6	—	—	—	—	—	—	—	—	—	—	—	—	5	27.7	—	—	—	—	—	—	6.3	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#139	Granite 4.0 H 350M IBM	7.9	0	16.1	0.9	14.6	1.3	—	0	25.7	—	—	—	1.7	0	1.9	—	—	—	—	—	—	—	1.3	—	—	—	—	—	—	—	14.6	—	—	—	—	—	—	—	—	—	—	—	—	6.4	12.7	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#140	Granite 4.0 350M IBM	7.4	0	15.9	0.5	13.2	0	—	0	26.1	—	—	—	0.9	0	2.4	—	—	—	—	—	—	—	0	—	—	—	—	—	—	—	13.2	—	—	—	—	—	—	—	—	—	—	—	—	5.7	12.4	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#141	MiniMax-M2 MiniMax	60.9	61	45.1	50.6	86.8	78.3	—	61	77.7	—	—	—	36.1	46.3	82.6	69.4	—	—	—	—	—	—	78.3	—	—	—	—	—	—	—	86.8	—	—	—	—	—	—	—	—	—	—	—	—	12.5	82	—	—	—	—	—	—	61	—	2025	—	llm	Open weights	230B (10B active)	—	205K	91	1.19	$0.26	$1.00
#142	Qwen3 VL 32B Instruct Alibaba	29.1	31.3	36.7	19.2	29.2	68.3	—	31.3	67.1	—	—	—	30.1	8.3	51.4	—	—	—	—	—	—	—	68.3	—	—	—	—	—	—	—	29.2	—	—	—	—	—	—	—	—	—	—	—	—	6.3	79.1	—	—	—	—	—	—	31.3	—	2025	—	multimodal	Open weights	—	—	262K	76	1.16	$0.10	$0.42
#143	Qwen3 VL 32B Alibaba	40.1	55.3	41.4	18.1	45.6	84.7	—	55.3	73.3	—	—	—	28.5	7.6	73.8	—	—	—	—	—	—	—	84.7	—	—	—	—	—	—	—	45.6	—	—	—	—	—	—	—	—	—	—	—	—	9.6	81.8	—	—	—	—	—	—	55.3	—	2025	—	llm	—	—	—	—	93	1.26	$0.70	$8.40
#144	Granite 4.0 Micro IBM	10.7	4	19.4	6.7	12.6	6	—	4	33.6	—	—	—	11.9	1.5	18	—	—	—	—	—	—	—	6	—	—	—	—	—	—	—	12.6	—	—	—	—	—	—	—	—	—	—	—	—	5.1	44.7	—	—	—	—	—	—	4	—	2025	—	llm	Open weights	—	—	131K	—	—	$0.02	$0.11
#145	Phi 4 Mini Instruct Microsoft	11.5	13.7	18.7	5.4	8.2	38.2	—	13.7	33.1	—	—	—	10.8	0	12.6	—	—	—	—	—	—	—	6.7	69.6	—	—	—	—	—	—	8.2	—	—	—	—	—	—	—	—	—	—	—	—	4.2	46.5	—	—	—	—	—	—	13.7	—	2025	—	llm	Open weights	—	—	131K	—	—	$0.08	$0.35
#146	Claude Haiku 4.5 Anthropic	54.7	70.3	41.4	52.5	54.7	96.3	—	70.3	73	—	—	—	43.3	41	61.5	73.3	—	—	—	—	—	39.5	96.3	—	—	—	—	—	—	—	54.7	—	—	—	—	63.6	83.2	—	—	—	—	—	—	9.7	80	—	—	—	—	—	—	70.3	—	2025	—	llm	API only	—	2025	200K	100	0.30	$1.00	$5.00
#147	Qwen3 VL 8B Alibaba	24.3	31	30.6	12.9	22.5	30.7	—	31	57.9	—	—	—	21.9	3.8	35.3	—	—	—	—	—	—	—	30.7	—	—	—	—	—	—	—	22.5	—	—	—	—	—	—	—	—	—	—	—	—	3.3	74.9	—	—	—	—	—	—	31	—	2025	—	llm	—	—	—	—	120	1.15	$0.20	$2.10
#148	Qwen3 VL 8B Instruct Alibaba	19.3	15.3	22.8	9.9	29.2	27.3	—	15.3	42.7	—	—	—	17.4	2.3	33.2	—	—	—	—	—	—	—	27.3	—	—	—	—	—	—	—	29.2	—	—	—	—	—	—	—	—	—	—	—	—	2.9	68.6	—	—	—	—	—	—	15.3	—	2025	—	multimodal	Open weights	—	—	256K	145	1.05	$0.08	$0.50
#149	Qwen3 VL 4B Alibaba	18.3	21.3	26.9	9.3	15.5	25.7	—	21.3	49.4	—	—	—	17.1	1.5	32	—	—	—	—	—	—	—	25.7	—	—	—	—	—	—	—	15.5	—	—	—	—	—	—	—	—	—	—	—	—	4.4	70	—	—	—	—	—	—	21.3	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#150	Qwen3 VL 4B Instruct Alibaba	15.9	13	20.4	6.9	23.4	37	—	13	37.1	—	—	—	13.7	0	29	—	—	—	—	—	—	—	37	—	—	—	—	—	—	—	23.4	—	—	—	—	—	—	—	—	—	—	—	—	3.7	63.4	—	—	—	—	—	—	13	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
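
The Kimi K2 Thinking breakdown in the table can be reproduced in a few lines. What follows is a minimal Python sketch, not the catalog's own code. It assumes two things, inferred from the displayed numbers rather than from any published specification. First, each component is the plain mean of its non-missing benchmark scores, rounded half up to one decimal. Second, the index is the mean of the four rounded components, again rounded half up. Scores are kept as decimal strings so the rounding is exact. `component_score`, `index_score`, and the `KIMI_K2_THINKING` dictionary are illustrative names.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

ONE_DP = Decimal("0.1")

# Benchmark scores from the Kimi K2 Thinking breakdown; None marks a "—" cell.
KIMI_K2_THINKING: dict[str, dict[str, Optional[str]]] = {
    "General": {"SimpleQA": None, "AA-LCR": "66.3", "LongBench-v2": None, "IFBench": None},
    "Reasoning": {"GPQA Diamond": "84.5", "Humanity's Last Exam": "22.3",
                  "FrontierMath": None, "ARC-AGI-2": None},
    "Coding": {"SWE-bench Verified": "71.3", "Terminal-Bench": "31.1",
               "Aider Polyglot": None, "SciCode": "42.4"},
    "Tool use & agents": {"TAU-bench Retail": None, "τ²-bench": "93",
                          "BFCL": None, "BrowseComp": None},
}


def round1(x: Decimal) -> Decimal:
    """Round half up to one decimal place (assumed display rule)."""
    return x.quantize(ONE_DP, rounding=ROUND_HALF_UP)


def component_score(scores: dict[str, Optional[str]]) -> Optional[Decimal]:
    """Assumption: a component is the mean of its available benchmark scores."""
    present = [Decimal(s) for s in scores.values() if s is not None]
    if not present:
        return None
    return round1(sum(present) / len(present))


def index_score(breakdown: dict[str, dict[str, Optional[str]]]) -> tuple[Decimal, dict[str, Decimal]]:
    """Equal-weighted (25% each) mean of the rounded component scores."""
    components: dict[str, Decimal] = {}
    for name, scores in breakdown.items():
        value = component_score(scores)
        if value is None:
            raise ValueError(f"{name}: no benchmark scores available")
        components[name] = value
    return round1(sum(components.values()) / len(components)), components


index, components = index_score(KIMI_K2_THINKING)
print({name: str(value) for name, value in components.items()})
print(index)  # 65.3
```

The rounding order matters. If the unrounded Coding mean (about 48.27) were averaged instead, the index would come out near 65.24 and display as 65.2, not 65.3. The same two-step rule also reproduces the 61.8 shown for Grok 4.1 Fast.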

The four columns after Index (General, Reasoning, Coding, and Agents) are the v1.2 index components, each carrying a 25% weight. The Math, Multimodal, and Long context columns that follow are reference per-category averages and do not feed the index. After them, every individual benchmark in our catalog is listed, grouped by category and ordered by coverage. Scores are curated approximations; see each model's page for sources.
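
Because the index uses only the four component columns, it can be checked against any row of the table. The Python sketch below splits a tab-separated row, treats "—" as a missing score, and tests whether the Index cell equals the half-up-rounded mean of the General, Reasoning, Coding, and Agents cells. This is an illustration, not the catalog's own validation. `COLUMNS` spells out the first ten headers in full. `check_row` and `parse_cell` are hypothetical helper names. The `ROWS` samples are copied from the table and truncated after the Long context column. The rounding rule is an assumption inferred from the displayed values.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional, Sequence

# Leading columns of the catalog table, in display order.
COLUMNS = ["Rank", "Model", "Index", "General", "Reasoning", "Coding",
           "Agents", "Math", "Multimodal", "Long context"]
INDEX_COMPONENTS = ("General", "Reasoning", "Coding", "Agents")  # 25% each

# Rows copied from the table, truncated after the Long context column.
ROWS = [
    "#127\tGemini 3 Pro Google\t67.3\t70.7\t53.5\t58\t87.1\t95.7\t—\t70.7",
    "#135\tKimi Linear 48B A3B Instruct Moonshot AI\t15.9\t25.7\t22\t15.7\t0\t36.3\t—\t25.7",
    "#137\tGranite 4.0 1B IBM\t12\t4\t16.6\t4.4\t22.8\t6.3\t—\t4",
    "#146\tClaude Haiku 4.5 Anthropic\t54.7\t70.3\t41.4\t52.5\t54.7\t96.3\t—\t70.3",
]


def parse_cell(cell: str) -> Optional[Decimal]:
    """Turn a table cell into a number; "—" means no score."""
    cell = cell.strip()
    return None if cell == "—" else Decimal(cell)


def check_row(line: str, components: Sequence[str] = INDEX_COMPONENTS) -> bool:
    """True if Index equals the rounded mean of the given component columns."""
    row = dict(zip(COLUMNS, line.split("\t")))
    values = [parse_cell(row[name]) for name in components]
    if any(v is None for v in values):
        return False  # a missing component cannot reproduce the index
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP) == parse_cell(row["Index"])


for line in ROWS:
    print(line.split("\t")[1], check_row(line))
```

Swapping the Math column in for Agents, as in `check_row(ROWS[0], ("General", "Reasoning", "Coding", "Math"))`, makes the Gemini 3 Pro row fail. That result is consistent with Math being a reference average outside the index.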