298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Models are sorted newest first by default; see the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis; specs and pricing via OpenRouter


Rank	Model	Index	General	Reason	Coding	Agents	Math	Multi	Long ctx	GPQA Diamond	DROP	ARC-AGI-2	BIG-Bench Hard	SciCode	Terminal-Bench	LiveCodeBench	SWE-bench Verified	Aider Polyglot	HumanEval	Aider Polyglot Edit	MBPP	MultiPL-E	SWE-bench Pro	AIME 2025	MATH-500	AIME 2024	MATH	GSM8K	MGSM	HMMT 2025	FrontierMath	τ²-bench	TAU-bench Retail	TAU-bench Airline	BFCL	BrowseComp	τ²-bench Airline	τ²-bench Retail	MMMU	MathVista	ChartQA	DocVQA	MMMU-Pro	AI2D	Humanity’s Last Exam	MMLU-Pro	MMLU	IFEval	SimpleQA	Multi-IF	LiveBench	Arena Hard	AA-LCR	LongBench-v2	Released ↓	Country	Type	Access	Params	Cutoff	Context	Speed	Latency	In $/M	Out $/M
#201	Qwen3 30B A3B 2507 Alibaba	36.7	59	40.3	19.3	28.1	76.9	—	59	70.7	—	—	—	33.3	5.3	70.7	—	—	—	—	—	—	—	56.3	97.6	—	—	—	—	—	—	28.1	—	—	—	—	—	—	—	—	—	—	—	—	9.8	80.5	—	—	—	—	—	—	59	—	2025	—	llm	—	—	—	—	151	1.18	$0.30	$1.90
#202	Qwen3 30B A3B 2507 Instruct Alibaba	21.9	22.7	36.4	18.3	10.2	81.9	—	22.7	65.9	—	—	—	30.4	6.1	51.5	—	—	—	—	—	—	—	66.3	97.5	—	—	—	—	—	—	10.2	—	—	—	—	—	—	—	—	—	—	—	—	6.8	77.7	—	—	—	—	—	—	22.7	—	2025	—	llm	—	—	—	—	122	1.25	$0.20	$0.40
#203	GLM-4.5 Zhipu AI	47.6	48.3	46.8	45.5	49.7	87.6	—	48.3	79.1	—	—	—	34.8	37.5	72.9	64.2	—	—	—	—	—	—	73.7	98.2	91	—	—	—	—	—	43	79.7	60.4	—	26.4	—	—	—	—	—	—	—	—	14.4	84.6	—	—	—	—	—	—	48.3	—	2025	—	llm	Open weights	355B (32B active)	2024	131K	85	0.70	$0.60	$2.20
#204	Qwen3 235B A22B 2507 Alibaba	48.8	67	47	28	53.2	94.7	—	67	79	—	—	—	42.4	13.6	78.8	—	—	—	—	—	—	—	91	98.4	—	—	—	—	—	—	53.2	—	—	—	—	—	—	—	—	—	—	—	—	15	84.3	—	—	—	—	—	—	67	—	2025	—	llm	—	—	—	—	59	1.21	$0.40	$2.20
#205	GLM 4.5 Air Zhipu AI	43.6	43.7	42.8	39.4	48.6	89.4	—	43.7	75	—	—	—	30.6	30	70.7	57.6	—	—	—	—	—	—	80.7	98.1	89.4	—	—	—	—	—	46.5	77.9	60.8	—	21.3	—	—	—	—	—	—	—	—	10.6	81.4	—	—	—	—	—	—	43.7	—	2025	—	llm	Open weights	—	2024	131K	63	1.68	$0.13	$0.85
#206	Llama Nemotron Super 49B v1.5 NVIDIA	30.7	34	40.8	20	28.1	87.5	—	34	74.8	—	—	—	34.8	5.3	73.7	—	—	—	—	—	—	—	76.7	98.3	—	—	—	—	—	—	28.1	—	—	—	—	—	—	—	—	—	—	—	—	6.8	81.4	—	—	—	—	—	—	34	—	2025	—	llm	—	—	—	—	51	0.29	$0.10	$0.40
#207	Qwen3-235B-A22B-Instruct-2507 Alibaba	39.1	42.8	44.1	36.2	33.3	84.2	—	31.2	77.5	—	—	—	36	15.2	52.4	—	57.3	—	—	—	87.9	—	70.3	98	—	—	—	—	—	—	33.3	—	—	—	—	44	71.3	—	—	—	—	—	—	10.6	83	—	88.7	54.3	77.5	—	—	31.2	—	2025	—	llm	Open weights	235B	—	131K	63	1.18	$0.15	$0.80
#208	Qwen3 Coder 480B A35B Instruct Alibaba	36.6	42.3	33.1	27.4	43.6	66.8	—	42.3	61.8	—	—	—	35.9	18.9	58.5	—	—	—	—	—	—	—	39.3	94.2	—	—	—	—	—	—	43.6	—	—	—	—	—	—	—	—	—	—	—	—	4.4	78.8	—	—	—	—	—	—	42.3	—	2025	—	llm	—	—	—	—	69	1.68	$0.30	$1.80
#209	Gemini 2.5 Flash Lite Google	26.3	31	34.8	20.5	19	73.4	72.9	51.3	64.6	—	—	—	19.3	4.5	33.7	31.6	26.7	—	—	—	—	—	49.8	96.9	—	—	—	—	—	—	19	—	—	—	—	—	—	72.9	—	—	—	—	—	5.1	75.9	—	—	10.7	—	—	—	51.3	—	2025	—	multimodal	API only	—	2025	1M	6	0.44	$0.10	$0.40
#210	EXAONE 4.0 32B LG AI Research	23.2	14	42.2	19.1	17.3	88.9	—	14	73.9	—	—	—	34.4	3.8	74.7	—	—	—	—	—	—	—	80	97.7	—	—	—	—	—	—	17.3	—	—	—	—	—	—	—	—	—	—	—	—	10.5	81.8	—	—	—	—	—	—	14	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#211	Exaone 4.0 1.2B LG AI Research	13.5	0	28.7	4.7	20.5	50.3	—	0	51.5	—	—	—	9.3	0	51.6	—	—	—	—	—	—	—	50.3	—	—	—	—	—	—	—	20.5	—	—	—	—	—	—	—	—	—	—	—	—	5.8	58.8	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#212	Kimi K2 Moonshot AI	49.4	51	41.8	43.8	61.1	74.6	—	51	76.6	—	—	—	34.5	15.9	55.6	65.8	59.1	—	—	—	—	—	57	97.1	69.6	—	—	—	—	—	61.1	—	—	—	—	—	—	—	—	—	—	—	—	7	82.4	89.5	—	—	—	—	—	51	—	2025	—	llm	Open weights	1T (32B active)	2024	131K	26	1.51	$0.57	$2.30
#213	Devstral Medium Mistral AI	23.6	28.7	26.5	19.3	19.9	37.7	—	28.7	49.2	—	—	—	29.4	9.1	33.7	—	—	—	—	—	—	—	4.7	70.7	—	—	—	—	—	—	19.9	—	—	—	—	—	—	—	—	—	—	—	—	3.8	70.8	—	—	—	—	—	—	28.7	—	2025	—	llm	API only	—	2025	131K	72	0.49	$0.40	$2.00
Index breakdown for Devstral Medium: Index 23.6 = (28.7 + 26.5 + 19.3 + 19.9) / 4, the equal-weighted mean of 4 components. General (25%) 28.7: SimpleQA —, AA-LCR 28.7, LongBench-v2 —, IFBench —. Reasoning (25%) 26.5: GPQA Diamond 49.2, Humanity’s Last Exam 3.8, FrontierMath —, ARC-AGI-2 —. Coding (25%) 19.3: SWE-bench Verified —, Terminal-Bench 9.1, Aider Polyglot —, SciCode 29.4. Tool use & agents (25%) 19.9: TAU-bench Retail —, τ²-bench 19.9, BFCL —, BrowseComp —.
#214	LFM2 1.2B Liquid AI	7.1	0	14.3	1.3	12.6	3.3	—	0	22.8	—	—	—	2.5	0	2	—	—	—	—	—	—	—	3.3	—	—	—	—	—	—	—	12.6	—	—	—	—	—	—	—	—	—	—	—	—	5.7	25.7	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#215	Grok 4 xAI	61.3	68	47.8	54.4	74.9	95.4	—	68	87.5	—	15.9	—	45.7	37.9	79	—	79.6	—	—	—	—	—	91.7	99	—	—	—	—	—	—	74.9	—	—	—	—	—	—	—	—	—	—	—	—	40	86.6	—	—	—	—	—	—	68	—	2025	—	llm	API only	—	2024	256K	100	0.70	$3.00	$15.00
#216	Jamba 1.7 Mini AI21 Labs	12.1	12.7	18.4	4.7	12.6	13.1	—	12.7	32.2	—	—	—	9.3	0	6.1	—	—	—	—	—	—	—	0.3	25.8	—	—	—	—	—	—	12.6	—	—	—	—	—	—	—	—	—	—	—	—	4.5	38.8	—	—	—	—	—	—	12.7	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#217	ERNIE 4.5 300B A47B Baidu	15.9	2.3	42.3	18.8	0	67.2	—	2.3	81.1	—	—	—	31.5	6.1	46.7	—	—	—	—	—	—	—	41.3	93.1	—	—	—	—	—	—	0	—	—	—	—	—	—	—	—	—	—	—	—	3.5	77.6	—	—	—	—	—	—	2.3	—	2025	—	llm	Open weights	—	2025	131K	24	1.53	$0.28	$1.10
#218	Gemma 3n E2B Instruct Google	4.1	0	13.5	3	0	39.7	—	0	22.9	—	—	—	5.2	0.8	9.5	—	—	—	—	—	—	—	10.3	69.1	—	—	—	—	—	—	0	—	—	—	—	—	—	—	—	—	—	—	—	4	37.8	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#219	Mistral Small 3.2 Mistral AI	22.7	17.3	27.4	16.6	29.5	57.7	—	17.3	50.5	—	—	—	26.4	6.8	27.5	—	—	—	—	—	—	—	27	88.3	—	—	—	—	—	—	29.5	—	—	—	—	—	—	—	—	—	—	—	—	4.3	68.1	—	—	—	—	—	—	17.3	—	2025	—	llm	—	—	—	—	100	0.40	$0.10	$0.30
#220	MiniMax M1 80k MiniMax	36.9	54.3	39	20.2	34.2	79.5	—	54.3	69.7	—	—	—	37.4	3	71.1	—	—	—	—	—	—	—	61	98	—	—	—	—	—	—	34.2	—	—	—	—	—	—	—	—	—	—	—	—	8.2	81.6	—	—	—	—	—	—	54.3	—	2025	—	llm	—	—	—	—	—	—	$0.60	$2.20
#221	MiniMax M1 40k MiniMax	35.3	51.7	37.9	20	31.6	55.5	—	51.7	68.2	—	—	—	37.8	2.3	65.7	—	—	—	—	—	—	—	13.7	97.2	—	—	—	—	—	—	31.6	—	—	—	—	—	—	—	—	—	—	—	—	7.5	80.8	—	—	—	—	—	—	51.7	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#222	Magistral Medium 1 Mistral AI	20.3	0	38.7	19.4	23.1	66	—	0	67.9	—	—	—	29.7	9.1	52.7	—	—	—	—	—	—	—	40.3	91.7	—	—	—	—	—	—	23.1	—	—	—	—	—	—	—	—	—	—	—	—	9.5	75.3	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#223	Magistral Small 1 Mistral AI	19.2	0	35.7	14.3	26.6	68.8	—	0	64.1	—	—	—	24.1	4.5	51.4	—	—	—	—	—	—	—	41.3	96.3	—	—	—	—	—	—	26.6	—	—	—	—	—	—	—	—	—	—	—	—	7.2	74.6	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#224	DeepSeek R1 0528 Qwen3 8B DeepSeek	14.4	13	33.4	11	0	78.5	—	13	61.2	—	—	—	20.4	1.5	51.3	—	—	—	—	—	—	—	63.7	93.2	—	—	—	—	—	—	0	—	—	—	—	—	—	—	—	—	—	—	—	5.6	73.9	—	—	—	—	—	—	13	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#225	DeepSeek-R1-0528 DeepSeek	46.6	73.5	49.4	40.6	22.7	89.2	—	54.7	81	—	—	—	40.3	5.7	73.3	44.6	71.6	—	—	—	—	—	87.5	98.3	91.4	—	—	—	79.4	—	36.5	—	—	—	8.9	—	—	—	—	—	—	—	—	17.7	85	—	—	92.3	—	—	—	54.7	—	2025	—	llm	Open weights	671B	—	131K	45	0.30	$0.55	$2.19

The four score columns immediately after Index (General, Reason, Coding, Agents) are the v1.2 weighted components (25% each) that feed it. Reference per-category averages that are not part of the index (Math, Multi, Long ctx) follow. Every individual benchmark in our catalog is also shown, grouped by category and ordered by coverage. Scores are curated approximations; see each model for sources.
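
As a rough illustration of the index arithmetic, the sketch below recomputes the Devstral Medium figures from the table. The function names `category_score` and `index_score` are hypothetical, and averaging only the benchmarks that have a score within a category is an assumption inferred from the Devstral Medium breakdown, not a documented rule of the v1.2 methodology.

```python
from statistics import mean

def category_score(benchmarks):
    """Mean of the benchmarks that have a score; None marks a missing cell.

    Assumption: missing benchmarks are skipped rather than counted as zero.
    """
    available = [value for value in benchmarks.values() if value is not None]
    return mean(available) if available else None

def index_score(components):
    """Equal-weighted mean of the component scores (25% each for four)."""
    return sum(components) / len(components)

# Reasoning benchmarks for Devstral Medium, copied from the table.
reasoning = category_score({
    "GPQA Diamond": 49.2,
    "Humanity's Last Exam": 3.8,
    "FrontierMath": None,
    "ARC-AGI-2": None,
})

# The four component scores as published in the table row.
devstral_components = {
    "General": 28.7,
    "Reasoning": 26.5,
    "Coding": 19.3,
    "Agents": 19.9,
}
index = index_score(list(devstral_components.values()))

print(f"Reasoning: {reasoning:.1f}")
print(f"Index: {index:.1f}")
```

Run against other rows, the same equal-weighted mean appears to reproduce the published Index values to one decimal place, for example (59 + 40.3 + 19.3 + 28.1) / 4 for Qwen3 30B A3B 2507.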