298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Models are sorted newest first by default; see the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter


Rank	Model	Index	General	Reason	Coding	Agents	Math	Multi	Long ctx	GPQA Diamond	DROP	ARC-AGI-2	BIG-Bench Hard	SciCode	Terminal-Bench	LiveCodeBench	SWE-bench Verified	Aider Polyglot	HumanEval	Aider Polyglot Edit	MBPP	MultiPL-E	SWE-bench Pro	AIME 2025	MATH-500	AIME 2024	MATH	GSM8K	MGSM	HMMT 2025	FrontierMath	τ²-bench	TAU-bench Retail	TAU-bench Airline	BFCL	BrowseComp	τ²-bench Airline	τ²-bench Retail	MMMU	MathVista	ChartQA	DocVQA	MMMU-Pro	AI2D	Humanity’s Last Exam	MMLU-Pro	MMLU	IFEval	SimpleQA	Multi-IF	LiveBench	Arena Hard	AA-LCR	LongBench-v2	Released ↓	Country	Type	Access	Params	Cutoff	Context	Speed	Latency	In $/M	Out $/M
#101	MiMo-V2-Flash Xiaomi	61.9	64.3	52.9	35.3	95	96.3	—	64.3	84.6	—	—	—	39.4	31.1	86.8	—	—	—	—	—	—	—	96.3	—	—	—	—	—	—	—	95	—	—	—	—	—	—	—	—	—	—	—	—	21.1	84.3	—	—	—	—	—	—	64.3	—	2025	—	llm	Open weights	—	—	262K	145	1.34	$0.10	$0.30
#102	Olmo 3.1 32B Think Allen Institute for AI	11.8	0	32.6	14.7	0	77.3	—	0	59.1	—	—	—	29.3	0	69.5	—	—	—	—	—	—	—	77.3	—	—	—	—	—	—	—	0	—	—	—	—	—	—	—	—	—	—	—	—	6	76.3	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
Index breakdown for Olmo 3.1 32B Think: 11.8 = (0.0 + 32.6 + 14.7 + 0.0) / 4, the equal-weighted mean of 4 components. General (25%): 0 [SimpleQA —, AA-LCR 0, LongBench-v2 —, IFBench —]. Reasoning (25%): 32.6 [GPQA Diamond 59.1, Humanity’s Last Exam 6, FrontierMath —, ARC-AGI-2 —]. Coding (25%): 14.7 [SWE-bench Verified —, Terminal-Bench 0, Aider Polyglot —, SciCode 29.3]. Tool use & agents (25%): 0 [TAU-bench Retail —, τ²-bench 0, BFCL —, BrowseComp —].
#103	GPT-5.2 OpenAI	69.4	72.7	60.2	59.7	84.8	100	—	72.7	92.4	—	52.9	—	52.1	47	89.4	80	—	—	—	—	—	—	100	—	—	—	—	—	—	—	84.8	—	—	—	—	—	—	—	—	—	—	—	—	35.4	87.4	—	—	—	—	—	—	72.7	—	2025	—	llm	API only	—	—	400K	73	0.69	$1.75	$14.00
#104	Mi:dm K 2.5 Pro Korea Telecom	39	11	40.5	18.1	86.5	78.7	—	11	72.2	—	—	—	33.2	3	65.6	—	—	—	—	—	—	—	78.7	—	—	—	—	—	—	—	86.5	—	—	—	—	—	—	—	—	—	—	—	—	8.8	81.3	—	—	—	—	—	—	11	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#105	Molmo2-8B Allen Institute for AI	7.6	0	23.5	6.7	0	—	—	0	42.5	—	—	—	13.3	0	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	0	—	—	—	—	—	—	—	—	—	—	—	—	4.4	—	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#106	Devstral 2 Mistral AI	28.1	30	31.5	26	24.9	36.7	—	30	59.4	—	—	—	33.1	18.9	44.8	—	—	—	—	—	—	—	36.7	—	—	—	—	—	—	—	24.9	—	—	—	—	—	—	—	—	—	—	—	—	3.6	76.2	—	—	—	—	—	—	30	—	2025	—	llm	Open weights	—	—	262K	51	0.64	$0.40	$2.00
#107	Devstral Small 2 Mistral AI	24.6	24	28.3	22.8	23.4	34.3	—	24	53.2	—	—	—	28.8	16.7	34.8	—	—	—	—	—	—	—	34.3	—	—	—	—	—	—	—	23.4	—	—	—	—	—	—	—	—	—	—	—	—	3.4	67.8	—	—	—	—	—	—	24	—	2025	—	llm	—	—	—	—	62	0.75	$0.00	$0.00
#108	GLM 4.6V Zhipu AI	33.7	40.3	40.4	22.4	31.6	85.3	—	40.3	71.9	—	—	—	30.4	14.4	41.1	—	—	—	—	—	—	—	85.3	—	—	—	—	—	—	—	31.6	—	—	—	—	—	—	—	—	—	—	—	—	8.9	79.9	—	—	—	—	—	—	40.3	—	2025	—	multimodal	Open weights	—	—	131K	44	1.31	$0.30	$0.90
#109	K2-V2 MBZUAI Institute of Foundation Models	29.8	33.3	38.9	19.2	27.8	78.3	—	33.3	68.1	—	—	—	28.6	9.8	69.4	—	—	—	—	—	—	—	78.3	—	—	—	—	—	—	—	27.8	—	—	—	—	—	—	—	—	—	—	—	—	9.8	78.6	—	—	—	—	—	—	33.3	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#110	Motif-2-12.7B-Reasoning Motif Technologies	28.6	13	38.9	16	46.5	80.3	—	13	69.5	—	—	—	28.2	3.8	65.1	—	—	—	—	—	—	—	80.3	—	—	—	—	—	—	—	46.5	—	—	—	—	—	—	—	—	—	—	—	—	8.2	79.6	—	—	—	—	—	—	13	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#111	Nova 2 Lite Amazon	51.8	58.3	46	27.2	75.7	94.3	—	58.3	81.1	—	—	—	36.9	17.4	71.1	—	—	—	—	—	—	—	94.3	—	—	—	—	—	—	—	75.7	—	—	—	—	—	—	—	—	—	—	—	—	10.9	81.8	—	—	—	—	—	—	58.3	—	2025	—	multimodal	API only	—	—	1M	229	0.89	$0.30	$2.50
#112	Mistral Large 3 Mistral AI	30.4	34.7	36.1	26.1	24.6	38	—	34.7	68	—	—	—	36.2	15.9	46.5	—	—	—	—	—	—	—	38	—	—	—	—	—	—	—	24.6	—	—	—	—	—	—	—	—	—	—	—	—	4.1	80.7	—	—	—	—	—	—	34.7	—	2025	—	llm	Open weights	675B (41B active)	—	262K	54	0.64	$0.50	$1.50
#113	Ministral 3 14B Mistral AI	23.6	22	30.9	14.1	27.2	30	—	22	57.2	—	—	—	23.6	4.5	35.1	—	—	—	—	—	—	—	30	—	—	—	—	—	—	—	27.2	—	—	—	—	—	—	—	—	—	—	—	—	4.6	69.3	—	—	—	—	—	—	22	—	2025	—	multimodal	Open weights	—	—	262K	67	0.41	$0.20	$0.20
#114	Ministral 3 8B Mistral AI	22.3	24	25.7	12.7	26.6	31.7	—	24	47.1	—	—	—	20.8	4.5	30.3	—	—	—	—	—	—	—	31.7	—	—	—	—	—	—	—	26.6	—	—	—	—	—	—	—	—	—	—	—	—	4.3	64.2	—	—	—	—	—	—	24	—	2025	—	multimodal	Open weights	—	—	262K	86	0.38	$0.15	$0.15
#115	Ministral 3 3B Mistral AI	16.1	11.7	20.5	7.2	24.9	22	—	11.7	35.8	—	—	—	14.4	0	24.7	—	—	—	—	—	—	—	22	—	—	—	—	—	—	—	24.9	—	—	—	—	—	—	—	—	—	—	—	—	5.3	52.4	—	—	—	—	—	—	11.7	—	2025	—	multimodal	Open weights	—	—	131K	154	0.34	$0.10	$0.10
#116	DeepSeek-V3.2 DeepSeek	64.2	65	53.1	48.2	90.6	92	—	65	84	—	—	—	38.9	35.6	86.2	—	70.2	—	—	—	—	—	92	—	—	—	—	—	—	—	90.6	—	—	—	—	—	—	—	—	—	—	—	—	22.2	86.2	—	—	—	—	—	—	65	—	2025	—	llm	Open weights	671B (37B active)	—	131K	—	—	$0.25	$0.38
#117	DeepSeek V3.2 Speciale DeepSeek	38.8	59.3	56.6	39.4	0	96.7	—	59.3	87.1	—	—	—	44	34.8	89.6	—	—	—	—	—	—	—	96.7	—	—	—	—	—	—	—	0	—	—	—	—	—	—	—	—	—	—	—	—	26.1	86.3	—	—	—	—	—	—	59.3	—	2025	—	llm	Open weights	—	—	164K	—	—	$0.29	$0.43
#118	Nova 2.0 Pro Amazon	57.9	61.7	43.7	33.5	92.7	89	—	61.7	78.5	—	—	—	42.7	24.2	73	—	—	—	—	—	—	—	89	—	—	—	—	—	—	—	92.7	—	—	—	—	—	—	—	—	—	—	—	—	8.9	83	—	—	—	—	—	—	61.7	—	2025	—	llm	—	—	—	—	149	0.81	$1.30	$10.00
#119	INTELLECT-3 Prime Intellect	31.8	32.3	44.1	24.1	26.6	88	—	32.3	76.1	—	—	—	39.1	9.1	77.7	—	—	—	—	—	—	—	88	—	—	—	—	—	—	—	26.6	—	—	—	—	—	—	—	—	—	—	—	—	12.1	82.2	—	—	—	—	—	—	32.3	—	2025	—	llm	Open weights	—	—	131K	—	—	$0.20	$1.10
#120	Nova 2.0 Omni Amazon	49.3	53.7	41.4	21.5	80.4	89.7	—	53.7	76	—	—	—	36.2	6.8	66	—	—	—	—	—	—	—	89.7	—	—	—	—	—	—	—	80.4	—	—	—	—	—	—	—	—	—	—	—	—	6.8	80.9	—	—	—	—	—	—	53.7	—	2025	—	llm	—	—	—	—	—	—	$0.30	$2.50
#121	Apriel-v1.6-15B-Thinker ServiceNow	46.8	50.3	41.6	25.9	69.3	88	—	50.3	73.3	—	—	—	37.3	14.4	80.7	—	—	—	—	—	—	—	88	—	—	—	—	—	—	—	69.3	—	—	—	—	—	—	—	—	—	—	—	—	9.8	79	—	—	—	—	—	—	50.3	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#122	Claude Opus 4.5 Anthropic	70.1	74	57.7	59.1	89.5	91.3	—	74	87	—	—	—	49.5	47	87.1	80.9	—	—	—	—	—	—	91.3	—	—	—	—	—	—	—	89.5	—	—	—	—	—	—	—	—	—	—	—	—	28.4	89.5	—	—	—	—	—	—	74	—	2025	—	llm	API only	—	—	200K	58	1.50	$5.00	$25.00
#123	Olmo 3 32B Think Allen Institute for AI	12.2	0	33.5	15.1	0	73.7	—	0	61	—	—	—	28.6	1.5	67.2	—	—	—	—	—	—	—	73.7	—	—	—	—	—	—	—	0	—	—	—	—	—	—	—	—	—	—	—	—	5.9	75.9	—	—	—	—	—	—	0	—	2025	—	llm	Open weights	—	—	66K	—	—	$0.15	$0.50
#124	Olmo 3 7B Instruct Allen Institute for AI	10.2	0	22.9	5.2	12.6	41.3	—	0	40	—	—	—	10.3	0	26.6	—	—	—	—	—	—	—	41.3	—	—	—	—	—	—	—	12.6	—	—	—	—	—	—	—	—	—	—	—	—	5.8	52.2	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.10	$0.20
#125	Olmo 3 7B Think Allen Institute for AI	9.9	0	28.7	11	0	70.7	—	0	51.6	—	—	—	21.2	0.8	61.7	—	—	—	—	—	—	—	70.7	—	—	—	—	—	—	—	0	—	—	—	—	—	—	—	—	—	—	—	—	5.7	65.5	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00

The four score columns after Index (General, Reason, Coding, Agents) are the v1.2 weighted components (25% each) that feed it. The reference per-category averages (Math, Multi, Long ctx), which are not part of the index, follow. Every individual benchmark in our catalog is also shown, grouped by category and ordered by coverage. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations; see each model for sources.
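
As a rough illustration of how the Index relates to its four components, the sketch below recomputes the Olmo 3.1 32B Think figure from the benchmark values listed in its breakdown. It assumes that each component is the plain mean of whichever of its benchmarks have a score, with "—" entries skipped; that rule is inferred from the published numbers and is not a documented method. The function names `component_score` and `index_score` are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def component_score(benchmarks: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the benchmarks that have a score (assumption: "—" entries,
    represented here as None, are skipped). Returns None if nothing is scored."""
    available = [score for score in benchmarks if score is not None]
    return mean(available) if available else None


def index_score(general: float, reasoning: float, coding: float, agents: float) -> float:
    """Equal-weighted mean of the four components (25% each)."""
    return (general + reasoning + coding + agents) / 4


# Benchmark values copied from the Olmo 3.1 32B Think breakdown; None marks "—".
general = component_score([None, 0.0, None, None])    # SimpleQA, AA-LCR, LongBench-v2, IFBench
reasoning = component_score([59.1, 6.0, None, None])  # GPQA Diamond, HLE, FrontierMath, ARC-AGI-2
coding = component_score([None, 0.0, None, 29.3])     # SWE-bench Verified, Terminal-Bench, Aider Polyglot, SciCode
agents = component_score([None, 0.0, None, None])     # TAU-bench Retail, τ²-bench, BFCL, BrowseComp

olmo_index = index_score(general, reasoning, coding, agents)
print(f"Index: {olmo_index:.1f}")
```

Under this assumption the result agrees with the 11.8 shown in the table to one decimal place. Whether component scores are rounded before they are averaged is not stated in the source, so small discrepancies in the last digit are possible for other rows.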