298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Models are sorted newest first by default; head to the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter


Rank	Model	Index	General	Reason	Coding	Agents	Math	Multi	Long ctx	GPQA Diamond	DROP	ARC-AGI-2	BIG-Bench Hard	SciCode	Terminal-Bench	LiveCodeBench	SWE-bench Verified	Aider Polyglot	HumanEval	Aider Polyglot Edit	MBPP	MultiPL-E	SWE-bench Pro	AIME 2025	MATH-500	AIME 2024	MATH	GSM8K	MGSM	HMMT 2025	FrontierMath	τ²-bench	TAU-bench Retail	TAU-bench Airline	BFCL	BrowseComp	τ²-bench Airline	τ²-bench Retail	MMMU	MathVista	ChartQA	DocVQA	MMMU-Pro	AI2D	Humanity’s Last Exam	MMLU-Pro	MMLU	IFEval	SimpleQA	Multi-IF	LiveBench	Arena Hard	AA-LCR	LongBench-v2	Released	Country	Type	Access	Params	Cutoff	Context	Speed	Latency	In $/M	Out $/M
#251	Llama 4 Scout Meta	20.4	25.8	30.8	9.3	15.5	49.2	80.8	25.8	57.2	—	—	—	17	1.5	32.8	—	—	—	—	67.8	—	—	14	84.4	—	50.3	—	90.6	—	—	15.5	—	—	—	—	—	—	69.4	70.7	88.8	94.4	—	—	4.3	74.3	79.6	—	—	—	—	—	25.8	—	2025	—	multimodal	Open weights	109B total / 17B active (MoE)	2024	10M	776	0.31	$0.08	$0.30
#252	Gemini 2.5 Pro Google	50.1	58.4	35.6	52.4	54.1	92.2	79.6	66	84	—	4.9	—	42.8	26.5	80.1	63.8	76.5	—	72.7	—	—	—	88	96.7	92	92	—	—	—	—	54.1	—	—	—	—	—	—	79.6	—	—	—	—	—	17.8	86	—	—	50.8	—	—	—	66	—	2025	—	multimodal	API only	—	2025	1M	85	0.70	$1.25	$10.00
#253	DeepSeek-V3 0324 DeepSeek	40.1	41	36.8	35.4	47.1	64.8	—	41	68.4	—	—	—	35.8	15.2	49.2	—	55.1	—	—	—	—	—	41	94	59.4	—	—	—	—	—	47.1	—	—	—	—	—	—	—	—	—	—	—	—	5.2	81.2	—	—	—	—	—	—	41	—	2025	—	llm	Open weights	671B	—	164K	—	—	$0.28	$1.14
#254	Llama-3.3 Nemotron Super 49B v1 NVIDIA	23.7	17	36.6	14.1	26.9	77.5	—	17	66.7	—	—	—	28.2	0	28	—	—	—	—	91.3	—	—	58.4	96.6	—	—	—	—	—	—	26.9	—	—	—	—	—	—	—	—	—	—	—	—	6.5	78.5	—	—	—	—	—	88.3	17	—	2025	—	llm	Open weights	49.9B	2023	—	—	—	$0.00	$0.00
#255	Mistral Small 3.1 Mistral AI	21.8	19.7	25.1	17.1	25.1	37.2	—	19.7	45.4	—	—	—	26.5	7.6	21.2	—	—	—	—	—	—	—	3.7	70.7	—	—	—	—	—	—	25.1	—	—	—	—	—	—	—	—	—	—	—	—	4.8	65.9	—	—	—	—	—	—	19.7	—	2025	—	llm	—	—	—	—	134	0.52	$0.10	$0.20
#256	Command A Cohere	50.5	46	43.8	31.4	80.7	47.5	—	46	76.1	—	—	—	37.8	25	28.7	—	—	—	—	—	—	—	13	81.9	—	—	—	—	—	—	80.7	—	—	—	—	—	—	—	—	—	—	—	—	11.4	71.2	—	—	—	—	—	—	46	—	2025	—	llm	Open weights	—	2024	256K	203	0.17	$2.50	$10.00
#257	Gemma 3 1B Instruct Google	6.4	0	14.5	0.4	10.5	25.9	—	0	23.7	—	—	—	0.7	0	1.7	—	—	—	—	—	—	—	3.3	48.4	—	—	—	—	—	—	10.5	—	—	—	—	—	—	—	—	—	—	—	—	5.2	13.5	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#258	OLMo 2 32B Allen Institute for AI	5.6	0	18.3	4	0	3.3	—	0	32.8	—	—	—	8	0	6.8	—	—	—	—	—	—	—	3.3	—	—	—	—	—	—	—	0	—	—	—	—	—	—	—	—	—	—	—	—	3.7	51.1	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#259	Gemma 3 27B Instruct Google	13.1	5.7	23.8	12.5	10.5	54.5	—	5.7	42.8	—	—	—	21.2	3.8	13.7	—	—	—	—	—	—	—	20.7	88.3	—	—	—	—	—	—	10.5	—	—	—	—	—	—	—	—	—	—	—	—	4.7	66.9	—	—	—	—	—	—	5.7	—	2025	—	llm	—	—	—	—	—	—	$0.10	$0.30
#260	Gemma 3 12B Instruct Google	11.6	6.7	19.8	9.1	10.8	51.8	—	6.7	34.9	—	—	—	17.4	0.8	13.7	—	—	—	—	—	—	—	18.3	85.3	—	—	—	—	—	—	10.8	—	—	—	—	—	—	—	—	—	—	—	—	4.8	59.5	—	—	—	—	—	—	6.7	—	2025	—	llm	—	—	—	—	—	—	$0.10	$0.30
#261	Reka Flash 3 Reka AI	10.6	0	29	13.4	0	61.5	—	0	52.9	—	—	—	26.7	0	43.5	—	—	—	—	—	—	—	33.7	89.3	—	—	—	—	—	—	0	—	—	—	—	—	—	—	—	—	—	—	—	5.1	66.9	—	—	—	—	—	—	0	—	2025	—	llm	Open weights	—	2025	66K	93	2.81	$0.10	$0.20
#262	Gemma 3 4B Instruct Google	8	5.7	17.2	4.1	5	44.7	—	5.7	29.1	—	—	—	7.3	0.8	11.2	—	—	—	—	—	—	—	12.7	76.6	—	—	—	—	—	—	5	—	—	—	—	—	—	—	—	—	—	—	—	5.2	41.7	—	—	—	—	—	—	5.7	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.10
#263	QwQ-32B Alibaba	39.1	25	36.7	28.4	66.4	66.4	—	25	65.2	—	—	—	35.8	—	63.4	—	20.9	—	—	—	—	—	29	90.6	79.5	—	—	—	—	—	—	—	—	66.4	—	—	—	—	—	—	—	—	—	8.2	76.4	—	83.9	—	—	73.1	—	25	—	2025	—	llm	Open weights	32.5B	2024	—	31	0.45	$0.70	$1.00
#264	GPT-4.5 OpenAI	60.1	62.5	71.4	38	68.4	36.7	73.8	—	71.4	—	—	—	—	—	—	38	—	88	44.9	—	—	—	—	—	36.7	85	97	—	—	—	—	68.4	50	—	—	—	—	75.2	72.3	—	—	—	—	—	—	90.8	88.2	62.5	70.8	—	—	—	—	2025	—	multimodal	API only	—	—	128K	50	20.00	$75.00	$150.00
#265	Claude 3.7 Sonnet Anthropic	57.3	60.7	47.6	52.7	68	79.1	75	60.7	84.8	—	—	—	40.3	35.2	47.3	70.3	64.9	—	—	—	—	—	61	96.2	80	82	—	—	—	—	54.7	81.2	58.4	—	—	—	—	75	—	—	—	—	—	10.3	83.7	86.1	93.2	—	—	—	—	60.7	—	2025	—	llm	API only	—	—	200K	101	0.40	$3.00	$15.00
#266	Grok 3 mini Reasoning xAI	53.7	50.3	45.1	29	90.4	92	—	50.3	79.1	—	—	—	40.6	17.4	69.6	—	—	—	—	—	—	—	84.7	99.2	—	—	—	—	—	—	90.4	—	—	—	—	—	—	—	—	—	—	—	—	11.1	82.8	—	—	—	—	—	—	50.3	—	2025	—	llm	—	—	—	—	33	0.52	$0.30	$0.50
#267	Grok-3 xAI	43.1	54.7	44.8	24.1	48.8	91.2	78	54.7	84.6	—	—	—	36.8	11.4	79.4	—	—	—	—	—	—	—	93.3	87	93.3	—	—	—	—	—	48.8	—	—	—	—	—	—	78	—	—	—	—	—	5.1	80	—	—	—	—	—	—	54.7	—	2025	—	multimodal	API only	—	2024	128K	100	0.70	$3.00	$15.00
#268	o3-mini OpenAI	36.3	27.2	32.9	40.7	44.5	65	—	39.3	77.2	—	—	—	39.9	6.8	73.4	49.3	66.7	—	60.4	—	—	—	—	98.5	87.3	97.9	—	92	—	9.2	31.3	57.6	32.4	—	—	—	—	—	—	—	—	—	—	12.3	80.2	86.9	93.9	15	79.5	84.6	—	39.3	—	2025	—	llm	API only	—	2023	200K	115	5.20	$1.10	$4.40
#269	Mistral Small 3 Mistral AI	17.1	0	25.2	23.6	19.6	37.9	—	0	46.2	—	—	—	23.6	—	25.2	—	—	—	—	—	—	—	4.3	71.5	—	—	—	—	—	—	19.6	—	—	—	—	—	—	—	—	—	—	—	—	4.1	65.2	—	—	—	—	—	—	0	—	2025	—	llm	Open weights	—	2023	33K	136	0.53	$0.05	$0.08
#270	DeepSeek-R1 DeepSeek	34.3	52.3	40.4	32.9	11.4	82.3	—	52.3	71.5	—	—	—	35.7	6.1	61.7	—	56.9	—	—	—	—	—	68	96.6	—	—	—	—	—	—	11.4	—	—	—	—	—	—	—	—	—	—	—	—	9.3	84.4	90.8	—	—	—	—	—	52.3	—	2025	—	llm	Open weights	671B total / 37B active (MoE)	—	128K	189	0.07	$0.55	$2.19
#271	DeepSeek R1 Distill Llama 70B DeepSeek	21.3	11	35.7	16.4	21.9	78.3	—	11	65.2	—	—	—	31.3	1.5	57.5	—	—	—	—	—	—	—	53.7	94.5	86.7	—	—	—	—	—	21.9	—	—	—	—	—	—	—	—	—	—	—	—	6.1	79.5	—	—	—	—	—	—	11	—	2025	—	llm	Open weights	70.6B	—	128K	37	0.65	$0.10	$0.40
#272	Phi 4 Microsoft	11.6	1.5	30.1	14.9	0	49.5	—	0	56.1	75.5	—	—	26	3.8	23.1	—	—	82.8	—	—	—	—	18	81	—	80.4	—	80.6	—	—	0	—	—	—	—	—	—	—	—	—	—	—	—	4.1	70.4	84.8	63	3	—	47.6	75.4	0	—	2025	—	llm	Open weights	—	2024	16K	33	0.20	$0.07	$0.14
#273	DeepSeek-V3 DeepSeek	30.5	34.2	31.4	33.5	22.8	51.8	—	38.9	59.1	91.6	—	—	35.4	6.8	37.6	42	49.6	—	79.7	—	—	—	26	90.2	39.2	—	—	—	—	—	22.8	—	—	—	—	—	—	—	—	—	—	—	—	3.6	75.9	88.5	86.1	24.9	—	—	—	29	48.7	2024	—	llm	Open weights	671B total / 37B active (MoE)	2024	131K	100	0.50	$0.23	$0.91
#274	Gemini 2.0 Flash Google	27.9	28.3	33.7	20	29.5	57.4	70.7	28.3	62.1	—	—	—	34	3.8	35.1	—	22.2	—	—	—	—	—	21.7	93	—	89.7	—	—	—	—	29.5	—	—	—	—	—	—	70.7	—	—	—	—	—	5.3	76.4	87	—	—	—	—	—	28.3	—	2024	—	multimodal	API only	—	2024	1M	183	0.40	$0.10	$0.40
#275	Llama 3.3 70B Instruct Meta	20.9	15	27.3	14.5	26.6	42.5	—	15	50.5	—	—	—	26	3	28.8	—	—	88.4	—	—	—	—	7.7	77.3	—	77	—	91.1	—	—	26.6	—	—	—	—	—	—	—	—	—	—	—	—	4	68.9	86	92.1	—	—	—	—	15	—	2024	—	llm	Open weights	—	2023	131K	2220	0.50	$0.10	$0.32

The four score columns that follow Index (General, Reason, Coding, Agents) are the v1.2 components that feed it, weighted 25% each. The next three columns (Math, Multi, Long ctx) are reference per-category averages and are not part of the index. Every individual benchmark in our catalog is also shown, grouped by category and ordered by coverage. Hover any header for details; click to sort. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations; see each model for sources.
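
As a rough illustration of the 25% weighting, the sketch below recomputes Index for a few rows copied from the table. It assumes the four weighted components are the General, Reason, Coding, and Agents columns; that mapping is inferred from the row values rather than stated by the methodology, and the names `COMPONENTS`, `WEIGHT`, and `index_score` are hypothetical.

```python
# Assumption: the four components are the columns directly after Index.
# This mapping is inferred from the table rows, not from the methodology page.
COMPONENTS = ("General", "Reason", "Coding", "Agents")
WEIGHT = 0.25  # "25% each" per the note above


def index_score(scores: dict[str, float]) -> float:
    """Weighted sum of the four component scores, rounded to one decimal."""
    return round(sum(WEIGHT * scores[name] for name in COMPONENTS), 1)


# Component scores and published Index values copied from the table.
rows = {
    "Gemini 2.5 Pro": ({"General": 58.4, "Reason": 35.6, "Coding": 52.4, "Agents": 54.1}, 50.1),
    "DeepSeek-V3 0324": ({"General": 41, "Reason": 36.8, "Coding": 35.4, "Agents": 47.1}, 40.1),
    "GPT-4.5": ({"General": 62.5, "Reason": 71.4, "Coding": 38, "Agents": 68.4}, 60.1),
}

for model, (scores, published) in rows.items():
    computed = index_score(scores)
    print(f"{model}: computed {computed}, published {published}")
```

For these three rows the recomputed value agrees with the published Index to one decimal place. Rows whose unrounded mean lands exactly on a rounding boundary may differ by 0.1, depending on how the site rounds.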