298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Models are sorted newest first by default; head to the leaderboard for the ranked view.
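As an illustration of the default view, here is a minimal Python sketch of optional filtering followed by newest-first sorting. The `CatalogEntry` record, its field names, and the `browse` helper are hypothetical; they mirror the filters listed above and are not the site's actual schema or code.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CatalogEntry:
    # Hypothetical record shape; field names are illustrative only.
    model: str
    lab: str
    country: Optional[str]
    access: Optional[str]  # e.g. "API only", "Open weights"
    modality: str          # e.g. "llm", "multimodal"
    released: date


def browse(
    entries: List[CatalogEntry],
    lab: Optional[str] = None,
    country: Optional[str] = None,
    access: Optional[str] = None,
    modality: Optional[str] = None,
    released_after: Optional[date] = None,
    released_before: Optional[date] = None,
) -> List[CatalogEntry]:
    """Apply every filter that is set, then sort newest first (the default order)."""
    rows = [
        e for e in entries
        if (lab is None or e.lab == lab)
        and (country is None or e.country == country)
        and (access is None or e.access == access)
        and (modality is None or e.modality == modality)
        and (released_after is None or e.released >= released_after)
        and (released_before is None or e.released <= released_before)
    ]
    return sorted(rows, key=lambda e: e.released, reverse=True)
```

Unset filters are skipped, so calling `browse(entries)` with no arguments returns the full catalog in its default newest-first order.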


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter


Rank	Model	Index	General	Reason	Coding	Agents	Math	Multi	Long ctx	GPQA Diamond	DROP	ARC-AGI-2	BIG-Bench Hard	SciCode	Terminal-Bench	LiveCodeBench	SWE-bench Verified	Aider Polyglot	HumanEval	Aider Polyglot Edit	MBPP	MultiPL-E	SWE-bench Pro	AIME 2025	MATH-500	AIME 2024	MATH	GSM8K	MGSM	HMMT 2025	FrontierMath	τ²-bench	TAU-bench Retail	TAU-bench Airline	BFCL	BrowseComp	τ²-bench Airline	τ²-bench Retail	MMMU	MathVista	ChartQA	DocVQA	MMMU-Pro	AI2D	Humanity’s Last Exam	MMLU-Pro	MMLU	IFEval	SimpleQA	Multi-IF	LiveBench	Arena Hard	AA-LCR	LongBench-v2	Released	Country	Type	Access	Params	Cutoff	Context	Speed (tok/s)	Latency (s)	In $/M tokens	Out $/M tokens
#226	Sarvam M Sarvam	8.2	0	22.5	10.1	0	84.7	—	0	41.6	—	—	—	17.8	2.3	29.5	—	—	—	—	—	—	—	—	84.7	—	—	—	—	—	—	0	—	—	—	—	—	—	—	—	—	—	—	—	3.3	69.6	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	136	1.17	$0.00	$0.00
#227	Claude Sonnet 4 Anthropic	58.1	64.7	42.5	52.4	72.6	84.8	74.4	64.7	75.4	—	—	—	40	35.5	65.5	72.7	61.3	—	—	—	—	—	70.5	99.1	—	—	—	—	—	—	64.6	80.5	60	—	—	—	—	74.4	—	—	—	—	—	9.6	84.2	88	—	—	—	—	—	64.7	—	2025	—	llm	API only	—	2025	1M	101	0.40	$3.00	$15.00
#228	Claude Opus 4 Anthropic	50.7	36	33.3	56.2	77.4	86.9	—	36	79.6	—	8.6	—	40.9	39.2	63.6	72.5	72	—	—	—	—	—	75.5	98.2	—	—	—	—	—	—	73.4	81.4	59.6	—	—	—	—	—	—	—	—	—	—	11.7	87.3	88.8	—	—	—	—	—	36	—	2025	—	llm	API only	—	2025	200K	120	0.40	$15.00	$75.00
#229	Devstral Small Mistral AI	25.9	26.7	23.7	15.3	38	48.9	—	26.7	43.4	—	—	—	24.5	6.1	25.8	—	—	—	—	—	—	—	29.3	68.4	—	—	—	—	—	—	38	—	—	—	—	—	—	—	—	—	—	—	—	4	63.2	—	—	—	—	—	—	26.7	—	2025	—	llm	—	—	—	—	190	0.42	$0.10	$0.30
#230	Solar Pro 2 Upstage	21.8	0	37.9	17.4	31.9	79	—	0	68.7	—	—	—	30.2	4.5	61.6	—	—	—	—	—	—	—	61.3	96.7	—	—	—	—	—	—	31.9	—	—	—	—	—	—	—	—	—	—	—	—	7	80.5	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#231	Llama 3.1 Nemotron Nano 4B v1.1 NVIDIA	11.2	0	23	10.1	11.7	72.4	—	0	40.8	—	—	—	10.1	—	49.3	—	—	—	—	—	—	—	50	94.7	—	—	—	—	—	—	11.7	—	—	—	—	—	—	—	—	—	—	—	—	5.1	55.6	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#232	Gemma 3n E4B Instruct Google	6.9	0	17.3	5.4	5	45.7	—	0	29.6	—	—	—	8.6	2.3	14.6	—	—	—	—	—	—	—	14.3	77.1	—	—	—	—	—	—	5	—	—	—	—	—	—	—	—	—	—	—	—	4.9	48.8	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	56	0.55	$0.00	$0.00
#233	Mistral Medium 3 Mistral AI	25.5	28	31.1	18.5	24.3	60.5	—	28	57.8	—	—	—	33.1	3.8	40	—	—	—	—	—	—	—	30.3	90.7	—	—	—	—	—	—	24.3	—	—	—	—	—	—	—	—	—	—	—	—	4.3	76	—	—	—	—	—	—	28	—	2025	—	multimodal	API only	—	2025	131K	32	0.56	$0.40	$2.00
#234	Nova Premier Amazon	29.1	30	30.8	17.3	38.3	50.6	—	30	56.9	—	—	—	27.9	6.8	31.7	—	—	—	—	—	—	—	17.3	83.9	—	—	—	—	—	—	38.3	—	—	—	—	—	—	—	—	—	—	—	—	4.7	73.3	—	—	—	—	—	—	30	—	2025	—	llm	—	—	—	—	40	1.31	$2.50	$12.50
#235	Qwen3 32B Alibaba	28.5	0	37.6	26.1	50.1	83.5	—	0	66.8	—	—	—	35.4	3	65.7	—	40	—	—	—	—	—	72.9	96.1	81.4	—	—	—	—	—	29.8	—	—	70.3	—	—	—	—	—	—	—	—	—	8.3	79.8	—	—	—	—	74.9	93.8	0	—	2025	—	llm	Open weights	—	2025	131K	328	0.93	$0.08	$0.28
#236	Qwen3 235B A22B Alibaba	25.4	0	29.6	23	49	86.7	—	0	47.5	—	—	88.9	39.9	6.1	70.7	—	—	—	—	81.4	65.9	—	81.5	93	85.7	71.8	94.4	83.5	—	—	27.2	—	—	70.8	—	—	—	—	—	—	—	—	—	11.7	68.2	87.8	—	—	—	77.1	95.6	0	—	2025	—	llm	Open weights	—	2025	131K	68	0.78	$0.46	$1.82
#237	Qwen3 30B A3B Alibaba	25.4	0	36.2	17.7	47.6	82.4	—	0	65.8	—	—	—	28.5	6.8	62.6	—	—	—	—	—	—	—	70.9	95.9	80.4	—	—	—	—	—	26	—	—	69.1	—	—	—	—	—	—	—	—	—	6.6	77.7	—	—	—	72.2	74.3	91	0	—	2025	—	llm	Open weights	—	2025	131K	122	0.66	$0.09	$0.45
#238	Qwen3 14B Alibaba	21.4	0	32.4	18.5	34.5	77.1	—	0	60.4	—	—	—	31.6	5.3	52.3	—	—	—	—	—	—	—	58	96.1	—	—	—	—	—	—	34.5	—	—	—	—	—	—	—	—	—	—	—	—	4.3	77.4	—	—	—	—	—	—	0	—	2025	—	llm	Open weights	—	2025	132K	62	1.01	$0.10	$0.24
#239	Qwen3 8B Alibaba	18	0	31.6	12.5	27.8	57.4	—	0	58.9	—	—	—	22.6	2.3	40.6	—	—	—	—	—	—	—	24.3	90.4	—	—	—	—	—	—	27.8	—	—	—	—	—	—	—	—	—	—	—	—	4.2	74.3	—	—	—	—	—	—	0	—	2025	—	llm	Open weights	—	2025	131K	69	1.29	$0.05	$0.40
#240	Qwen3 4B Alibaba	16.1	0	28.7	16.7	19	57.8	—	0	52.2	—	—	—	16.7	—	46.5	—	—	—	—	—	—	—	22.3	93.3	—	—	—	—	—	—	19	—	—	—	—	—	—	—	—	—	—	—	—	5.1	69.6	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	103	1.02	$0.10	$0.40
#241	Qwen3 1.7B Alibaba	12.5	0	20.4	3.5	26	64.1	—	0	35.6	—	—	—	6.9	0	30.8	—	—	—	—	—	—	—	38.7	89.4	—	—	—	—	—	—	26	—	—	—	—	—	—	—	—	—	—	—	—	5.2	57	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	138	0.97	$0.10	$0.40
#242	Qwen3 0.6B Alibaba	9.5	0	14.8	2.1	21.1	46.5	—	0	23.9	—	—	—	4.1	0	12.1	—	—	—	—	—	—	—	18	75	—	—	—	—	—	—	21.1	—	—	—	—	—	—	—	—	—	—	—	—	5.7	34.7	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	225	0.95	$0.10	$1.30
#243	o3 OpenAI	56.3	69.3	33.6	57.1	65.2	73.3	82	69.3	87.7	—	6.5	—	41	37.1	80.8	69.1	81.3	—	—	—	—	—	86.4	99.2	91.6	—	—	—	—	15.8	80.7	—	—	—	49.7	64.8	80.2	82.9	86.8	—	—	76.4	—	24.3	85.3	—	—	—	—	—	—	69.3	—	2025	—	llm	API only	—	2024	200K	50	20.00	$2.00	$8.00
#244	o4-mini OpenAI	53.1	55	48.1	49.7	59.6	95	82.9	55	81.4	—	—	—	46.5	15.2	85.9	68.1	68.9	—	58.2	—	—	—	92.7	98.9	93.4	—	—	—	—	—	55.6	71.8	49.2	—	51.5	—	—	81.6	84.3	—	—	—	—	14.7	83.2	—	—	—	—	—	—	55	—	2025	—	multimodal	API only	—	2024	200K	115	5.20	$1.10	$4.40
#245	Granite 3.3 8B IBM	9.7	4.3	19	5.1	10.5	36.6	—	4.3	33.8	—	—	—	10.1	0	12.7	—	—	—	—	—	—	—	6.7	66.5	—	—	—	—	—	—	10.5	—	—	—	—	—	—	—	—	—	—	—	—	4.2	46.8	—	—	—	—	—	—	4.3	—	2025	—	llm	—	—	—	—	376	20.60	$0.00	$0.30
#246	GPT-4.1 OpenAI	48.5	61	35.9	39.5	57.6	53.7	73.5	61	66.3	—	—	—	38.1	13.6	45.7	54.6	51.6	94	52.9	—	—	—	46.4	91.3	48.1	87	—	—	28.9	—	47.1	68	49.4	—	—	—	—	74.8	72.2	—	—	—	—	5.4	80.6	90.2	87.4	—	70.8	—	—	61	—	2025	—	multimodal	API only	—	2024	1M	100	10.00	$2.00	$8.00
#247	GPT-4.1 Mini OpenAI	39.4	42.3	34.4	26.6	54.4	54.3	72.9	42.3	65	—	—	—	40.4	7.6	48.3	23.6	34.7	—	31.6	—	—	—	40.2	92.5	49.6	—	—	—	35	—	52.9	55.8	36	—	—	—	—	72.7	73.1	—	—	—	—	3.7	78.1	87.5	84.1	—	67	—	—	42.3	—	2025	—	multimodal	API only	—	2024	1M	150	5.00	$0.40	$1.60
#248	GPT-4.1 Nano OpenAI	19.3	17	27.1	13.2	20	46.1	55.8	17	50.3	—	—	—	25.9	3.8	32.6	—	9.8	—	6.2	—	—	—	24	84.8	29.4	—	—	—	—	—	17.3	22.6	14	—	—	—	—	55.4	56.2	—	—	—	—	3.9	65.7	80.1	74.5	—	57.2	—	—	17	—	2025	—	multimodal	API only	—	2024	1M	200	2.00	$0.10	$0.40
#249	Llama 3.1 Nemotron Ultra 253B v1 NVIDIA	19.8	7.3	42.1	18.5	11.4	84.8	—	7.3	76	—	—	—	34.7	2.3	66.3	—	—	—	—	—	—	—	72.5	97	—	—	—	—	—	—	11.4	—	—	—	—	—	—	—	—	—	—	—	—	8.1	82.5	—	89.5	—	—	—	—	7.3	—	2025	—	llm	Open weights	253B	2023	—	42	0.72	$0.60	$1.80
#250	Llama 4 Maverick Meta	30.6	46	37.3	21.4	17.8	54.1	78.2	46	69.8	—	—	—	33.1	6.8	43.4	30	15.6	—	—	77.6	—	—	19.3	88.9	—	61.2	—	92.3	—	—	17.8	—	—	—	—	—	—	73.4	73.7	90	94.4	59.6	—	4.8	80.5	85.5	—	—	—	—	—	46	—	2025	—	multimodal	Open weights	400B total / 17B active (MoE)	2024	1M	639	0.20	$0.15	$0.60

The four score columns after Index (General, Reason, Coding, Agents) are the v1.2 components that feed it, each weighted 25%. The per-category averages that follow (Math, Multi, Long ctx) are for reference only and are not part of the Index. Every individual benchmark in the catalog is also shown, grouped by category and ordered by coverage. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations; see each model's page for sources.
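The Index arithmetic can be checked against the table. For Claude Sonnet 4, (64.7 + 42.5 + 52.4 + 72.6) / 4 = 58.05, shown as 58.1. The Python sketch below reproduces that equal-weighted mean. The per-category rule (a plain mean over whichever benchmarks a model has scores for, skipping cells marked —) is an assumption inferred from rows such as Claude Sonnet 4's Math average, (70.5 + 99.1) / 2 = 84.8. The min-max color scale and the coverage sort are likewise assumptions about how the page renders, and every function name here is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# v1.2 Index: four components, 25% each (per the table notes).
INDEX_WEIGHTS = {"General": 0.25, "Reason": 0.25, "Coding": 0.25, "Agents": 0.25}


def index_score(components: Dict[str, float]) -> float:
    """Weighted sum of the four v1.2 components, left unrounded."""
    return sum(weight * components[name] for name, weight in INDEX_WEIGHTS.items())


def category_average(scores: List[Optional[float]]) -> Optional[float]:
    """ASSUMPTION: plain mean over reported benchmarks; None stands for a '—' cell."""
    present = [s for s in scores if s is not None]
    return mean(present) if present else None


def cell_color(value: float, column: List[float]) -> Tuple[int, int, int]:
    """ASSUMPTION: min-max scale within the column, then red (0) -> yellow (0.5) -> green (1)."""
    lo, hi = min(column), max(column)
    t = 0.5 if hi == lo else (value - lo) / (hi - lo)
    if t < 0.5:
        return (255, round(510 * t), 0)  # red to yellow: raise the green channel
    return (round(510 * (1 - t)), 255, 0)  # yellow to green: lower the red channel


def order_by_coverage(columns: Dict[str, List[Optional[float]]]) -> List[str]:
    """ASSUMPTION: benchmarks with more reported scores come first; ties keep input order."""
    return sorted(columns, key=lambda name: sum(v is not None for v in columns[name]), reverse=True)


# Component scores from the Claude Sonnet 4 row above.
sonnet_4 = {"General": 64.7, "Reason": 42.5, "Coding": 52.4, "Agents": 72.6}
print(f"Index ≈ {index_score(sonnet_4):.2f}")
```

Because the weights are equal, the Index is simply the mean of the four components, so a zero in any one of them (as in several rows above) caps the Index at 75 on its own.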