298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmarks-only releases. Filter by lab, country, access, modality, or release window. Models are sorted newest first by default; head to the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter


Rank	Model	Index	General	Reason	Coding	Agents	Math	Multi	Long ctx	GPQA Diamond	DROP	ARC-AGI-2	BIG-Bench Hard	SciCode	Terminal-Bench	LiveCodeBench	SWE-bench Verified	Aider Polyglot	HumanEval	Aider Polyglot Edit	MBPP	MultiPL-E	SWE-bench Pro	AIME 2025	MATH-500	AIME 2024	MATH	GSM8K	MGSM	HMMT 2025	FrontierMath	τ²-bench	TAU-bench Retail	TAU-bench Airline	BFCL	BrowseComp	τ²-bench Airline	τ²-bench Retail	MMMU	MathVista	ChartQA	DocVQA	MMMU-Pro	AI2D	Humanity’s Last Exam	MMLU-Pro	MMLU	IFEval	SimpleQA	Multi-IF	LiveBench	Arena Hard	AA-LCR	LongBench-v2	Released ↓	Country	Type	Access	Params	Cutoff	Context	Speed	Latency	In $/M	Out $/M
#176	Qwen3 Next 80B A3B Instruct Alibaba	40.5	51.3	40.1	29.4	41.3	69.5	—	51.3	72.9	—	—	—	30.7	7.6	68.4	—	49.8	—	—	—	87.8	—	69.5	—	—	—	—	—	—	—	21.6	60.9	44	—	—	45.5	57.3	—	—	—	—	—	—	7.3	80.6	—	87.6	—	75.8	—	—	51.3	—	2025	—	llm	Open weights	—	2025	262K	161	1.14	$0.09	$1.10
#177	Qwen3-Next-80B-A3B Alibaba	42.5	60.3	43.8	24.3	41.5	84.3	—	60.3	75.9	—	—	—	38.8	9.8	78.4	—	—	—	—	—	—	—	84.3	—	—	—	—	—	—	—	41.5	—	—	—	—	—	—	—	—	—	—	—	—	11.7	82.4	—	—	—	—	—	—	60.3	—	2025	—	llm	Open weights	80B (3B active)	—	262K	147	1.14	$0.50	$6.00
#178	Ling-mini-2.0 InclusionAI	14.4	6.7	30.6	7.2	13.2	49.3	—	6.7	56.2	—	—	—	13.5	0.8	42.9	—	—	—	—	—	—	—	49.3	—	—	—	—	—	—	—	13.2	—	—	—	—	—	—	—	—	—	—	—	—	5	67.1	—	—	—	—	—	—	6.7	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#179	Kimi K2 0905 Moonshot AI	48.5	52.3	41.1	27.1	73.4	64.7	—	52.3	75.8	—	—	—	30.7	23.5	61	—	—	94.5	—	—	—	—	57.3	—	72	89.1	—	—	—	—	73.4	—	—	—	—	—	—	—	—	—	—	—	—	6.3	82.5	90.2	—	—	—	—	—	52.3	—	2025	—	llm	API only	1T	—	262K	16	1.94	$0.60	$2.50
#180	Apertus 70B Instruct Swiss AI Initiative	8.1	0	16.4	2.9	12.9	—	—	0	27.2	—	—	—	5.7	0	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	12.9	—	—	—	—	—	—	—	—	—	—	—	—	5.5	—	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.80	$2.90
Index breakdown for Apertus 70B Instruct: 8.1 = (0.0 + 16.4 + 2.9 + 12.9) / 4, the equal-weighted mean of 4 components.
General (25%): 0. SimpleQA —, AA-LCR 0, LongBench-v2 —, IFBench —.
Reasoning (25%): 16.4. GPQA Diamond 27.2, Humanity’s Last Exam 5.5, FrontierMath —, ARC-AGI-2 —.
Coding (25%): 2.9. SWE-bench Verified —, Terminal-Bench 0, Aider Polyglot —, SciCode 5.7.
Tool use & agents (25%): 12.9. TAU-bench Retail —, τ²-bench 12.9, BFCL —, BrowseComp —.
#181	Apertus 8B Instruct Swiss AI Initiative	7.2	0	15.3	2.1	11.4	—	—	0	25.6	—	—	—	4.1	0	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	11.4	—	—	—	—	—	—	—	—	—	—	—	—	5	—	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.10	$0.20
#182	Grok Code Fast 1 xAI	47.7	48.3	40.1	26.8	75.7	43.3	—	48.3	72.7	—	—	—	36.2	17.4	65.7	—	—	—	—	—	—	—	43.3	—	—	—	—	—	—	—	75.7	—	—	—	—	—	—	—	—	—	—	—	—	7.5	79.3	—	—	—	—	—	—	48.3	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#183	Hermes 4 - Llama-3.1 405B Nous Research	28	20.7	41.5	23	26.6	69.7	—	20.7	72.7	—	—	—	34.6	11.4	68.6	—	—	—	—	—	—	—	69.7	—	—	—	—	—	—	—	26.6	—	—	—	—	—	—	—	—	—	—	—	—	10.3	82.9	—	—	—	—	—	—	20.7	—	2025	—	llm	—	—	—	—	34	0.74	$1.00	$3.00
#184	Hermes 4 - Llama-3.1 70B Nous Research	21.9	6.7	38.9	19.3	22.5	68.7	—	6.7	69.9	—	—	—	34.1	4.5	65.3	—	—	—	—	—	—	—	68.7	—	—	—	—	—	—	—	22.5	—	—	—	—	—	—	—	—	—	—	—	—	7.9	81.1	—	—	—	—	—	—	6.7	—	2025	—	llm	—	—	—	—	60	0.67	$0.10	$0.40
#185	DeepSeek-V3.1 DeepSeek	50.9	73.4	45.4	51.2	33.7	49.9	—	53.3	74.9	—	—	—	39.1	31.3	56.4	66	68.4	—	—	—	—	—	49.8	—	66.3	—	—	—	33.5	—	37.4	—	—	—	30	—	—	—	—	—	—	—	—	15.9	83.7	—	—	93.4	—	—	—	53.3	—	2025	—	llm	Open weights	671B (37B active)	2025	164K	—	—	$0.21	$0.79
#186	Seed-OSS-36B-Instruct ByteDance	42.4	57.7	40.8	21.7	49.4	84.7	—	57.7	72.6	—	—	—	36.5	6.8	76.5	—	—	—	—	—	—	—	84.7	—	—	—	—	—	—	—	49.4	—	—	—	—	—	—	—	—	—	—	—	—	9.1	81.5	—	—	—	—	—	—	57.7	—	2025	—	llm	—	—	—	—	37	1.81	$0.20	$0.60
#187	NVIDIA Nemotron Nano 9B V2 NVIDIA	22.2	22.7	30.8	11.8	23.4	69.7	—	22.7	57	—	—	—	22	1.5	72.4	—	—	—	—	—	—	—	69.7	—	—	—	—	—	—	—	23.4	—	—	—	—	—	—	—	—	—	—	—	—	4.6	74.2	—	—	—	—	—	—	22.7	—	2025	—	llm	—	—	—	—	129	0.26	$0.00	$0.20
#188	Gemma 3 270M Google	5.6	0	13.3	0	9.1	2.3	—	0	22.4	—	—	—	0	0	0.3	—	—	—	—	—	—	—	2.3	—	—	—	—	—	—	—	9.1	—	—	—	—	—	—	—	—	—	—	—	—	4.2	5.5	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#189	Mistral Medium 3.1 Mistral AI	28.5	19.7	31.6	22.2	40.6	38.3	—	19.7	58.8	—	—	—	33.8	10.6	40.6	—	—	—	—	—	—	—	38.3	—	—	—	—	—	—	—	40.6	—	—	—	—	—	—	—	—	—	—	—	—	4.4	68.3	—	—	—	—	—	—	19.7	—	2025	—	multimodal	API only	—	2025	131K	47	0.69	$0.40	$2.00
#190	GLM 4.5V Zhipu AI	18.6	0	37.2	14.5	22.5	73	—	0	68.4	—	—	—	22.1	6.8	60.4	—	—	—	—	—	—	—	73	—	—	—	—	—	—	—	22.5	—	—	—	—	—	—	—	—	—	—	—	—	5.9	78.8	—	—	—	—	—	—	0	—	2025	—	multimodal	Open weights	—	2024	66K	85	0.70	$0.60	$1.80
#191	Jamba Large 1.7 AI21 Labs	15.7	17.3	21.4	10.6	13.5	31.2	—	17.3	39	—	—	—	18.8	2.3	18.1	—	—	—	—	—	—	—	2.3	60	—	—	—	—	—	—	13.5	—	—	—	—	—	—	—	—	—	—	—	—	3.8	57.7	—	—	—	—	—	—	17.3	—	2025	—	llm	Open weights	—	2024	256K	48	0.97	$2.00	$8.00
#192	GPT-5 OpenAI	63.3	75.6	46.1	60.9	70.7	78.4	81.3	75.6	87.3	—	—	—	42.9	37.9	84.6	74.9	88	93.4	—	—	—	—	94.6	99.4	—	84.7	—	—	93.3	26.3	86.5	—	—	—	54.9	62.6	81.1	84.2	—	—	—	78.4	—	24.8	87.1	92.5	—	—	—	—	—	75.6	—	2025	—	llm	API only	—	2024	400K	100	2.00	$1.25	$10.00
#193	GPT-5 mini OpenAI	54.2	68	40.4	37.2	71.1	67	—	68	82.3	—	—	—	41	33.3	83.8	—	—	—	—	—	—	—	91.1	—	—	—	—	—	87.8	22.1	71.1	—	—	—	—	—	—	—	—	—	—	—	—	16.7	83.7	—	—	—	—	—	—	68	—	2025	—	llm	API only	—	2024	400K	200	1.00	$0.25	$2.00
#194	GPT-5 nano OpenAI	33.8	41.7	29.8	27	36.5	56.8	—	41.7	71.2	—	—	—	36.6	17.4	78.9	—	—	—	—	—	—	—	85.2	—	—	—	—	—	75.6	9.6	36.5	—	—	—	—	—	—	—	—	—	—	—	—	8.7	78	—	—	—	—	—	—	41.7	—	2025	—	llm	API only	—	2024	400K	500	0.30	$0.05	$0.40
#195	Qwen3 4B 2507 Alibaba	28.3	37.7	36.3	13.6	25.4	82.7	—	37.7	66.7	—	—	—	25.6	1.5	64.1	—	—	—	—	—	—	—	82.7	—	—	—	—	—	—	—	25.4	—	—	—	—	—	—	—	—	—	—	—	—	5.9	74.3	—	—	—	—	—	—	37.7	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#196	Qwen3 4B 2507 Instruct Alibaba	18.4	7.3	28.2	11.3	26.6	52.3	—	7.3	51.7	—	—	—	18.1	4.5	37.7	—	—	—	—	—	—	—	52.3	—	—	—	—	—	—	—	26.6	—	—	—	—	—	—	—	—	—	—	—	—	4.7	67.2	—	—	—	—	—	—	7.3	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#197	Claude Opus 4.1 Anthropic	60.6	66.3	46.4	52.9	76.9	78	—	66.3	80.9	—	—	—	40.9	43.3	65.4	74.5	—	—	—	—	—	—	78	—	—	—	—	—	—	—	71.4	82.4	56	—	—	—	—	—	—	—	—	—	—	11.9	88	—	—	—	—	—	—	66.3	—	2025	—	llm	API only	—	2025	200K	120	0.40	$15.00	$75.00
#198	gpt-oss-120b OpenAI	52.3	50.7	50	41.7	66.8	93.4	—	50.7	80.9	—	—	—	38.9	23.5	87.8	62.4	41.8	—	—	—	—	—	93.4	—	—	—	—	—	—	—	65.8	67.8	—	—	—	—	—	—	—	—	—	—	—	19	80.8	90	—	—	—	—	—	50.7	—	2025	—	llm	Open weights	117B (5.1B active)	2024	131K	500	0.50	$0.04	$0.18
#199	gpt-oss-20b OpenAI	38.9	31	44.4	22.5	57.5	89.3	—	31	71.5	—	—	—	34.4	10.6	77.7	—	—	—	—	—	—	—	89.3	—	—	—	—	—	—	—	60.2	54.8	—	—	—	—	—	—	—	—	—	—	—	17.3	74.8	85.3	—	—	—	—	—	31	—	2025	—	llm	Open weights	21B (3.6B active)	2024	131K	1000	0.38	$0.03	$0.14
#200	Qwen3 Coder 30B A3B Instruct Alibaba	28.2	29	27.8	21.5	34.5	59.2	—	29	51.6	—	—	—	27.8	15.2	40.3	—	—	—	—	—	—	—	29	89.3	—	—	—	—	—	—	34.5	—	—	—	—	—	—	—	—	—	—	—	—	4	70.6	—	—	—	—	—	—	29	—	2025	—	llm	Open weights	—	2025	160K	97	1.49	$0.07	$0.27

The four score columns after Index (General, Reason, Coding, Agents) are the v1.2 components that feed it, weighted 25% each. The per-category averages that follow (Math, Multi, Long ctx) are for reference and are not part of the index. Every individual benchmark in our catalog is also shown, grouped by category and ordered by coverage. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations; see each model for sources.
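
As a rough illustration of the index arithmetic, the sketch below recomputes the Apertus 70B Instruct figures from the table. It assumes each component is the plain mean of whichever of its benchmarks have a score. That assumption matches the Apertus values shown in the table, but it is not stated on this page. The function names `category_score` and `index_score` are hypothetical and not part of any published API.

```python
from statistics import mean
from typing import Optional, Sequence


def category_score(benchmarks: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the benchmarks that have a score (assumption); None if none do."""
    available = [s for s in benchmarks if s is not None]
    return mean(available) if available else None


def index_score(components: Sequence[float]) -> float:
    """Equal-weighted mean of the component scores (25% each for four components)."""
    return sum(components) / len(components)


# Apertus 70B Instruct; None stands for a "—" cell in the table.
# Order: GPQA Diamond, Humanity's Last Exam, FrontierMath, ARC-AGI-2
reasoning = category_score([27.2, 5.5, None, None])
# Order: SWE-bench Verified, Terminal-Bench, Aider Polyglot, SciCode
coding = category_score([None, 0.0, None, 5.7])

# Component values as displayed: General, Reasoning, Coding, Tool use & agents
index = index_score([0.0, 16.4, 2.9, 12.9])
print(reasoning, coding, index)
```

The unrounded results come out close to 16.35, 2.85 and 8.05, which the table displays to one decimal place as 16.4, 2.9 and 8.1. How the site rounds intermediate values is not documented here, so small differences in the last digit are possible for other models.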