298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Models are sorted newest first by default; head to the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter


Rank	Model	Index	General	Reason	Coding	Agents	Math	Multi	Long ctx	GPQA Diamond	DROP	ARC-AGI-2	BIG-Bench Hard	SciCode	Terminal-Bench	LiveCodeBench	SWE-bench Verified	Aider Polyglot	HumanEval	Aider Polyglot Edit	MBPP	MultiPL-E	SWE-bench Pro	AIME 2025	MATH-500	AIME 2024	MATH	GSM8K	MGSM	HMMT 2025	FrontierMath	τ²-bench	TAU-bench Retail	TAU-bench Airline	BFCL	BrowseComp	τ²-bench Airline	τ²-bench Retail	MMMU	MathVista	ChartQA	DocVQA	MMMU-Pro	AI2D	Humanity’s Last Exam	MMLU-Pro	MMLU	IFEval	SimpleQA	Multi-IF	LiveBench	Arena Hard	AA-LCR	LongBench-v2	Released ↓	Country	Type	Access	Params	Cutoff	Context	Speed	Latency	In $/M	Out $/M
#151	Ring-1T InclusionAI	34.4	45.7	43.8	21.8	26.3	89.3	—	45.7	77.4	—	—	—	36.7	6.8	64.3	—	—	—	—	—	—	—	89.3	—	—	—	—	—	—	—	26.3	—	—	—	—	—	—	—	—	—	—	—	—	10.2	80.6	—	—	—	—	—	—	45.7	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#152	Ling-1T InclusionAI	32.5	34.7	39.6	22.9	32.7	71.3	—	34.7	71.9	—	—	—	35.2	10.6	67.7	—	—	—	—	—	—	—	71.3	—	—	—	—	—	—	—	32.7	—	—	—	—	—	—	—	—	—	—	—	—	7.2	82.2	—	—	—	—	—	—	34.7	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#153	Jamba Reasoning 3B AI21 Labs	11.3	7	19	3.4	15.8	10.7	—	7	33.3	—	—	—	5.9	0.8	21	—	—	—	—	—	—	—	10.7	—	—	—	—	—	—	—	15.8	—	—	—	—	—	—	—	—	—	—	—	—	4.6	57.7	—	—	—	—	—	—	7	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#154	LFM2 8B A1B Liquid AI	8.4	0	19.7	3.4	10.5	25.3	—	0	34.4	—	—	—	6.8	0	15.1	—	—	—	—	—	—	—	25.3	—	—	—	—	—	—	—	10.5	—	—	—	—	—	—	—	—	—	—	—	—	4.9	50.5	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#155	Qwen3 VL 30B A3B Instruct Alibaba	24.8	23.7	38	18.5	19	72.3	—	23.7	69.5	—	—	—	30.8	6.1	47.6	—	—	—	—	—	—	—	72.3	—	—	—	—	—	—	—	19	—	—	—	—	—	—	—	—	—	—	—	—	6.4	76.4	—	—	—	—	—	—	23.7	—	2025	—	multimodal	Open weights	—	2025	262K	123	0.98	$0.13	$0.52
#156	Qwen3 VL 30B A3B Alibaba	29.5	40.7	40.4	17.1	19.9	82.3	—	40.7	72	—	—	—	28.8	5.3	69.7	—	—	—	—	—	—	—	82.3	—	—	—	—	—	—	—	19.9	—	—	—	—	—	—	—	—	—	—	—	—	8.7	80.7	—	—	—	—	—	—	40.7	—	2025	—	llm	—	—	—	—	122	1.14	$0.20	$0.80
#157	GLM-4.6 Zhipu AI	53.4	54.3	49.1	49	61	93.9	—	54.3	81	—	—	—	38.4	40.5	69.5	68	—	—	—	—	—	—	93.9	—	—	—	—	—	—	—	76.9	—	—	—	45.1	—	—	—	—	—	—	—	—	17.2	82.9	—	—	—	—	—	—	54.3	—	2025	—	llm	Open weights	357B (MoE)	2025	203K	85	0.70	$0.43	$1.74
#158	Apriel-v1.5-15B-Thinker ServiceNow	38.2	20	41.7	22.7	68.4	87.5	—	20	71.3	—	—	—	34.8	10.6	72.8	—	—	—	—	—	—	—	87.5	—	—	—	—	—	—	—	68.4	—	—	—	—	—	—	—	—	—	—	—	—	12	77.3	—	—	—	—	—	—	20	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#159	Claude Sonnet 4.5 Anthropic	63.9	65.7	50.4	57.3	82.2	87	—	65.7	83.4	—	—	—	44.7	50	71.4	77.2	—	—	—	—	—	—	87	—	—	—	—	—	—	—	78.1	86.2	70	—	—	—	—	—	—	—	—	—	—	17.3	87.5	—	—	—	—	—	—	65.7	—	2025	—	llm	API only	—	2025	1M	42	0.40	$3.00	$15.00
#160	DeepSeek V3.2 Exp DeepSeek	56.3	83.1	49.9	55	37	86.4	—	69	79.9	—	—	—	39.9	37.7	74.1	67.8	74.5	—	—	—	—	—	89.3	—	—	—	—	—	83.6	—	33.9	—	—	—	40.1	—	—	—	—	—	—	—	—	19.8	85	—	—	97.1	—	—	—	69	—	2025	—	llm	Open weights	—	2025	164K	100	0.70	$0.27	$0.41
Index breakdown for DeepSeek V3.2 Exp: 56.3 = (83.1 + 49.9 + 55.0 + 37.0) / 4, the equal-weighted mean of 4 components.
General (25%): 83.1. SimpleQA 97.1, AA-LCR 69, LongBench-v2 —, IFBench —
Reasoning (25%): 49.9. GPQA Diamond 79.9, Humanity’s Last Exam 19.8, FrontierMath —, ARC-AGI-2 —
Coding (25%): 55. SWE-bench Verified 67.8, Terminal-Bench 37.7, Aider Polyglot 74.5, SciCode 39.9
Tool use & agents (25%): 37. TAU-bench Retail —, τ²-bench 33.9, BFCL —, BrowseComp 40.1
#161	Gemini 2.5 Flash Google	41.9	44.3	47.8	43.8	31.6	88.1	79.7	61.7	82.8	—	—	—	39.4	13.6	71.3	60.4	61.9	—	56.7	—	—	—	78.3	98.1	88	—	—	—	—	—	31.6	—	—	—	—	—	—	79.7	—	—	—	—	—	12.7	84.2	—	—	26.9	—	—	—	61.7	—	2025	—	multimodal	API only	—	2025	1M	85	0.70	$0.30	$2.50
#162	GPT-5 Codex OpenAI	65.4	69	54.7	51.1	86.8	98.7	—	69	83.7	—	—	—	40.9	37.9	84	74.5	—	—	—	—	—	—	98.7	—	—	—	—	—	—	—	86.8	—	—	—	—	—	—	—	—	—	—	—	—	25.6	86.5	—	—	—	—	—	—	69	—	2025	—	multimodal	API only	—	2024	400K	180	6.64	$1.25	$10.00
#163	Qwen3 Max Alibaba	48.6	46.7	43.8	29.4	74.3	80.7	—	46.7	76.4	—	—	—	38.3	20.5	76.7	—	—	—	—	—	—	—	80.7	—	—	—	—	—	—	—	74.3	—	—	—	—	—	—	—	—	—	—	—	—	11.1	84.1	—	—	—	—	—	—	46.7	—	2025	—	llm	API only	—	2025	262K	45	1.71	$0.78	$3.90
#164	Qwen3 VL 235B A22B Alibaba	45.6	58.7	43.7	25.7	54.1	88.3	—	58.7	77.2	—	—	—	39.9	11.4	64.6	—	—	—	—	—	—	—	88.3	—	—	—	—	—	—	—	54.1	—	—	—	—	—	—	—	—	—	—	—	—	10.1	83.6	—	—	—	—	—	—	58.7	—	2025	—	llm	—	—	—	—	34	1.75	$0.80	$6.20
#165	Qwen3 VL 235B A22B Instruct Alibaba	31.7	31.7	38.8	21.3	35.1	70.7	—	31.7	71.2	—	—	—	35.9	6.8	59.4	—	—	—	—	—	—	—	70.7	—	—	—	—	—	—	—	35.1	—	—	—	—	—	—	—	—	—	—	—	—	6.3	82.3	—	—	—	—	—	—	31.7	—	2025	—	multimodal	Open weights	—	2025	262K	51	1.20	$0.20	$0.88
#166	LFM2 2.6B Liquid AI	8.3	0	17.9	1.7	13.5	8.3	—	0	30.6	—	—	—	2.5	0.8	8.1	—	—	—	—	—	—	—	8.3	—	—	—	—	—	—	—	13.5	—	—	—	—	—	—	—	—	—	—	—	—	5.2	29.8	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#167	DeepSeek V3.1 Terminus DeepSeek	46.4	65	47.2	36.2	37.1	89.7	—	65	79.2	—	—	—	40.6	31.8	79.8	—	—	—	—	—	—	—	89.7	—	—	—	—	—	—	—	37.1	—	—	—	—	—	—	—	—	—	—	—	—	15.2	85.1	—	—	—	—	—	—	65	—	2025	—	llm	Open weights	—	2025	164K	—	—	$0.27	$0.95
#168	Qwen3 Omni 30B A3B Alibaba	19.6	0	39.9	17.2	21.3	74	—	0	72.6	—	—	—	30.6	3.8	67.9	—	—	—	—	—	—	—	74	—	—	—	—	—	—	—	21.3	—	—	—	—	—	—	—	—	—	—	—	—	7.3	79.2	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	102	1.05	$0.30	$1.00
#169	Granite 4.0 H Small IBM	15.2	9	22.7	11.6	17.3	13.7	—	9	41.6	—	—	—	20.9	2.3	25.1	—	—	—	—	—	—	—	13.7	—	—	—	—	—	—	—	17.3	—	—	—	—	—	—	—	—	—	—	—	—	3.7	62.4	—	—	—	—	—	—	9	—	2025	—	llm	—	—	—	—	524	8.71	$0.10	$0.30
#170	Qwen3 Omni 30B A3B Instruct Alibaba	15	0	33.6	10.1	16.4	52.3	—	0	62	—	—	—	18.6	1.5	42.2	—	—	—	—	—	—	—	52.3	—	—	—	—	—	—	—	16.4	—	—	—	—	—	—	—	—	—	—	—	—	5.1	72.5	—	—	—	—	—	—	0	—	2025	—	llm	—	—	—	—	103	1.04	$0.30	$1.00
#171	Grok 4 Fast xAI	55	79.9	52.9	31.6	55.4	92.7	—	64.7	85.7	—	—	—	44.2	18.9	80	—	—	—	—	—	—	—	92	—	—	—	—	—	93.3	—	65.8	—	—	—	44.9	—	—	—	—	—	—	—	—	20	85	—	—	95	—	—	—	64.7	—	2025	—	llm	API only	—	—	2M	90	—	$0.20	$0.50
#172	Ring-flash-2.0 InclusionAI	18.5	21	40.7	12.2	0	83.7	—	21	72.5	—	—	—	16.8	7.6	62.8	—	—	—	—	—	—	—	83.7	—	—	—	—	—	—	—	0	—	—	—	—	—	—	—	—	—	—	—	—	8.9	79.3	—	—	—	—	—	—	21	—	2025	—	llm	—	—	—	—	—	—	$0.10	$0.60
#173	Magistral Medium 1.2 Mistral AI	42.8	51.3	41.8	26.1	52	82	—	51.3	73.9	—	—	—	39.2	12.9	75	—	—	—	—	—	—	—	82	—	—	—	—	—	—	—	52	—	—	—	—	—	—	—	—	—	—	—	—	9.6	81.5	—	—	—	—	—	—	51.3	—	2025	—	llm	—	—	—	—	42	0.50	$2.00	$5.00
#174	Magistral Small 1.2 Mistral AI	25.1	16.3	36.2	19.9	27.8	80.3	—	16.3	66.3	—	—	—	35.2	4.5	72.3	—	—	—	—	—	—	—	80.3	—	—	—	—	—	—	—	27.8	—	—	—	—	—	—	—	—	—	—	—	—	6.1	76.8	—	—	—	—	—	—	16.3	—	2025	—	llm	—	—	—	—	106	0.38	$0.50	$1.50
#175	Ling-flash-2.0 InclusionAI	22.9	15	36	19.8	20.8	65.3	—	15	65.7	—	—	—	28.9	10.6	58.9	—	—	—	—	—	—	—	65.3	—	—	—	—	—	—	—	20.8	—	—	—	—	—	—	—	—	—	—	—	—	6.3	77.7	—	—	—	—	—	—	15	—	2025	—	llm	—	—	—	—	91	1.61	$0.10	$0.60

The score columns directly after Index (General, Reason, Coding, Agents) are the v1.2 components, weighted 25% each, that feed it. The reference per-category averages that follow (Math, Multi, Long ctx) are not part of the index. Every individual benchmark in our catalog is also shown, grouped by category and ordered by coverage. Scores are curated approximations; see each model for sources.
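As a rough illustration of how the index arithmetic fits together, the sketch below recomputes the DeepSeek V3.2 Exp row from the benchmark scores listed in the table. Two things are assumptions rather than documented methodology: that each component is the plain average of whichever of its benchmarks have a score, and that values are rounded half-up to one decimal before being combined. Both happen to reproduce the published numbers for this row, but the helper names (`round1`, `component_score`, `index_score`) are hypothetical and not part of any published tooling.

```python
from decimal import Decimal, ROUND_HALF_UP

MISSING = "—"  # marker the table uses for "no score"


def round1(value: Decimal) -> Decimal:
    """Round half-up to one decimal place (assumed rounding rule)."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def component_score(cells):
    """Average the benchmarks that have a score; None if none do (assumed rule)."""
    scores = [Decimal(c.strip()) for c in cells if c.strip() != MISSING]
    if not scores:
        return None
    return round1(sum(scores) / len(scores))


def index_score(components):
    """Equal-weighted (25% each) mean of the four components."""
    if len(components) != 4 or any(c is None for c in components):
        raise ValueError("expected four scored components")
    return round1(sum(components) / 4)


# Benchmark cells for DeepSeek V3.2 Exp, copied from the table.
deepseek = {
    "General": ["97.1", "69", MISSING, MISSING],      # SimpleQA, AA-LCR, LongBench-v2, IFBench
    "Reasoning": ["79.9", "19.8", MISSING, MISSING],  # GPQA Diamond, HLE, FrontierMath, ARC-AGI-2
    "Coding": ["67.8", "37.7", "74.5", "39.9"],       # SWE-bench Verified, Terminal-Bench, Aider Polyglot, SciCode
    "Agents": [MISSING, "33.9", MISSING, "40.1"],     # TAU-bench Retail, τ²-bench, BFCL, BrowseComp
}

components = [component_score(cells) for cells in deepseek.values()]
index = index_score(components)
print(index)  # 56.3
```

Decimal arithmetic is used here because the unrounded mean of the four components is exactly 56.25, which Python's built-in `round` would take to 56.2 under its round-half-to-even rule.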