298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Sorted newest first by default; head to the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter


Rank	Model	Index	General	Reason	Coding	Agents	Math	Multi	Long ctx	GPQA Diamond	DROP	ARC-AGI-2	BIG-Bench Hard	SciCode	Terminal-Bench	LiveCodeBench	SWE-bench Verified	Aider Polyglot	HumanEval	Aider Polyglot Edit	MBPP	MultiPL-E	SWE-bench Pro	AIME 2025	MATH-500	AIME 2024	MATH	GSM8K	MGSM	HMMT 2025	FrontierMath	τ²-bench	TAU-bench Retail	TAU-bench Airline	BFCL	BrowseComp	τ²-bench Airline	τ²-bench Retail	MMMU	MathVista	ChartQA	DocVQA	MMMU-Pro	AI2D	Humanity’s Last Exam	MMLU-Pro	MMLU	IFEval	SimpleQA	Multi-IF	LiveBench	Arena Hard	AA-LCR	LongBench-v2	Released ↓	Country	Type	Access	Params	Cutoff	Context	Speed	Latency	In $/M	Out $/M
#76	Nanbeige4.1-3B Nanbeige	20.6	0	47.5	13.3	21.6	—	—	0	84.9	—	—	—	26.6	0	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	21.6	—	—	—	—	—	—	—	—	—	—	—	—	10	—	—	—	—	—	—	—	0	—	2026	—	llm	—	—	—	—	—	—	$0.00	$0.00
#77	Tri-21B-Think Trillion Labs	37.8	14.7	33.1	10.1	93.3	—	—	14.7	60.1	—	—	—	17.8	2.3	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	93.3	—	—	—	—	—	—	—	—	—	—	—	—	6.1	—	—	—	—	—	—	—	14.7	—	2026	—	llm	—	—	—	—	—	—	$0.00	$0.00
#78	Qwen3 Max Thinking Alibaba	59.9	66	56.2	33.7	83.6	82.3	—	66	86.1	—	—	—	43.1	24.2	53.5	—	—	—	—	—	—	—	82.3	—	—	—	—	—	—	—	83.6	—	—	—	—	—	—	—	—	—	—	—	—	26.2	82.4	—	—	—	—	—	—	66	—	2026	—	llm	API only	—	—	262K	45	1.47	$0.78	$3.90
#79	Claude Opus 4.6 Anthropic	72.2	70.7	65.6	60.4	92.1	—	77.3	70.7	91.3	—	68.8	—	51.9	48.5	—	80.8	—	95	—	—	—	—	—	—	—	—	—	—	—	—	92.1	—	—	—	—	—	—	—	—	—	—	77.3	—	36.7	—	—	—	—	—	—	—	70.7	—	2026	—	llm	API only	—	—	1M	48	1.65	$5.00	$25.00
#80	Qwen3 Coder Next Alibaba	46.6	40	41.5	25.3	79.5	—	—	40	73.7	—	—	—	32.3	18.2	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	79.5	—	—	—	—	—	—	—	—	—	—	—	—	9.3	—	—	—	—	—	—	—	40	—	2026	—	llm	Open weights	—	—	262K	92	1.14	$0.11	$0.80
#81	Step 3.5 Flash StepFun	55.6	43	51.1	33.9	94.4	—	—	43	83.1	—	—	—	40.4	27.3	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	94.4	—	—	—	—	—	—	—	—	—	—	—	—	19.1	—	—	—	—	—	—	—	43	—	2026	—	llm	Open weights	—	—	262K	194	0.85	$0.09	$0.30
#82	LongCat Flash Lite LongCat	39.9	25.7	34.8	19.5	79.5	—	—	25.7	63.6	—	—	—	28.4	10.6	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	79.5	—	—	—	—	—	—	—	—	—	—	—	—	6	—	—	—	—	—	—	—	25.7	—	2026	—	llm	—	—	—	—	110	5.59	$0.00	$0.00
#83	Kimi K2.5 Moonshot AI	65.5	65.3	58.7	41.9	95.9	—	—	65.3	87.9	—	—	—	49	34.8	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	95.9	—	—	—	—	—	—	—	—	—	—	—	—	29.4	—	—	—	—	—	—	—	65.3	—	2026	—	multimodal	Open weights	1T (32B active)	—	262K	35	1.33	$0.40	$1.90
#84	Solar Pro 3 Upstage	42.7	27	41.3	16.2	86.3	—	—	27	72.4	—	—	—	24.7	7.6	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	86.3	—	—	—	—	—	—	—	—	—	—	—	—	10.1	—	—	—	—	—	—	—	27	—	2026	—	llm	API only	—	—	128K	—	—	$0.15	$0.60
#85	Step3 VL 10B StepFun	18.5	0	39.6	18.2	16.1	—	—	0	69	—	—	—	31.1	5.3	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	16.1	—	—	—	—	—	—	—	—	—	—	—	—	10.2	—	—	—	—	—	—	—	0	—	2026	—	llm	—	—	—	—	—	—	$0.00	$0.00
#86	LFM2.5-1.2B-Thinking Liquid AI	10.4	0	20	2.1	19.6	—	—	0	33.9	—	—	—	4.2	0	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	19.6	—	—	—	—	—	—	—	—	—	—	—	—	6.1	—	—	—	—	—	—	—	0	—	2026	—	llm	—	—	—	—	—	—	$0.00	$0.00
#87	GLM 4.7 Flash Zhipu AI	48.6	35	32.6	27.9	98.8	—	—	35	58.1	—	—	—	33.7	22	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	98.8	—	—	—	—	—	—	—	—	—	—	—	—	7.1	—	—	—	—	—	—	—	35	—	2026	—	llm	Open weights	—	—	203K	113	1.00	$0.06	$0.40
#88	GPT-5.2-Codex OpenAI	68.9	75.7	61.7	45.9	92.1	—	—	75.7	89.9	—	—	—	54.6	37.1	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	92.1	—	—	—	—	—	—	—	—	—	—	—	—	33.5	—	—	—	—	—	—	—	75.7	—	2026	—	multimodal	API only	—	—	400K	106	2.08	$1.75	$14.00
#89	Olmo 3.1 32B Instruct Allen Institute for AI	14.8	0	29.4	8.4	21.3	—	—	0	53.9	—	—	—	16.7	0	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	21.3	—	—	—	—	—	—	—	—	—	—	—	—	4.9	—	—	—	—	—	—	—	0	—	2026	—	llm	—	—	—	—	—	—	$0.00	$0.00
#90	LFM2.5-1.2B-Instruct Liquid AI	7.9	0	19.7	1.2	10.8	—	—	0	32.6	—	—	—	2.3	0	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	10.8	—	—	—	—	—	—	—	—	—	—	—	—	6.8	—	—	—	—	—	—	—	0	—	2026	—	llm	—	—	—	—	—	—	$0.00	$0.00
#91	LFM2.5-VL-1.6B Liquid AI	6.8	0	17	1.5	8.5	—	—	0	28.9	—	—	—	3	0	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	8.5	—	—	—	—	—	—	—	—	—	—	—	—	5.1	—	—	—	—	—	—	—	0	—	2026	—	llm	—	—	—	—	—	—	$0.00	$0.00
#92	Falcon-H1R-7B TII UAE	22.1	8.7	38.4	13.6	27.8	80	—	8.7	66.1	—	—	—	24.9	2.3	72.4	—	—	—	—	—	—	—	80	—	—	—	—	—	—	—	27.8	—	—	—	—	—	—	—	—	—	—	—	—	10.8	72.5	—	—	—	—	—	—	8.7	—	2026	—	llm	—	—	—	—	—	—	$0.00	$0.00
#93	K-EXAONE LG AI Research	51.2	55.7	45.7	29.2	74.3	90.3	—	55.7	78.3	—	—	—	35.6	22.7	76.8	—	—	—	—	—	—	—	90.3	—	—	—	—	—	—	—	74.3	—	—	—	—	—	—	—	—	—	—	—	—	13.1	83.8	—	—	—	—	—	—	55.7	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#94	HyperCLOVA X SEED Think Naver	38.2	11.7	33.5	20.3	87.4	59	—	11.7	61.5	—	—	—	28.4	12.1	62.9	—	—	—	—	—	—	—	59	—	—	—	—	—	—	—	87.4	—	—	—	—	—	—	—	—	—	—	—	—	5.5	78.5	—	—	—	—	—	—	11.7	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
#95	MiniMax M2.1 MiniMax	58	59	52.6	34.8	85.4	82.7	—	59	83	—	—	—	40.7	28.8	81	—	—	—	—	—	—	—	82.7	—	—	—	—	—	—	—	85.4	—	—	—	—	—	—	—	—	—	—	—	—	22.2	87.5	—	—	—	—	—	—	59	—	2025	—	llm	Open weights	—	—	205K	92	1.14	$0.29	$0.95
#96	GLM 4.7 Zhipu AI	63.5	64	55.5	38.5	95.9	95	—	64	85.9	—	—	—	45.1	31.8	89.4	—	—	—	—	—	—	—	95	—	—	—	—	—	—	—	95.9	—	—	—	—	—	—	—	—	—	—	—	—	25.1	85.6	—	—	—	—	—	—	64	—	2025	—	llm	Open weights	—	—	203K	98	0.83	$0.40	$1.75
Index 63.5 = (64.0 + 55.5 + 38.5 + 95.9) / 4, the equal-weighted mean of 4 components. General (25%): 64 [SimpleQA —, AA-LCR 64, LongBench-v2 —, IFBench —]. Reasoning (25%): 55.5 [GPQA Diamond 85.9, Humanity’s Last Exam 25.1, FrontierMath —, ARC-AGI-2 —]. Coding (25%): 38.5 [SWE-bench Verified —, Terminal-Bench 31.8, Aider Polyglot —, SciCode 45.1]. Tool use & agents (25%): 95.9 [TAU-bench Retail —, τ²-bench 95.9, BFCL —, BrowseComp —].
#97	Gemini 3 Flash Google	66.3	66.3	62.6	55.7	80.4	97	—	66.3	90.4	—	—	—	50.6	38.6	90.8	78	—	—	—	—	—	—	97	—	—	—	—	—	—	—	80.4	—	—	—	—	—	—	—	—	—	—	—	—	34.7	89	—	—	—	—	—	—	66.3	—	2025	—	multimodal	API only	—	—	1M	191	1.05	$0.50	$3.00
#98	Solar Open 100B Upstage	34.1	36	37.5	14.6	48.2	—	—	36	65.7	—	—	—	26.9	2.3	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	48.2	—	—	—	—	—	—	—	—	—	—	—	—	9.2	—	—	—	—	—	—	—	36	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00
Index 34.1 = (36.0 + 37.5 + 14.6 + 48.2) / 4, the equal-weighted mean of 4 components. General (25%): 36 [SimpleQA —, AA-LCR 36, LongBench-v2 —, IFBench —]. Reasoning (25%): 37.5 [GPQA Diamond 65.7, Humanity’s Last Exam 9.2, FrontierMath —, ARC-AGI-2 —]. Coding (25%): 14.6 [SWE-bench Verified —, Terminal-Bench 2.3, Aider Polyglot —, SciCode 26.9]. Tool use & agents (25%): 48.2 [TAU-bench Retail —, τ²-bench 48.2, BFCL —, BrowseComp —].
#99	NVIDIA Nemotron 3 Nano 30B A3B NVIDIA	34.8	33.7	43	21.6	40.9	91	—	33.7	75.7	—	—	—	29.6	13.6	74.1	—	—	—	—	—	—	—	91	—	—	—	—	—	—	—	40.9	—	—	—	—	—	—	—	—	—	—	—	—	10.2	79.4	—	—	—	—	—	—	33.7	—	2025	—	llm	—	—	—	—	148	0.30	$0.10	$0.20
#100	K2 Think V2 MBZUAI Institute of Foundation Models	34.6	52.7	40.4	19.9	25.4	—	—	52.7	71.3	—	—	—	33	6.8	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	25.4	—	—	—	—	—	—	—	—	—	—	—	—	9.5	—	—	—	—	—	—	—	52.7	—	2025	—	llm	—	—	—	—	—	—	$0.00	$0.00

The score columns next to Index (General, Reason, Coding, Agents) are the v1.2 weighted components (25% each) that feed it. Reference per-category averages that are not in the index (Math, Multi, Long ctx) follow. Every individual benchmark in our catalog is also shown, grouped by category and ordered by coverage. Hover any header for details; click to sort. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations; see each model for sources.
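The index arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual implementation: the function names `component_score` and `index_score` are hypothetical, and the rule that a component is the plain mean of whichever of its benchmarks have a score is an assumption inferred from the displayed rows (for example, GLM 4.7's Reasoning of 55.5 matches the mean of GPQA Diamond 85.9 and Humanity’s Last Exam 25.1). The page itself only states the equal 25% weighting.

```python
from statistics import mean
from typing import Mapping, Optional


def component_score(benchmarks: Mapping[str, Optional[float]]) -> Optional[float]:
    """Mean of the benchmarks that have a score (assumed rule, inferred from the table).

    Returns None when no benchmark in the component has a score.
    """
    available = [score for score in benchmarks.values() if score is not None]
    return mean(available) if available else None


def index_score(components: Mapping[str, float]) -> float:
    """Equal-weighted mean of the components (25% each when there are four)."""
    return sum(components.values()) / len(components)


# Component values copied from the GLM 4.7 and Solar Open 100B rows.
glm_47 = {"General": 64.0, "Reasoning": 55.5, "Coding": 38.5, "Tool use & agents": 95.9}
solar_open_100b = {"General": 36.0, "Reasoning": 37.5, "Coding": 14.6, "Tool use & agents": 48.2}

# Reasoning benchmarks for GLM 4.7; None marks a missing score ("—" in the table).
glm_47_reasoning = {
    "GPQA Diamond": 85.9,
    "Humanity's Last Exam": 25.1,
    "FrontierMath": None,
    "ARC-AGI-2": None,
}

print(round(index_score(glm_47), 1))           # 63.5
print(round(index_score(solar_open_100b), 1))  # 34.1
print(component_score(glm_47_reasoning))
```

Note the parenthesization: the four components are summed first and the total is divided by 4, which is what reproduces the displayed Index values.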