AI models
Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Sorted by newest by default; head to the leaderboard for the ranked view.
Updated May 29, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter
| Rank | Model | Index | General | Reason | Coding | Agents | Math | Multi | Long ctx | GPQA Diamond | DROP | ARC-AGI-2 | BIG-Bench Hard | SciCode | Terminal-Bench | LiveCodeBench | SWE-bench Verified | Aider Polyglot | HumanEval | Aider Polyglot Edit | MBPP | MultiPL-E | SWE-bench Pro | AIME 2025 | MATH-500 | AIME 2024 | MATH | GSM8K | MGSM | HMMT 2025 | FrontierMath | τ²-bench | TAU-bench Retail | TAU-bench Airline | BFCL | BrowseComp | τ²-bench Airline | τ²-bench Retail | MMMU | MathVista | ChartQA | DocVQA | MMMU-Pro | AI2D | Humanity’s Last Exam | MMLU-Pro | MMLU | IFEval | SimpleQA | Multi-IF | LiveBench | Arena Hard | AA-LCR | LongBench-v2 | Released ↓ | Country | Type | Access | Params | Cutoff | Context | Speed | Latency | In $/M | Out $/M |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #1 | — | 71.7 | 67.7 | 68.9 | 55.9 | 94.4 | — | — | 67.7 | 92 | — | — | — | 53.5 | 58.3 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 94.4 | — | — | — | — | — | — | — | — | — | — | — | — | 45.7 | — | — | — | — | — | — | — | 67.7 | — | 2026 | — | llm | API only | — | — | 1M | 66 | 6.54 | $5.00 | $25.00 |
| #2 | MiniCPM5-1B (New) · OpenBMB | 25.9 | 4.7 | 15.8 | 0.7 | 82.5 | — | — | 4.7 | 26.9 | — | — | — | 1.4 | 0 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 82.5 | — | — | — | — | — | — | — | — | — | — | — | — | 4.6 | — | — | — | — | — | — | — | 4.7 | — | 2026 | — | llm | — | — | — | — | — | — | $0.00 | $0.00 |
| #3 | Qwen3.7 Max (New) | 69.7 | 69 | 65.2 | 49.8 | 94.7 | — | — | 69 | 92.3 | — | — | — | 48.8 | 50.8 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 94.7 | — | — | — | — | — | — | — | — | — | — | — | — | 38.1 | — | — | — | — | — | — | — | 69 | — | 2026 | — | llm | API only | — | — | 1M | 203 | 1.59 | $1.25 | $3.75 |
| | Index 69.7 = (69.0 + 65.2 + 49.8 + 94.7) / 4, the equal-weighted mean of 4 components: General (25%) 69, Reasoning (25%) 65.2, Coding (25%) 49.8, Tool use & agents (25%) 94.7 |
| #4 | — | 70.7 | 71 | 66.6 | 49.7 | 95.6 | — | — | 71 | 92.2 | — | — | — | 53.1 | 46.2 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 95.6 | — | — | — | — | — | — | — | — | — | — | — | — | 41 | — | — | — | — | — | — | — | 71 | — | 2026 | — | multimodal | API only | — | 2025 | 1M | 221 | 9.75 | $1.50 | $9.00 |
| #5 | JT-35B-Flash (New) · China Mobile | 57 | 55.3 | 44.5 | 29 | 99.1 | — | — | 55.3 | 82.9 | — | — | — | 29.1 | 28.8 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 99.1 | — | — | — | — | — | — | — | — | — | — | — | — | 6.1 | — | — | — | — | — | — | — | 55.3 | — | 2026 | — | llm | — | — | — | — | — | — | $0.00 | $0.00 |
| #6 | OpenBMB | 28.2 | 6.3 | 17.7 | 1.1 | 87.7 | — | — | 6.3 | 30.5 | — | — | — | 2.1 | 0 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 87.7 | — | — | — | — | — | — | — | — | — | — | — | — | 4.9 | — | — | — | — | — | — | — | 6.3 | — | 2026 | — | llm | — | — | — | — | — | — | $0.00 | $0.00 |
| #7 | Ring-2.6-1T (New) · InclusionAI | 61.1 | 64.3 | 52 | 35.6 | 92.4 | — | — | 64.3 | 85.7 | — | — | — | 42.4 | 28.8 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 92.4 | — | — | — | — | — | — | — | — | — | — | — | — | 18.3 | — | — | — | — | — | — | — | 64.3 | — | 2026 | — | llm | API only | — | — | 262K | 120 | 1.88 | $0.08 | $0.63 |
| #8 | — | 44.7 | 65.3 | 49.2 | 33.1 | 31.3 | — | — | 65.3 | 82.2 | — | — | — | 41.9 | 24.2 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 31.3 | — | — | — | — | — | — | — | — | — | — | — | — | 16.2 | — | — | — | — | — | — | — | 65.3 | — | 2026 | — | multimodal | API only | — | — | 1M | 342 | 5.35 | $0.25 | $1.50 |
| #9 | Grok 4.3 (New) | 67 | 65 | 62.6 | 42.6 | 97.7 | — | — | 65 | 90.1 | — | — | — | 47.3 | 37.9 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 97.7 | — | — | — | — | — | — | — | — | — | — | — | — | 35 | — | — | — | — | — | — | — | 65 | — | 2026 | — | llm | API only | — | — | 1M | 88 | 0.52 | $1.25 | $2.50 |
| #10 | — | 51 | 55.7 | 52.5 | 46.3 | 49.4 | — | — | 55.7 | 84.6 | — | — | — | 50.3 | 42.4 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 49.4 | — | — | — | — | — | — | — | — | — | — | — | — | 20.3 | — | — | — | — | — | — | — | 55.7 | — | 2026 | — | llm | — | — | — | — | — | — | $5.00 | $30.00 |
| #11 | — | 58.9 | 61 | 43.8 | 36.5 | 94.2 | — | — | 61 | 74.8 | — | — | — | 39.6 | 33.3 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 94.2 | — | — | — | — | — | — | — | — | — | — | — | — | 12.8 | — | — | — | — | — | — | — | 61 | — | 2026 | — | multimodal | API only | — | — | 262K | 140 | 0.58 | $1.50 | $7.50 |
| #12 | — | 18.6 | 12 | 23.5 | 10.9 | 27.8 | — | — | 12 | 43.3 | — | — | — | 21.8 | 0 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 27.8 | — | — | — | — | — | — | — | — | — | — | — | — | 3.8 | — | — | — | — | — | — | — | 12 | — | 2026 | — | llm | Open weights | — | — | 131K | 133 | 0.47 | $0.05 | $0.10 |
| #13 | — | 31.3 | 35.7 | 26.1 | 18.1 | 45.3 | — | — | 35.7 | 46.9 | — | — | — | 27.8 | 8.3 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 45.3 | — | — | — | — | — | — | — | — | — | — | — | — | 5.3 | — | — | — | — | — | — | — | 35.7 | — | 2026 | — | llm | — | — | — | — | 301 | 0.58 | $0.10 | $0.30 |
| #14 | — | 25.3 | 18.7 | 26.2 | 14.1 | 42.1 | — | — | 18.7 | 48.1 | — | — | — | 25.8 | 2.3 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 42.1 | — | — | — | — | — | — | — | — | — | — | — | — | 4.2 | — | — | — | — | — | — | — | 18.7 | — | 2026 | — | llm | — | — | — | — | — | — | $0.00 | $0.00 |
| #15 | — | 11.8 | 3 | 17.4 | 7.1 | 19.6 | — | — | 3 | 31.4 | — | — | — | 11.9 | 2.3 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 19.6 | — | — | — | — | — | — | — | — | — | — | — | — | 3.4 | — | — | — | — | — | — | — | 3 | — | 2026 | — | llm | — | — | — | — | — | — | $0.00 | $0.00 |
| #16 | — | 67.5 | 69.7 | 58.9 | 45.4 | 95.9 | — | — | 69.7 | 88.8 | — | — | — | 46.9 | 43.9 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 95.9 | — | — | — | — | — | — | — | — | — | — | — | — | 28.9 | — | — | — | — | — | — | — | 69.7 | — | 2026 | — | llm | API only | — | — | 262K | 36 | 2.79 | $1.04 | $6.24 |
| #17 | — | 63.3 | 68.7 | 52.9 | 37.3 | 94.2 | — | — | 68.7 | 84.2 | — | — | — | 39.8 | 34.8 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 94.2 | — | — | — | — | — | — | — | — | — | — | — | — | 21.6 | — | — | — | — | — | — | — | 68.7 | — | 2026 | — | multimodal | Open weights | — | — | 262K | 64 | 1.40 | $0.29 | $3.20 |
| #18 | — | 61.6 | 63.7 | 52.2 | 35.3 | 95.3 | — | — | 63.7 | 84.1 | — | — | — | 35.8 | 34.8 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 95.3 | — | — | — | — | — | — | — | — | — | — | — | — | 20.2 | — | — | — | — | — | — | — | 63.7 | — | 2026 | — | multimodal | Open weights | — | — | 262K | 169 | 1.47 | $0.14 | $1.00 |
| #19 | — | 71.1 | 66.3 | 63 | 58.9 | 96.2 | — | — | 66.3 | 90.1 | — | — | — | 50 | 46.2 | 93.5 | 80.6 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 96.2 | — | — | — | — | — | — | — | — | — | — | — | — | 35.9 | 87.5 | — | — | — | — | — | — | 66.3 | — | 2026 | — | llm | Open weights | 1.6T (49B active) | — | 1M | 30 | 1.16 | $0.44 | $0.87 |
| | Index 71.1 = (66.3 + 63.0 + 58.9 + 96.2) / 4, the equal-weighted mean of 4 components: General (25%) 66.3, Reasoning (25%) 63, Coding (25%) 58.9, Tool use & agents (25%) 96.2 |
| #20 | — | 65.3 | 63 | 60.8 | 41.8 | 95.6 | — | — | 63 | 89.4 | — | — | — | 44.9 | 38.6 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 95.6 | — | — | — | — | — | — | — | — | — | — | — | — | 32.1 | — | — | — | — | — | — | — | 63 | — | 2026 | — | llm | Open weights | 284B (13B active) | — | 1M | 109 | 0.76 | $0.10 | $0.20 |
| #21 | — | 73.9 | 74.3 | 68.9 | 58.4 | 93.9 | — | — | 74.3 | 93.5 | — | — | — | 56.1 | 60.6 | — | — | — | — | — | — | — | 58.6 | — | — | — | — | — | — | — | — | 93.9 | — | — | — | — | — | — | — | — | — | — | — | — | 44.3 | — | — | — | — | — | — | — | 74.3 | — | 2026 | — | llm | API only | — | 2025 | 1.1M | 67 | 0.97 | $5.00 | $30.00 |
| #22 | InclusionAI | 50.1 | 34.7 | 41.7 | 34.1 | 89.8 | — | — | 34.7 | 75.2 | — | — | — | 37 | 31.1 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 89.8 | — | — | — | — | — | — | — | — | — | — | — | — | 8.2 | — | — | — | — | — | — | — | 34.7 | — | 2026 | — | llm | API only | — | — | 262K | — | — | $0.08 | $0.63 |
| #23 | — | 68.6 | 73.3 | 60.2 | 46.7 | 94.2 | — | — | 73.3 | 86.6 | — | — | — | 50.2 | 43.2 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 94.2 | — | — | — | — | — | — | — | — | — | — | — | — | 33.8 | — | — | — | — | — | — | — | 73.3 | — | 2026 | — | llm | Open weights | — | — | 1M | 58 | 2.08 | $0.44 | $0.87 |
| #24 | — | 62.7 | 62.7 | 55.1 | 42.4 | 90.6 | — | — | 62.7 | 84.9 | — | — | — | 43.1 | 41.7 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 90.6 | — | — | — | — | — | — | — | — | — | — | — | — | 25.2 | — | — | — | — | — | — | — | 62.7 | — | 2026 | — | multimodal | Open weights | — | — | 1M | 92 | 2.67 | $0.14 | $0.28 |
| #25 | — | 60.3 | 54.7 | 56.1 | 37.7 | 92.7 | — | — | 54.7 | 86.7 | — | — | — | 41.2 | 34.1 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 92.7 | — | — | — | — | — | — | — | — | — | — | — | — | 25.5 | — | — | — | — | — | — | — | 54.7 | — | 2026 | — | llm | Open weights | — | — | 262K | 100 | 2.53 | $0.06 | $0.21 |
The four score columns after Index (General, Reason, Coding, Agents) are the v1.2 components, weighted 25% each, that feed it. Reference per-category averages (not in the index) follow. Every individual benchmark in our catalog is also shown, grouped by category and ordered by coverage. Scores are curated approximations; see each model for sources.
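As a rough illustration of how the Index relates to its four components, the sketch below recomputes it as an equal-weighted mean and rounds to one decimal. The function name `index_score`, the dictionary keys, and the rounding step are assumptions made for this example; only the 25% weights and the sample scores come from the table above.

```python
# Hypothetical helper: the site's actual pipeline is not published here.
# Weights follow the stated v1.2 scheme: four components at 25% each.
WEIGHTS = {"general": 0.25, "reasoning": 0.25, "coding": 0.25, "agents": 0.25}


def index_score(components: dict[str, float]) -> float:
    """Return the weighted mean of the four components, rounded to 0.1."""
    total = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return round(total, 1)  # rounding to one decimal is an assumption


# Component scores copied from rows #3 and #19 of the table.
row_3 = {"general": 69.0, "reasoning": 65.2, "coding": 49.8, "agents": 94.7}
row_19 = {"general": 66.3, "reasoning": 63.0, "coding": 58.9, "agents": 96.2}

print(index_score(row_3))   # compare with the Index column for row #3
print(index_score(row_19))  # compare with the Index column for row #19
```

Because every weight is 0.25, this is the same as summing the four scores and dividing by 4; keeping the weights in a dictionary simply makes it easier to see what would change if a later version reweighted the components.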