Leaderboards
Model rankings
A balanced intelligence index averages each model's per-category scores. Drill into a category for individual benchmarks, or sort by speed, price, and context.
Updated May 25, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter
Price vs. intelligence
Chart: intelligence index vs. input price. Up and to the left is better value.
Speed vs. intelligence
Chart: intelligence index vs. output speed. Up and to the right is faster and smarter.
| # | Model | Agents index | BFCL | τ²-bench Airline | τ²-bench Retail | BrowseComp | TAU-bench Airline | TAU-bench Retail | Context | Output speed | Input $/M tokens |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Llama 3.1 405B Instruct | 88.5 | 88.5 | — | — | — | — | — | 128K | 100 | $0.89 |
| 2 | Llama 3.1 70B Instruct | 84.8 | 84.8 | — | — | — | — | — | 131K | 1204 | $0.40 |
| 3 | Claude Sonnet 4.5 | 78.1 | — | — | — | — | 70 | 86.2 | 1M | 42 | $3.00 |
| 4 | Llama 3.1 8B Instruct | 76.1 | 76.1 | — | — | — | — | — | 131K | 2047 | $0.02 |
| 5 | Claude Haiku 4.5 | 73.4 | — | 63.6 | 83.2 | — | — | — | 200K | 100 | $1.00 |
| 6 | Qwen3 235B A22B | 70.8 | 70.8 | — | — | — | — | — | 131K | 68 | $0.46 |
| 7 | Claude Opus 4 | 70.5 | — | — | — | — | 59.6 | 81.4 | 200K | 120 | $15.00 |
| 8 | Claude Sonnet 4 | 70.3 | — | — | — | — | 60 | 80.5 | 1M | 101 | $3.00 |
| 9 | Qwen3 32B | 70.3 | 70.3 | — | — | — | — | — | 131K | 328 | $0.08 |
| 10 | Claude 3.7 Sonnet | 69.8 | — | — | — | — | 58.4 | 81.2 | 200K | 101 | $3.00 |
| 11 | Claude Opus 4.1 | 69.2 | — | — | — | — | 56 | 82.4 | 200K | 120 | $15.00 |
| 12 | Qwen3 30B A3B | 69.1 | 69.1 | — | — | — | — | — | 131K | 122 | $0.09 |
| 13 | Nova Pro | 68.4 | 68.4 | — | — | — | — | — | 300K | 100 | $0.80 |
| 14 | gpt-oss-120b | 67.8 | — | — | — | — | — | 67.8 | 131K | 500 | $0.04 |
| 15 | Nova Lite | 66.6 | 66.6 | — | — | — | — | — | 300K | 100 | $0.06 |
| 16 | QwQ-32B | 66.4 | 66.4 | — | — | — | — | — | — | 31 | $0.70 |
| 17 | GPT-5 | 66.2 | — | 62.6 | 81.1 | 54.9 | — | — | 400K | 100 | $1.25 |
| 18 | o3 | 64.9 | — | 64.8 | 80.2 | 49.7 | — | — | 200K | 50 | $2.00 |
| 19 | Kimi K2 Instruct | 63.6 | — | 56.5 | 70.6 | — | — | — | 131K | 45 | $0.57 |
| 20 | Kimi K2-Instruct-0905 | 63.6 | — | 56.5 | 70.6 | — | — | — | — | — | — |
| 21 | Qwen3 Next 80B A3B Thinking | 61.7 | — | 60.5 | 67.8 | — | 49 | 69.6 | 262K | — | $0.10 |
| 22 | Qwen3-235B-A22B-Thinking-2507 | 60.9 | — | 58 | 71.9 | — | 46 | 67.8 | 256K | — | $0.30 |
| 23 | o1 | 60.4 | — | — | — | — | 50 | 70.8 | 200K | 66 | $15.00 |
| 24 | GPT-4.5 | 59.2 | — | — | — | — | 50 | 68.4 | 128K | 50 | $75.00 |
| 25 | GPT-4.1 | 58.7 | — | — | — | — | 49.4 | 68 | 1M | 100 | $2.00 |
| 26 | Qwen3-235B-A22B-Instruct-2507 | 57.7 | — | 44 | 71.3 | — | — | — | 131K | 63 | $0.15 |
| 27 | Claude 3.5 Sonnet | 57.6 | — | — | — | — | 46 | 69.2 | 200K | 101 | $3.00 |
| 28 | o4-mini | 57.5 | — | — | — | 51.5 | 49.2 | 71.8 | 200K | 115 | $1.10 |
| 29 | Nova Micro | 56.2 | 56.2 | — | — | — | — | — | 128K | 100 | $0.03 |
| 30 | GLM-4.5 | 55.5 | — | — | — | 26.4 | 60.4 | 79.7 | 131K | 85 | $0.60 |
| 31 | gpt-oss-20b | 54.8 | — | — | — | — | — | 54.8 | 131K | 1000 | $0.03 |
| 32 | GLM 4.5 Air | 53.3 | — | — | — | 21.3 | 60.8 | 77.9 | 131K | 63 | $0.13 |
| 33 | GPT-4o | 53 | — | 45.5 | 63.4 | — | 42.8 | 60.3 | 128K | 132 | $2.50 |
| 34 | Qwen3 Next 80B A3B Instruct | 51.9 | — | 45.5 | 57.3 | — | 44 | 60.9 | 262K | 161 | $0.09 |
| 35 | GPT-4.1 Mini | 45.9 | — | — | — | — | 36 | 55.8 | 1M | 150 | $0.40 |
| 36 | GLM-4.6 | 45.1 | — | — | — | 45.1 | — | — | 203K | 85 | $0.43 |
| 37 | o3-mini | 45 | — | — | — | — | 32.4 | 57.6 | 200K | 115 | $1.10 |
| 38 | Grok 4 Fast | 44.9 | — | — | — | 44.9 | — | — | 2M | 90 | $0.20 |
| 39 | DeepSeek V3.2 Exp | 40.1 | — | — | — | 40.1 | — | — | 164K | 100 | $0.27 |
| 40 | Claude 3.5 Haiku | 36.9 | — | — | — | — | 22.8 | 51 | 200K | 104 | $0.80 |
| 41 | DeepSeek-V3.1 | 30 | — | — | — | 30 | — | — | 164K | — | $0.21 |
| 42 | GPT-4.1 Nano | 18.3 | — | — | — | — | 14 | 22.6 | 1M | 200 | $0.10 |
| 43 | DeepSeek-R1-0528 | 8.9 | — | — | — | 8.9 | — | — | 131K | 45 | $0.55 |
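43 models ranked on Agents. The intelligence index is a balanced mean of per-category scores. Each category column, such as the Agents index here, averages the benchmarks within that category for which the model has a score; benchmarks marked with a dash are left out of the average rather than counted as zero. Scores are curated approximations; see each model for sources.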
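The averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code. It assumes that a missing benchmark is simply skipped and that "balanced" means an unweighted mean across categories. The function names `category_index` and `balanced_index` and the dictionary keys are hypothetical; the benchmark scores are copied from four rows of the table.

```python
from statistics import mean
from typing import Mapping, Optional

Scores = Mapping[str, Optional[float]]


def category_index(benchmarks: Scores) -> Optional[float]:
    """Mean of the benchmarks that have a score, rounded to one decimal.

    Missing benchmarks (None) are skipped, not treated as zero.
    Returns None when the model has no scores in the category.
    """
    available = [s for s in benchmarks.values() if s is not None]
    return round(mean(available), 1) if available else None


def balanced_index(categories: Scores) -> Optional[float]:
    """Assumption: 'balanced' means every category carries equal weight."""
    available = [s for s in categories.values() if s is not None]
    return round(mean(available), 1) if available else None


# Agents benchmark scores taken from the table; None marks a missing score.
AGENT_ROWS = {
    "Claude Sonnet 4.5": {"tau_airline": 70, "tau_retail": 86.2},
    "Claude Haiku 4.5": {"tau2_airline": 63.6, "tau2_retail": 83.2},
    "GPT-5": {"tau2_airline": 62.6, "tau2_retail": 81.1, "browsecomp": 54.9},
    "Llama 3.1 405B Instruct": {"bfcl": 88.5, "browsecomp": None},
}

for model, benchmarks in AGENT_ROWS.items():
    print(f"{model}: {category_index(benchmarks)}")
```

Under these assumptions the four rows come out at 78.1, 73.4, 66.2, and 88.5, matching the Agents index column. One consequence of skipping missing scores is that a model measured on a single benchmark (for example BFCL only) is ranked against models averaged over several different benchmarks, so neighboring ranks are not always directly comparable.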