Model rankings
A balanced intelligence index averages each model's per-category scores. Drill into a category for individual benchmarks, or sort by speed, price, and context.
Updated May 25, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter
Price vs. intelligence
Chart: intelligence index plotted against input price. Up and to the left is better value.
Speed vs. intelligence
Chart: intelligence index plotted against output speed. Up and to the right is fast and smart.
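One way to read the price chart without the plot is to list the models that no other model beats on both price and score at once, i.e. the Pareto frontier. The sketch below is only an illustration: the `Model` type and `pareto_frontier` function are hypothetical names, and the sample rows use the Multimodal index and input price from the table on this page as a stand-in for whatever index the chart actually plots.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    score: float        # index score, higher is better
    input_price: float  # USD per million input tokens, lower is better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the models that no other model beats on both price and score."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to priciest; on a price tie the higher score comes first.
    for m in sorted(models, key=lambda m: (m.input_price, -m.score)):
        # A model joins the frontier only if it outscores everything cheaper.
        if m.score > best_score:
            frontier.append(m)
            best_score = m.score
    return frontier


# Sample rows copied from the Multimodal table (index, input price).
SAMPLE = [
    Model("Claude 3.5 Sonnet", 83.3, 3.00),
    Model("Gemma 3 27B", 83.0, 0.08),
    Model("o4-mini", 82.9, 1.10),
    Model("Gemma 3 12B", 82.3, 0.04),
    Model("GPT-4o", 77.7, 2.50),
]

for m in pareto_frontier(SAMPLE):
    print(f"{m.name}: {m.score} at ${m.input_price:.2f}/M")
```

In this five-row sample the frontier is Gemma 3 12B, Gemma 3 27B and Claude 3.5 Sonnet: each costs more than the one before it and scores higher. The same function works for the speed chart if speed is negated so that lower still means better.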
| # | Model | Multimodal index | AI2D | MMMU-Pro | ChartQA | DocVQA | MathVista | MMMU | Context (tokens) | Output speed | Input price ($/M tokens) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude 3.5 Sonnet | 83.3 | 94.7 | — | 90.8 | 95.2 | 67.7 | 68.3 | 200K | 101 | $3.00 |
| 2 | Gemma 3 27B | 83 | 84.5 | — | 78 | 86.6 | — | — | 131K | 33 | $0.08 |
| 3 | o4-mini | 82.9 | — | — | — | — | 84.3 | 81.6 | 200K | 115 | $1.10 |
| 4 | Gemma 3 12B | 82.3 | 84.2 | — | 75.7 | 87.1 | — | — | 131K | 33 | $0.04 |
| 5 | Gemini 2.5 Pro Preview 06-05 | 82 | — | — | — | — | — | 82 | 1M | 85 | $1.25 |
| 6 | o3 | 82 | — | 76.4 | — | — | 86.8 | 82.9 | 200K | 50 | $2.00 |
| 7 | Pixtral Large | 81.7 | 93.8 | — | 88.1 | 93.3 | 69.4 | 64 | 131K | 0 | $2.00 |
| 8 | Nova Pro | 81.5 | — | — | 89.2 | 93.5 | — | 61.7 | 300K | 100 | $0.80 |
| 9 | GPT-5 | 81.3 | — | 78.4 | — | — | — | 84.2 | 400K | 100 | $1.25 |
| 10 | Mistral Small 3.2 24B Instruct | 81 | 92.9 | — | 87.4 | 94.9 | 67.1 | 62.5 | — | — | — |
| 11 | Llama 4 Scout | 80.8 | — | — | 88.8 | 94.4 | 70.7 | 69.4 | 10M | 776 | $0.08 |
| 12 | Gemini 2.5 Flash | 79.7 | — | — | — | — | — | 79.7 | 1M | 85 | $0.30 |
| 13 | Gemini 2.5 Pro | 79.6 | — | — | — | — | — | 79.6 | 1M | 85 | $1.25 |
| 14 | Qwen2.5 VL 72B Instruct | 79.1 | 88.4 | 51.1 | 89.5 | 96.4 | — | 70.2 | 131K | — | $0.25 |
| 15 | Nova Lite | 78.5 | — | — | 86.8 | 92.4 | — | 56.2 | 300K | 100 | $0.06 |
| 16 | Llama 4 Maverick | 78.2 | — | 59.6 | 90 | 94.4 | 73.7 | 73.4 | 1M | 639 | $0.15 |
| 17 | Grok-3 | 78 | — | — | — | — | — | 78 | 128K | 100 | $3.00 |
| 18 | GPT-4o | 77.7 | 94.2 | 59.9 | 85.7 | 92.8 | 61.4 | 72.2 | 128K | 132 | $2.50 |
| 19 | Claude Opus 4.6 | 77.3 | — | 77.3 | — | — | — | — | 1M | 48 | $5.00 |
| 20 | Grok-2 | 76.2 | — | — | — | 93.6 | 69 | 66.1 | 128K | 85 | $2.00 |
| 21 | Gemini 2.0 Flash Thinking | 75.4 | — | — | — | — | — | 75.4 | — | — | $0.00 |
| 22 | Claude 3.7 Sonnet | 75 | — | — | — | — | — | 75 | 200K | 101 | $3.00 |
| 23 | DeepSeek VL2 | 74.9 | 81.4 | — | 86 | 93.3 | 62.8 | 51.1 | 129K | 22 | $9.50 |
| 24 | Grok-2 mini | 74.8 | — | — | — | 93.2 | 68.1 | 63.2 | — | — | — |
| 25 | o1 | 74.7 | — | — | — | — | 71.8 | 77.6 | 200K | 66 | $15.00 |
| 26 | Claude Sonnet 4 | 74.4 | — | — | — | — | — | 74.4 | 1M | 101 | $3.00 |
| 27 | GPT-4.5 | 73.8 | — | — | — | — | 72.3 | 75.2 | 128K | 50 | $75.00 |
| 28 | GPT-4.1 | 73.5 | — | — | — | — | 72.2 | 74.8 | 1M | 100 | $2.00 |
| 29 | DeepSeek VL2 Small | 73.1 | 80 | — | 84.5 | 92.3 | 60.7 | 48 | — | — | — |
| 30 | Gemma 3 4B | 73.1 | 74.8 | — | 68.8 | 75.8 | — | — | 131K | 33 | $0.04 |
| 31 | GPT-4.1 Mini | 72.9 | — | — | — | — | 73.1 | 72.7 | 1M | 150 | $0.40 |
| 32 | Gemini 2.5 Flash Lite | 72.9 | — | — | — | — | — | 72.9 | 1M | 6 | $0.10 |
| 33 | Kimi-k1.5 | 72.5 | — | — | — | — | 74.9 | 70 | — | — | — |
| 34 | Llama 3.2 90B Instruct | 71.8 | 92.3 | 45.2 | 85.5 | 90.1 | 57.3 | 60.3 | 128K | 100 | $0.35 |
| 35 | Qwen2.5 VL 32B Instruct | 71.4 | — | 49.5 | — | 94.8 | — | 70 | — | — | — |
| 36 | Grok-1.5V | 71.3 | 88.3 | — | 76.1 | 85.6 | 52.8 | 53.6 | — | — | — |
| 37 | Qwen2.5-Omni-7B | 71.2 | 83.2 | 36.6 | 85.3 | 95.2 | 67.9 | 59.2 | — | — | — |
| 38 | QvQ-72B-Preview | 70.9 | — | — | — | — | 71.4 | 70.3 | — | — | — |
| 39 | Pixtral-12B | 70.8 | — | — | 81.8 | 90.7 | 58 | 52.5 | 128K | 0 | $0.15 |
| 40 | Gemini 2.0 Flash | 70.7 | — | — | — | — | — | 70.7 | 1M | 183 | $0.10 |
| 41 | Qwen2.5 VL 7B Instruct | 70 | — | 38.3 | 87.3 | 95.7 | — | 58.6 | — | — | — |
| 42 | Phi-4-multimodal-instruct | 68.8 | 82.3 | 38.5 | 81.4 | 93.2 | 62.4 | 55.1 | 128K | 25 | $0.05 |
| 43 | Gemini 2.0 Flash Lite | 68 | — | — | — | — | — | 68 | 1M | 85 | $0.08 |
| 44 | Qwen2-VL-72B-Instruct | 67.3 | — | 46.2 | 88.3 | — | — | — | — | — | — |
| 45 | DeepSeek VL2 Tiny | 67.2 | 71.6 | — | 81 | 88.9 | 53.6 | 40.7 | — | — | — |
| 46 | Gemini 1.5 Pro | 67 | — | — | — | — | 68.1 | 65.9 | 2M | 85 | $1.25 |
| 47 | Llama 3.2 11B Instruct | 66.4 | 91.1 | 33 | 83.4 | 88.4 | 51.5 | 50.7 | 128K | 168 | $0.05 |
| 48 | Gemini 1.5 Flash | 64.1 | — | — | — | — | 65.8 | 62.3 | 1M | 150 | $0.15 |
| 49 | Grok-1.5 | 64 | — | — | — | 85.6 | 52.8 | 53.6 | — | — | — |
| 50 | Phi-3.5-vision-instruct | 61.7 | 78.1 | — | 81.8 | — | 43.9 | 43 | — | — | — |
| 51 | Mistral Small 3.1 24B Base | 59.3 | — | — | — | — | — | 59.3 | 128K | 137 | $0.10 |
| 52 | Mistral Small 3.1 24B Instruct | 59.3 | — | — | — | — | — | 59.3 | — | — | — |
| 53 | GPT-4o-mini | 58.1 | — | — | — | — | 56.7 | 59.4 | 128K | 92 | $0.15 |
| 54 | GPT-4.1 Nano | 55.8 | — | — | — | — | 56.2 | 55.4 | 1M | 200 | $0.10 |
| 55 | Gemini 1.5 Flash 8B | 54.2 | — | — | — | — | 54.7 | 53.7 | 1M | 150 | $0.07 |
| 56 | Gemini 1.0 Pro | 47.3 | — | — | — | — | 46.6 | 47.9 | 33K | 120 | $0.50 |
| 57 | GPT-3.5 Turbo | 0 | — | — | — | — | 0 | 0 | 16K | 100 | $0.50 |
57 models ranked on Multimodal. The intelligence index is a balanced mean of per-category scores; each category column averages the benchmarks within that category. Scores are curated approximations; see each model for sources.
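The two averaging steps described above can be sketched as follows. This is a minimal illustration, not the site's actual code: the function names are hypothetical, and skipping missing benchmarks is an assumption, although the table rows are consistent with it (the five scores listed for Claude 3.5 Sonnet average to 83.3). The second category in the example is an invented placeholder used only to show how the balanced mean behaves.

```python
from statistics import mean
from typing import Optional

Benchmarks = dict[str, Optional[float]]


def category_score(benchmarks: Benchmarks) -> Optional[float]:
    """Average the benchmarks that have a score; None marks a missing one."""
    present = [v for v in benchmarks.values() if v is not None]
    return mean(present) if present else None


def balanced_index(categories: dict[str, Benchmarks]) -> Optional[float]:
    """Mean of category scores, so each category counts once
    no matter how many benchmarks it contains."""
    scores = [category_score(b) for b in categories.values()]
    scored = [s for s in scores if s is not None]
    return mean(scored) if scored else None


# Multimodal benchmark scores for Claude 3.5 Sonnet, copied from the table.
CLAUDE_35_SONNET: Benchmarks = {
    "AI2D": 94.7, "MMMU-Pro": None, "ChartQA": 90.8,
    "DocVQA": 95.2, "MathVista": 67.7, "MMMU": 68.3,
}

# Placeholder category with invented scores, for illustration only.
PLACEHOLDER: Benchmarks = {"bench_a": 60.0, "bench_b": 70.0}

print(round(category_score(CLAUDE_35_SONNET), 1))
print(round(balanced_index({"Multimodal": CLAUDE_35_SONNET,
                            "Placeholder": PLACEHOLDER}), 2))
```

Averaging the category scores, not pooling every benchmark into one mean, keeps a category with many benchmarks from outweighing one with few.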