ARC-AGI-2
ARC-AGI-2 is the successor to the original ARC-AGI benchmark. It measures abstract reasoning and problem-solving abilities in AI systems through visual grid transformation tasks.
Models: 10
Top score: 77.1
Median: 31.1
State of the art over time
Each point is a model at its release date; the line traces the best score to date.
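As a rough illustration of how a "best score to date" line can be derived from individual points, the sketch below computes a running maximum over scores ordered by release. The function name `best_to_date` and the toy input are hypothetical; the input values are arbitrary placeholders, not release dates or scores from this page.

```python
from itertools import accumulate


def best_to_date(points):
    """Return (release_key, best_score_so_far) pairs ordered by release_key.

    `points` is an iterable of (release_key, score) tuples, where
    release_key is anything sortable (for example a date).
    """
    ordered = sorted(points, key=lambda p: p[0])
    # accumulate(..., max) yields the running maximum of the scores.
    running = accumulate((score for _, score in ordered), max)
    return [(key, best) for (key, _), best in zip(ordered, running)]


# Placeholder data only: integer "release" keys and made-up scores.
toy_points = [(3, 25.0), (1, 10.0), (2, 8.0), (4, 20.0)]
frontier = best_to_date(toy_points)
```

A point that scores below the current best (keys 2 and 4 in the toy input) leaves the line flat, so the traced line never decreases.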
Ranking
| Rank | Model | Score |
| --- | --- | --- |
| 1 | Gemini 3.1 Pro | 77.1 |
| 2 | Claude Opus 4.6 | 68.8 |
| 3 | Claude Sonnet 4.6 | 58.3 |
| 4 | GPT-5.2 | 52.9 |
| 5 | Gemini 3 Deep Think | 45.1 |
| 6 | Gemini 3 Pro | 31.1 |
| 7 | Grok 4 | 15.9 |
| 8 | Claude Opus 4 | 8.6 |
| 9 | o3 | 6.5 |
| 10 | Gemini 2.5 Pro | 4.9 |
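
The summary figures at the top of the page (model count, top score, median) can be checked against the ranking with a few lines of Python. This is a minimal sketch: the names `SCORES` and `summarize` are illustrative, and the use of the lower median is an assumption inferred from the numbers, since the page does not say how its median is computed. With ten entries, the lower of the two middle scores matches the listed 31.1, whereas the conventional median (the mean of the two middle scores, 31.1 and 45.1) would be about 38.1.

```python
from statistics import median, median_low

# Scores copied from the ranking above.
SCORES = {
    "Gemini 3.1 Pro": 77.1,
    "Claude Opus 4.6": 68.8,
    "Claude Sonnet 4.6": 58.3,
    "GPT-5.2": 52.9,
    "Gemini 3 Deep Think": 45.1,
    "Gemini 3 Pro": 31.1,
    "Grok 4": 15.9,
    "Claude Opus 4": 8.6,
    "o3": 6.5,
    "Gemini 2.5 Pro": 4.9,
}


def summarize(scores):
    """Compute the headline figures from a {model: score} mapping."""
    values = list(scores.values())
    return {
        "models": len(values),
        "top_score": max(values),
        # Assumption: the page reports the lower of the two middle values.
        "median_low": median_low(values),
        # Conventional median, shown for comparison.
        "median": median(values),
    }


summary = summarize(SCORES)
```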