ARC-AGI-2

ARC-AGI-2 is the second version of the ARC-AGI benchmark. It measures abstract reasoning and problem-solving in AI systems through visual grid transformation tasks.

Models: 10
Top score: 77.1
Median: 31.1

State of the art over time

Each point is a model at its release date; the line traces the best score to date.
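
The "best score to date" line described above is a running maximum over models ordered by release date. A minimal sketch of that calculation follows. The function name `best_to_date` and the sample points are hypothetical placeholders for illustration; this page does not list release dates, so none of the values below come from the leaderboard.

```python
from datetime import date


def best_to_date(points):
    """Given (release_date, score) pairs, return (release_date, best_so_far) in date order."""
    frontier = []
    best = float("-inf")
    # Sorting the tuples orders them by release date first.
    for released, score in sorted(points):
        best = max(best, score)
        frontier.append((released, best))
    return frontier


# Hypothetical placeholder points, NOT taken from the leaderboard.
example = [
    (date(2025, 1, 1), 10.0),
    (date(2025, 6, 1), 8.0),   # lower than the earlier best, so the line stays flat
    (date(2025, 9, 1), 25.0),  # new best, so the line steps up
]

for released, best in best_to_date(example):
    print(released.isoformat(), best)
```

A model that scores below the current best still appears as a point on the chart, but it does not move the line.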

Ranking

Rank  Model                Score
1     Gemini 3.1 Pro       77.1
2     Claude Opus 4.6      68.8
3     Claude Sonnet 4.6    58.3
4     GPT-5.2              52.9
5     Gemini 3 Deep Think  45.1
6     Gemini 3 Pro         31.1
7     Grok 4               15.9
8     Claude Opus 4        8.6
9     o3                   6.5
10    Gemini 2.5 Pro       4.9
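
The summary figures at the top of the page can be checked against the ranking. The sketch below assumes the page reports the lower median (the smaller of the two middle values when the count is even). That assumption is consistent with the listed 31.1, but the page does not state how the median is computed. The helper name `summarize` is hypothetical.

```python
import statistics

# Scores copied from the ranking, in rank order.
scores = [77.1, 68.8, 58.3, 52.9, 45.1, 31.1, 15.9, 8.6, 6.5, 4.9]


def summarize(values):
    """Return (model count, top score, lower median) for a list of scores."""
    # median_low picks the smaller of the two middle values for an even count
    # (an assumption about how the page defines "Median").
    return len(values), max(values), statistics.median_low(values)


count, top, median = summarize(scores)
print(count, top, median)

# For comparison: the conventional median averages the two middle values
# (31.1 and 45.1), which gives roughly 38.1 rather than 31.1.
print(statistics.median(scores))
```

With ten models, the two definitions differ by about seven points, so the choice of median matters when comparing this figure across benchmarks.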
