Reasoning

ARC-AGI-2

ARC-AGI-2 is the second version of the ARC-AGI benchmark. It measures abstract reasoning and problem-solving ability in AI systems through visual grid transformation tasks.


Models: 10

Top score: 77.1

Median: 31.1

State of the art over time

Each point is a model at its release date; the line traces the best score to date.
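The "best score to date" line described above is a running maximum over models ordered by release date. The sketch below illustrates that idea only. The dates and scores in `points` are hypothetical placeholders, not leaderboard data, and `best_to_date` is an illustrative helper name rather than anything the page defines.

```python
from datetime import date
from itertools import accumulate

# Hypothetical (release_date, score) pairs; placeholders, NOT leaderboard data.
points = [
    (date(2025, 1, 1), 5.0),
    (date(2025, 3, 1), 3.0),
    (date(2025, 6, 1), 12.0),
    (date(2025, 9, 1), 9.0),
]

def best_to_date(points):
    """Return (date, best score so far) for each point, in date order."""
    ordered = sorted(points, key=lambda p: p[0])
    # accumulate with max yields the running maximum of the scores.
    running_best = accumulate((score for _, score in ordered), max)
    return [(day, best) for (day, _), best in zip(ordered, running_best)]

frontier = best_to_date(points)
for day, best in frontier:
    print(day.isoformat(), best)
```

A model that scores below the current best still appears as a point, but it leaves the line flat.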

Ranking

Rank	Model	Organization	Score
1	Gemini 3.1 Pro	Google	77.1
2	Claude Opus 4.6	Anthropic	68.8
3	Claude Sonnet 4.6	Anthropic	58.3
4	GPT-5.2	OpenAI	52.9
5	Gemini 3 Deep Think	Google	45.1
6	Gemini 3 Pro	Google	31.1
7	Grok 4	xAI	15.9
8	Claude Opus 4	Anthropic	8.6
9	o3	OpenAI	6.5
10	Gemini 2.5 Pro	Google	4.9
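
The summary figures can be checked against the ranking with a few lines of Python. This is a minimal sketch: the scores are copied from the ranking above, and the median convention is an assumption. With ten entries, the lower of the two middle values reproduces the listed median of 31.1, whereas averaging the two middle values (31.1 and 45.1) would give 38.1. The page does not say which convention it uses.

```python
import statistics

# Scores copied from the ranking above.
scores = {
    "Gemini 3.1 Pro": 77.1,
    "Claude Opus 4.6": 68.8,
    "Claude Sonnet 4.6": 58.3,
    "GPT-5.2": 52.9,
    "Gemini 3 Deep Think": 45.1,
    "Gemini 3 Pro": 31.1,
    "Grok 4": 15.9,
    "Claude Opus 4": 8.6,
    "o3": 6.5,
    "Gemini 2.5 Pro": 4.9,
}

values = list(scores.values())

top_model = max(scores, key=scores.get)
top_score = scores[top_model]

# Assumption: the page reports the lower of the two middle values.
median_low = statistics.median_low(values)
# Conventional median for an even count: mean of the two middle values.
median_mean = statistics.median(values)

print(f"Models: {len(values)}")
print(f"Top score: {top_score} ({top_model})")
print(f"Lower median: {median_low}")
print(f"Mean-of-middle median: {median_mean:.1f}")
```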

Related Reasoning benchmarks

GPQA Diamond	405
BIG-Bench Hard	28
DROP	25