AI Hub

Benchmarks

The evaluations behind the rankings: what each one measures and which models lead. Scores feed the per-category indices on the leaderboard.

Benchmarks: 42
Models scored: 440
Data points: 2668
Categories: 7

4 benchmarks

Benchmark       Category   Models  Leader             Score
ARC-AGI-2       Reasoning  10      Gemini 3.1 Pro     77.1/100
BIG-Bench Hard  Reasoning  28      Claude 3.5 Sonnet  93.1/100
DROP            Reasoning  25      DeepSeek-V3        91.6/100
GPQA Diamond    Reasoning  405     Gemini 3.1 Pro     94.3/100