AI Hub

Benchmarks

The evaluations behind the rankings: what each one measures and which models lead. Scores feed the per-category indices on the leaderboard.

Benchmarks: 42
Models scored: 440
Data points: 2,668
Categories: 7

6 benchmarks (Multimodal)

Benchmark   Category         Leader                     Score
AI2D        Multimodal   17  Claude 3.5 Sonnet          94.7/100
ChartQA     Multimodal   24  Claude 3.5 Sonnet          90.8/100
DocVQA      Multimodal   26  Qwen2.5 VL 72B Instruct    96.4/100
MathVista   Multimodal   34  o3                         86.8/100
MMMU        Multimodal   52  GPT-5                      84.2/100
MMMU-Pro    Multimodal   13  GPT-5                      78.4/100
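
The page says benchmark scores feed per-category indices but does not state the formula. As a rough sketch only, the snippet below assumes an index is the unweighted mean of a model's scores within a category. That rule, the name category_index, and the row layout are all assumptions for illustration, not the leaderboard's actual method. It uses only the leader rows listed above, so each result covers just the benchmarks where that model happens to lead.

```python
from collections import defaultdict
from statistics import mean

# (benchmark, category, leader, score out of 100), copied from the rows above.
LEADER_ROWS = [
    ("AI2D", "Multimodal", "Claude 3.5 Sonnet", 94.7),
    ("ChartQA", "Multimodal", "Claude 3.5 Sonnet", 90.8),
    ("DocVQA", "Multimodal", "Qwen2.5 VL 72B Instruct", 96.4),
    ("MathVista", "Multimodal", "o3", 86.8),
    ("MMMU", "Multimodal", "GPT-5", 84.2),
    ("MMMU-Pro", "Multimodal", "GPT-5", 78.4),
]


def category_index(rows):
    """Hypothetical index: unweighted mean of each model's scores per category.

    ASSUMPTION: the real leaderboard may weight or normalize scores differently.
    """
    scores = defaultdict(list)
    for _benchmark, category, model, score in rows:
        scores[(category, model)].append(score)
    # Round to two decimals to hide floating-point noise in the averages.
    return {key: round(mean(vals), 2) for key, vals in scores.items()}


indices = category_index(LEADER_ROWS)
for (category, model), value in sorted(indices.items()):
    print(f"{category:<12} {model:<26} {value}")
```

Because only leading scores are listed here, these averages are partial. A faithful index would need every model's score on every benchmark in the category.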