AI Hub

Benchmarks

The evaluations behind the rankings — what each one measures, and which models lead. Scores feed the per-category indices on the leaderboard.

Benchmarks: 42
Models scored: 440
Data points: 2,668
Categories: 7

8 benchmarks

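Benchmark | Category | Models scored | Leader | Top score
--- | --- | --- | --- | ---
Arena Hard | General | 21 | Qwen3 235B A22B | 95.6/100
Humanity’s Last Exam | General | 360 | Grok-4 Heavy | 50.7/100
IFEval | General | 41 | o3-mini | 93.9/100
LiveBench | General | 13 | o3-mini | 84.6/100
MMLU | General | 92 | GPT-5 | 92.5/100
MMLU-Pro | General | 292 | Gemini 3 Pro | 89.8/100
Multi-IF | General | 11 | Qwen3-235B-A22B-Thinking-2507 | 80.6/100
SimpleQA | General | 26 | DeepSeek V3.2 Exp | 97.1/100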
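The page says benchmark scores feed per-category indices on the leaderboard, but it does not say how they are combined. The sketch below is a hypothetical illustration only. It assumes each data point is a (benchmark, category, model, score) record on a 0 to 100 scale, picks each benchmark's leader as its highest score, and computes a category index as the unweighted mean of a model's scores within that category. The model names (`model-a`, `model-b`) and all scores are invented placeholders, not values from this leaderboard.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data points: (benchmark, category, model, score out of 100).
# Benchmark names match the table; models and scores are invented placeholders.
SAMPLE_POINTS = [
    ("Arena Hard", "General", "model-a", 90.0),
    ("Arena Hard", "General", "model-b", 80.0),
    ("IFEval", "General", "model-a", 70.0),
    ("IFEval", "General", "model-b", 85.0),
]


def leaders(points):
    """Return {benchmark: (model, score)} keeping the highest score per benchmark."""
    best = {}
    for bench, _category, model, score in points:
        if bench not in best or score > best[bench][1]:
            best[bench] = (model, score)
    return best


def category_index(points, category):
    """Assumed index: for each model, the unweighted mean of its scores in one category."""
    per_model = defaultdict(list)
    for _bench, cat, model, score in points:
        if cat == category:
            per_model[model].append(score)
    return {model: round(mean(scores), 2) for model, scores in per_model.items()}


if __name__ == "__main__":
    print(leaders(SAMPLE_POINTS))
    print(category_index(SAMPLE_POINTS, "General"))
```

With the placeholder data, `model-a` leads Arena Hard and `model-b` leads IFEval, while the assumed General index is 80.0 for `model-a` and 82.5 for `model-b`. A plain mean is the simplest choice, but it treats a model scored on one benchmark the same as one scored on many, so a real index may weight benchmarks or handle missing scores differently.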