Multimodal

MMMU-Pro

Models: 13
Top score: 78.4
Median: 49.5

A more robust multi-discipline multimodal understanding benchmark that enhances MMMU through a three-step process: filtering out questions that can be answered from the text alone, augmenting the candidate options, and introducing a vision-only input setting. Model performance is substantially lower than on the original MMMU (drops of 16.8% to 26.9% across models), which makes for a more rigorous evaluation that more closely mimics real-world scenarios.
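The first two steps can be illustrated with a small sketch. It assumes each question records how often several text-only models answered it correctly over repeated attempts with the image withheld, and flags the question when a model gets it right on a majority of attempts. The `Question` fields, the majority rule, and the `max_models` threshold are hypothetical choices for illustration, not the benchmark's published criteria. The `chance_accuracy` helper shows why adding options also lowers scores: random guessing falls from 25% with 4 options to 10% with 10.

```python
from dataclasses import dataclass


@dataclass
class Question:
    qid: str
    num_options: int
    # Hypothetical bookkeeping: correct answers per text-only model,
    # out of `attempts` repeated runs with the image withheld.
    text_only_correct: list[int]
    attempts: int


def text_answerable(q: Question, max_models: int = 0) -> bool:
    """Flag a question if more than `max_models` text-only models answer
    it correctly on a majority of their attempts (assumed criterion)."""
    solved_by = sum(1 for c in q.text_only_correct if c > q.attempts / 2)
    return solved_by > max_models


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy of uniform random guessing over the options."""
    return 1.0 / num_options


questions = [
    Question("q1", 4, [9, 8, 10], attempts=10),  # solvable without the image
    Question("q2", 4, [2, 3, 1], attempts=10),   # needs the image
]
kept = [q.qid for q in questions if not text_answerable(q)]
print(kept)  # ['q2']
print(chance_accuracy(4), chance_accuracy(10))  # 0.25 0.1
```

Raising `max_models` makes the filter more lenient, keeping questions that only a few text-only models can solve.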

State of the art over time

Each point is a model at its release date; the line traces the best score to date.
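The state-of-the-art line is a running maximum over models ordered by release date: it only steps up when a newer model beats every earlier one. A minimal sketch, assuming each point is a `(release_date, model, score)` tuple; the function name `sota_frontier` and the toy data below are hypothetical and not taken from this leaderboard.

```python
from datetime import date


def sota_frontier(points):
    """Return the (release_date, model, score) points that set a new best
    score when models are visited in release order."""
    frontier = []
    best = float("-inf")
    for released, model, score in sorted(points):  # sort by release date
        if score > best:  # strict: on a tie, the earlier model keeps the record
            best = score
            frontier.append((released, model, score))
    return frontier


# Toy points (hypothetical models and scores), deliberately unsorted.
points = [
    (date(2024, 5, 1), "model-b", 52.0),
    (date(2024, 1, 1), "model-a", 40.0),
    (date(2024, 9, 1), "model-c", 48.0),
    (date(2025, 2, 1), "model-d", 63.5),
]
print([m for _, m, _ in sota_frontier(points)])  # ['model-a', 'model-b', 'model-d']
```

`model-c` is plotted as a point but does not move the line, because it scores below the best earlier release.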

Ranking

Rank	Model	Organization	Score
1	GPT-5	OpenAI	78.4
2	Claude Opus 4.6	Anthropic	77.3
3	o3	OpenAI	76.4
4	GPT-4o	OpenAI	59.9
5	Llama 4 Maverick	Meta	59.6
6	Qwen2.5 VL 72B Instruct	Alibaba	51.1
7	Qwen2.5 VL 32B Instruct	Alibaba	49.5
8	Qwen2-VL-72B-Instruct	Alibaba	46.2
9	Llama 3.2 90B Instruct	Meta	45.2
10	Phi-4-multimodal-instruct	Microsoft	38.5
11	Qwen2.5 VL 7B Instruct	Alibaba	38.3
12	Qwen2.5-Omni-7B	Alibaba	36.6
13	Llama 3.2 11B Instruct	Meta	33.0
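The summary figures for this benchmark (13 models, top score 78.4, median 49.5) follow directly from the ranking. A quick check in Python, with the scores transcribed from the table; with an odd count of 13, the median is simply the 7th score in sorted order.

```python
from statistics import median

# Scores transcribed from the MMMU-Pro ranking.
scores = {
    "GPT-5": 78.4,
    "Claude Opus 4.6": 77.3,
    "o3": 76.4,
    "GPT-4o": 59.9,
    "Llama 4 Maverick": 59.6,
    "Qwen2.5 VL 72B Instruct": 51.1,
    "Qwen2.5 VL 32B Instruct": 49.5,
    "Qwen2-VL-72B-Instruct": 46.2,
    "Llama 3.2 90B Instruct": 45.2,
    "Phi-4-multimodal-instruct": 38.5,
    "Qwen2.5 VL 7B Instruct": 38.3,
    "Qwen2.5-Omni-7B": 36.6,
    "Llama 3.2 11B Instruct": 33.0,
}

print(len(scores), max(scores.values()), median(scores.values()))  # 13 78.4 49.5
```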

Related Multimodal benchmarks

MMMU (52), MathVista (34), DocVQA (26), ChartQA (24), AI2D (17)