Multimodal

MMMU

Models: 52

Top score: 84.2

Median: 65.9

MMMU (Massive Multi-discipline Multimodal Understanding) is a benchmark designed to evaluate multimodal models on college-level subject knowledge and deliberate reasoning. It contains 11.5K multimodal questions collected from college exams, quizzes, and textbooks. The questions cover six core disciplines (Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering), spanning 30 subjects and 183 subfields.

State of the art over time

Each point is a model at its release date; the line traces the best score to date.
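
The "best score to date" line described above is a running maximum over models sorted by release date. The sketch below shows one way to compute it. The dates, model names, and scores are hypothetical placeholders, not leaderboard data, and `best_to_date` is an illustrative helper rather than the method this page actually uses.

```python
from datetime import date

# Hypothetical (release_date, model, score) points for illustration only;
# none of these values come from the ranking on this page.
points = [
    (date(2024, 1, 10), "model-a", 50.0),
    (date(2024, 3, 5), "model-b", 47.5),
    (date(2024, 6, 20), "model-c", 61.0),
    (date(2024, 9, 1), "model-d", 58.2),
    (date(2025, 2, 14), "model-e", 70.4),
]


def best_to_date(points):
    """Return (release_date, best_score_so_far) pairs in chronological order."""
    frontier = []
    best = float("-inf")
    # Sort by release date so the running maximum follows time order.
    for released, _model, score in sorted(points, key=lambda p: p[0]):
        best = max(best, score)
        frontier.append((released, best))
    return frontier


frontier = best_to_date(points)
for released, best in frontier:
    print(released.isoformat(), best)
```

A model that scores below the current best (here `model-b` and `model-d`) still appears as a point, but the line stays flat at the earlier maximum.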

Ranking

Rank	Model	Organization	Score
1	GPT-5	OpenAI	84.2
2	o3	OpenAI	82.9
3	Gemini 2.5 Pro Preview 06-05	Google	82
4	o4-mini	OpenAI	81.6
5	Gemini 2.5 Flash	Google	79.7
6	Gemini 2.5 Pro	Google	79.6
7	Grok-3	xAI	78
8	o1	OpenAI	77.6
9	Gemini 2.0 Flash Thinking	Google	75.4
10	GPT-4.5	OpenAI	75.2
11	Claude 3.7 Sonnet	Anthropic	75
12	GPT-4.1	OpenAI	74.8
13	Claude Sonnet 4	Anthropic	74.4
14	Llama 4 Maverick	Meta	73.4
15	Gemini 2.5 Flash Lite	Google	72.9
16	GPT-4.1 Mini	OpenAI	72.7
17	GPT-4o	OpenAI	72.2
18	Gemini 2.0 Flash	Google	70.7
19	QvQ-72B-Preview	Alibaba	70.3
20	Qwen2.5 VL 72B Instruct	Alibaba	70.2
21	Qwen2.5 VL 32B Instruct	Alibaba	70
22	Kimi-k1.5	Moonshot AI	70
23	Llama 4 Scout	Meta	69.4
24	Claude 3.5 Sonnet	Anthropic	68.3
25	Gemini 2.0 Flash Lite	Google	68
26	Grok-2	xAI	66.1
27	Gemini 1.5 Pro	Google	65.9
28	Pixtral Large	Mistral AI	64
29	Grok-2 mini	xAI	63.2
30	Mistral Small 3.2 24B Instruct	Mistral AI	62.5
31	Gemini 1.5 Flash	Google	62.3
32	Nova Pro	Amazon	61.7
33	Llama 3.2 90B Instruct	Meta	60.3
34	GPT-4o-mini	OpenAI	59.4
35	Mistral Small 3.1 24B Instruct	Mistral AI	59.3
36	Mistral Small 3.1 24B Base	Mistral AI	59.3
37	Qwen2.5-Omni-7B	Alibaba	59.2
38	Qwen2.5 VL 7B Instruct	Alibaba	58.6
39	Nova Lite	Amazon	56.2
40	GPT-4.1 Nano	OpenAI	55.4
41	Phi-4-multimodal-instruct	Microsoft	55.1
42	Gemini 1.5 Flash 8B	Google	53.7
43	Grok-1.5V	xAI	53.6
44	Grok-1.5	xAI	53.6
45	Pixtral-12B	Mistral AI	52.5
46	DeepSeek VL2	DeepSeek	51.1
47	Llama 3.2 11B Instruct	Meta	50.7
48	DeepSeek VL2 Small	DeepSeek	48
49	Gemini 1.0 Pro	Google	47.9
50	Phi-3.5-vision-instruct	Microsoft	43
51	DeepSeek VL2 Tiny	DeepSeek	40.7
52	GPT-3.5 Turbo	OpenAI	0

Related Multimodal benchmarks

MathVista (34), DocVQA (26), ChartQA (24), AI2D (17), MMMU-Pro (13)