Multimodal

AI2D

AI2D is a dataset of 4,903 illustrative diagrams from grade-school natural sciences (such as food webs, human physiology, and life cycles) with over 15,000 multiple-choice questions and answers.


Models: 17

Top score: 94.7

Median: 84.5

The benchmark evaluates diagram understanding and visual reasoning: models must interpret a diagram's elements, relationships, and structure to answer questions about the scientific concepts it depicts.
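The page does not say how scores are computed. Assuming they are plain accuracy over the multiple-choice questions, expressed as a percentage, the scoring could be sketched as follows. The function name and the sample answers are hypothetical and are not taken from the AI2D release.

```python
def multiple_choice_accuracy(predictions, answers):
    """Return the percentage of questions answered correctly.

    Assumption: each prediction and each gold answer is a single
    option label such as "A", "B", "C", or "D".
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical example: four questions, three answered correctly.
predicted = ["B", "A", "D", "C"]
gold = ["B", "C", "D", "C"]
accuracy = multiple_choice_accuracy(predicted, gold)
print(accuracy)  # 75.0
```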

Figure: State of the art over time. Each point is a model at its release date; the line traces the best score to date.

Ranking

Rank	Model	Organization	Score
1	Claude 3.5 Sonnet	Anthropic	94.7
2	GPT-4o	OpenAI	94.2
3	Pixtral Large	Mistral AI	93.8
4	Mistral Small 3.2 24B Instruct	Mistral AI	92.9
5	Llama 3.2 90B Instruct	Meta	92.3
6	Llama 3.2 11B Instruct	Meta	91.1
7	Qwen2.5 VL 72B Instruct	Alibaba	88.4
8	Grok-1.5V	xAI	88.3
9	Gemma 3 27B	Google	84.5
10	Gemma 3 12B	Google	84.2
11	Qwen2.5-Omni-7B	Alibaba	83.2
12	Phi-4-multimodal-instruct	Microsoft	82.3
13	DeepSeek VL2	DeepSeek	81.4
14	DeepSeek VL2 Small	DeepSeek	80
15	Phi-3.5-vision-instruct	Microsoft	78.1
16	Gemma 3 4B	Google	74.8
17	DeepSeek VL2 Tiny	DeepSeek	71.6
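
The summary figures at the top of the page (17 models, top score 94.7, median 84.5) can be rechecked against the ranking above. A minimal Python sketch, with the scores copied from the ranking in rank order; the variable names are illustrative only.

```python
from statistics import median

# Scores copied from the ranking, in rank order.
scores = [94.7, 94.2, 93.8, 92.9, 92.3, 91.1, 88.4, 88.3, 84.5,
          84.2, 83.2, 82.3, 81.4, 80.0, 78.1, 74.8, 71.6]

model_count = len(scores)
top_score = max(scores)
# With an odd number of entries, the median is the middle score (rank 9 of 17).
median_score = median(scores)

print(model_count, top_score, median_score)  # 17 94.7 84.5
```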

Related Multimodal benchmarks

MMMU (52), MathVista (34), DocVQA (26), ChartQA (24), MMMU-Pro (13)