AI2D
Models: 17
Top score: 94.7
Median: 84.5
AI2D is a dataset of 4,903 illustrative diagrams from grade-school natural science (such as food webs, human physiology, and life cycles), paired with over 15,000 multiple-choice questions and answers. The benchmark evaluates diagram understanding and visual reasoning: models must interpret a diagram's elements, relationships, and structure to answer questions about the scientific concepts it depicts.
State of the art over time
Each point is a model at its release date; the line traces the best score to date.
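The "best score to date" line described above is a running maximum over models ordered by release date. The sketch below shows one way to compute it. The release dates and scores in `points` are hypothetical placeholders rather than leaderboard data (this page lists no release dates), and `best_to_date` is an illustrative helper name, not part of any published tooling.

```python
from datetime import date
from itertools import accumulate

# Hypothetical placeholder points (release date, score); NOT real leaderboard data.
points = [
    (date(2024, 3, 1), 80.0),
    (date(2024, 1, 15), 75.5),
    (date(2024, 6, 10), 78.2),
    (date(2024, 9, 30), 91.0),
]

def best_to_date(points):
    """Return (release_date, best_score_so_far) pairs in chronological order."""
    ordered = sorted(points, key=lambda p: p[0])
    # accumulate(..., max) yields the running maximum of the scores.
    running_best = accumulate((score for _, score in ordered), max)
    return [(d, best) for (d, _), best in zip(ordered, running_best)]

frontier = best_to_date(points)
```

A model that scores below the current best (the third placeholder point here) still appears as a point on the chart, but it does not move the line.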
Ranking
| Rank | Model | Score |
| --- | --- | --- |
| 1 | Claude 3.5 Sonnet | 94.7 |
| 2 | GPT-4o | 94.2 |
| 3 | Pixtral Large | 93.8 |
| 4 | Mistral Small 3.2 24B Instruct | 92.9 |
| 5 | Llama 3.2 90B Instruct | 92.3 |
| 6 | Llama 3.2 11B Instruct | 91.1 |
| 7 | Qwen2.5 VL 72B Instruct | 88.4 |
| 8 | Grok-1.5V | 88.3 |
| 9 | Gemma 3 27B | 84.5 |
| 10 | Gemma 3 12B | 84.2 |
| 11 | Qwen2.5-Omni-7B | 83.2 |
| 12 | Phi-4-multimodal-instruct | 82.3 |
| 13 | DeepSeek VL2 | 81.4 |
| 14 | DeepSeek VL2 Small | 80 |
| 15 | Phi-3.5-vision-instruct | 78.1 |
| 16 | Gemma 3 4B | 74.8 |
| 17 | DeepSeek VL2 Tiny | 71.6 |
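
The headline figures at the top of the page (model count, top score, median) can be recomputed from the ranking table. A minimal sketch, assuming the summary is simply the count, maximum, and median of the listed scores; the variable names are illustrative.

```python
from statistics import median

# Scores copied from the ranking table above, in rank order.
scores = [94.7, 94.2, 93.8, 92.9, 92.3, 91.1, 88.4, 88.3, 84.5,
          84.2, 83.2, 82.3, 81.4, 80, 78.1, 74.8, 71.6]

summary = {
    "models": len(scores),
    "top_score": max(scores),
    "median": median(scores),  # odd count, so this is the 9th-ranked score
}
print(summary)  # {'models': 17, 'top_score': 94.7, 'median': 84.5}
```

With 17 entries the median is the 9th-ranked model (Gemma 3 27B, 84.5), which matches the figure shown in the page summary.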