Multimodal

ChartQA

Models: 24

Top score: 90.8

Median: 85.5

ChartQA is a large-scale benchmark comprising 9.6K human-written questions and 23.1K questions generated from human-written chart summaries, designed to evaluate models' abilities in visual and logical reasoning over charts.

State of the art over time

Each point is a model at its release date; the line traces the best score to date.
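
The "best score to date" line described above is a running maximum over scores ordered by release date. A minimal sketch of that computation follows; the dates and scores are hypothetical placeholders, not entries from this leaderboard, and the names `points` and `frontier` are illustrative only.

```python
from datetime import date
from itertools import accumulate

# Hypothetical (release_date, score) points; placeholders, not real leaderboard entries.
points = [
    (date(2024, 1, 10), 70.0),
    (date(2024, 3, 5), 82.5),
    (date(2024, 6, 20), 80.0),
    (date(2024, 9, 1), 88.0),
]

# Order by release date, then carry the best score forward with a running max.
points.sort(key=lambda p: p[0])
best_to_date = list(accumulate((score for _, score in points), max))

# Pair each release date with the best score seen up to and including that date.
frontier = [(d, best) for (d, _), best in zip(points, best_to_date)]

for d, best in frontier:
    print(d.isoformat(), best)
```

A model that scores below the current best (the third placeholder point here) still appears as a point but leaves the line flat.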

Ranking

Rank	Model	Organization	Score
1	Claude 3.5 Sonnet	Anthropic	90.8
2	Llama 4 Maverick	Meta	90
3	Qwen2.5 VL 72B Instruct	Alibaba	89.5
4	Nova Pro	Amazon	89.2
5	Llama 4 Scout	Meta	88.8
6	Qwen2-VL-72B-Instruct	Alibaba	88.3
7	Pixtral Large	Mistral AI	88.1
8	Mistral Small 3.2 24B Instruct	Mistral AI	87.4
9	Qwen2.5 VL 7B Instruct	Alibaba	87.3
10	Nova Lite	Amazon	86.8
11	DeepSeek VL2	DeepSeek	86
12	GPT-4o	OpenAI	85.7
13	Llama 3.2 90B Instruct	Meta	85.5
14	Qwen2.5-Omni-7B	Alibaba	85.3
15	DeepSeek VL2 Small	DeepSeek	84.5
16	Llama 3.2 11B Instruct	Meta	83.4
17	Pixtral-12B	Mistral AI	81.8
18	Phi-3.5-vision-instruct	Microsoft	81.8
19	Phi-4-multimodal-instruct	Microsoft	81.4
20	DeepSeek VL2 Tiny	DeepSeek	81
21	Gemma 3 27B	Google	78
22	Grok-1.5V	xAI	76.1
23	Gemma 3 12B	Google	75.7
24	Gemma 3 4B	Google	68.8
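
The summary figures at the top of the page (24 models, top score 90.8, median 85.5) can be re-derived from the ranking above. The sketch below is one way to do so. How the page itself computes its median is not stated, so the comparison of two median conventions is an assumption: with an even number of entries, the reported 85.5 matches the lower of the two middle scores, while averaging the two middle scores would give about 85.6.

```python
from statistics import median, median_low

# Scores copied from the ranking, in rank order.
scores = [
    90.8, 90, 89.5, 89.2, 88.8, 88.3, 88.1, 87.4,
    87.3, 86.8, 86, 85.7, 85.5, 85.3, 84.5, 83.4,
    81.8, 81.8, 81.4, 81, 78, 76.1, 75.7, 68.8,
]

n_models = len(scores)
top_score = max(scores)

# Two common conventions for an even-sized sample:
mid_low = median_low(scores)  # lower of the two middle values
mid_avg = median(scores)      # mean of the two middle values

print(f"models={n_models} top={top_score} "
      f"median_low={mid_low} median={mid_avg:.1f}")
```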

Related Multimodal benchmarks

MMMU (52), MathVista (34), DocVQA (26), AI2D (17), MMMU-Pro (13)