MGSM

Models: 29

Top score: 92.3

Median: 83.5

MGSM (Multilingual Grade School Math) is a benchmark of grade-school math problems. It contains 250 problems from the GSM8K dataset, manually translated into ten typologically diverse languages: Spanish, French, German, Russian, Chinese, Japanese, Thai, Swahili, Bengali, and Telugu. It evaluates multilingual mathematical reasoning.
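
As a rough illustration of how a multilingual math benchmark like this could be scored, here is a minimal Python sketch. It assumes exact match on the final number in each response and an unweighted average of per-language accuracies. The function names and sample records are hypothetical, and real evaluation harnesses may extract and compare answers differently.

```python
import re
from collections import defaultdict


def extract_final_number(text):
    """Return the last number found in a response, or None if there is none."""
    # Drop thousands separators so "1,200" is read as 1200 (an assumption).
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None


def score_by_language(records):
    """Score (language, response, gold_answer) triples.

    Returns per-language accuracy in percent and their unweighted mean.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, response, gold in records:
        total[lang] += 1
        pred = extract_final_number(response)
        if pred is not None and abs(pred - float(gold)) < 1e-6:
            correct[lang] += 1
    per_lang = {lang: 100.0 * correct[lang] / total[lang] for lang in total}
    macro_avg = sum(per_lang.values()) / len(per_lang)
    return per_lang, macro_avg


# Hypothetical sample records, for illustration only.
sample = [
    ("es", "La respuesta es 18.", 18),
    ("es", "Son 5 manzanas", 7),
    ("de", "Die Antwort ist 1,200", 1200),
]
per_lang, macro_avg = score_by_language(sample)
print(per_lang, macro_avg)
```

Averaging per language rather than per problem keeps each language equally weighted, which matters only if the languages have different numbers of problems.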

Figure: State of the art over time. Each point is a model at its release date; the line traces the best score to date.

Ranking

Rank	Model	Organization	Score
1	Llama 4 Maverick	Meta	92.3
2	o3-mini	OpenAI	92
3	Claude 3.5 Sonnet	Anthropic	91.6
4	Llama 3.3 70B Instruct	Meta	91.1
5	o1-preview	OpenAI	90.8
6	Claude 3 Opus	Anthropic	90.7
7	Llama 4 Scout	Meta	90.6
8	o1	OpenAI	89.3
9	GPT-4 Turbo	OpenAI	88.5
10	Gemini 1.5 Pro	Google	87.5
11	GPT-4o-mini	OpenAI	87
12	Llama 3.2 90B Instruct	Meta	86.9
13	Claude 3.5 Haiku	Anthropic	85.6
14	Claude 3 Sonnet	Anthropic	83.5
15	Qwen3 235B A22B	Alibaba	83.5
16	Gemini 1.5 Flash	Google	82.6
17	Phi 4	Microsoft	80.6
18	Claude 3 Haiku	Anthropic	75.1
19	GPT-4	OpenAI	74.5
20	Llama 3.2 11B Instruct	Meta	68.9
21	Gemma 3n E4B Instructed	Google	67
22	Phi 4 Mini	Microsoft	63.9
23	Gemma 3n E4B Instructed LiteRT Preview	Google	60.7
24	Phi-3.5-MoE-instruct	Microsoft	58.7
25	Llama 3.2 3B Instruct	Meta	58.2
26	GPT-3.5 Turbo	OpenAI	56.3
27	Gemma 3n E2B Instructed LiteRT (Preview)	Google	53.1
28	Gemma 3n E2B Instructed	Google	53.1
29	Phi-3.5-mini-instruct	Microsoft	47.9
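
The summary figures for this leaderboard (29 models, top score 92.3, median 83.5) can be recomputed from the ranking. A minimal Python sketch, with the scores copied from the ranking above in rank order; the variable names are illustrative:

```python
from statistics import median

# Scores copied from the ranking, rank 1 through rank 29.
scores = [
    92.3, 92, 91.6, 91.1, 90.8, 90.7, 90.6, 89.3, 88.5, 87.5,
    87, 86.9, 85.6, 83.5, 83.5, 82.6, 80.6, 75.1, 74.5, 68.9,
    67, 63.9, 60.7, 58.7, 58.2, 56.3, 53.1, 53.1, 47.9,
]

model_count = len(scores)
top_score = max(scores)
# With an odd number of entries, the median is the middle (15th) score.
median_score = median(scores)

print(model_count, top_score, median_score)
```

With 29 entries the median is the 15th-ranked score, which is why it coincides with an actual row in the ranking instead of falling between two rows.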

Related Math benchmarks

AIME 2025 (221), MATH-500 (169), MATH (67), AIME 2024 (46), GSM8K (45), HMMT 2025 (11)