Fastest AI models

These are the fastest models ranked by output throughput (tokens per second). The list is limited to models with a measured intelligence index, so you are comparing genuinely capable options.

1. Llama 3.3 70B Instruct: Overall 50.6, output speed 2220 tok/s
2. Llama 3.1 8B Instruct: Overall 45, output speed 2047 tok/s
3. Llama 3.1 70B Instruct: Overall 56, output speed 1204 tok/s
4. gpt-oss-20b: Overall 73.6, output speed 1000 tok/s
5. Mercury 2: Overall 77, output speed 790 tok/s
6. Llama 4 Scout: Overall 58.9, output speed 776 tok/s
7. Llama 4 Maverick: Overall 63.9, output speed 639 tok/s
8. Granite 4.0 H Small: Overall 35.7, output speed 524 tok/s
9. gpt-oss-120b: Overall 79.6, output speed 500 tok/s
10. GPT-5 nano: Overall 71.2, output speed 500 tok/s
11. Granite 3.3 8B: Overall 32.5, output speed 376 tok/s
12. Gemini 3.1 Flash Lite: Overall 82.2, output speed 342 tok/s
13. Qwen3.5 2B: Overall 45.6, output speed 328 tok/s
14. Qwen3 32B: Overall 73.8, output speed 328 tok/s
15. Nemotron 3 Nano Omni 30B A3B Reasoning: Overall 46.9, output speed 301 tok/s
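
The ranking rule described at the top (keep only models with a measured intelligence index, then order by output speed) can be sketched in a few lines of Python. This is an illustrative sketch, not the site's actual pipeline: the `ModelResult` type, the `rank_fastest` function, and the unscored sample entry are all hypothetical, and how ties are broken on the real leaderboard is not stated, so the sketch simply keeps input order for equal speeds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelResult:
    # Hypothetical record type, for illustration only.
    name: str
    overall: Optional[float]  # intelligence index; None if not measured
    output_tps: float         # output speed in tokens per second


def rank_fastest(results, limit=15):
    """Keep models with a measured index, then sort fastest first."""
    measured = [r for r in results if r.overall is not None]
    # sorted() is stable, so models with equal speed keep their input order.
    return sorted(measured, key=lambda r: r.output_tps, reverse=True)[:limit]


results = [
    ModelResult("gpt-oss-20b", 73.6, 1000),
    ModelResult("Llama 3.3 70B Instruct", 50.6, 2220),
    ModelResult("Unscored model (hypothetical)", None, 5000),
    ModelResult("Mercury 2", 77, 790),
]

for rank, r in enumerate(rank_fastest(results), start=1):
    print(f"{rank}. {r.name}: Overall {r.overall}, {r.output_tps} tok/s")
```

The unscored entry is dropped even though it has the highest throughput, which is the point of the filter: speed alone does not earn a place on the list.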

Rankings are computed from live benchmark data and update automatically. See the full intelligence leaderboard for the overall picture.

More rankings

AI models for reasoning, AI models for coding, AI models for math, AI models for agents, AI models for multimodal, open-weights AI models, value AI models