AI War Tracker
298 models in catalog

AI models

Every model we track — frontier flagships, open-weights specialists, narrow benchmarks-only releases. Filter by lab, country, access, modality, or release window. Sorted by newest by default; head to the leaderboard for the ranked view.


Updated May 29, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter

Rank · Model · General idx · Multi-IF · LiveBench · Arena Hard · Humanity’s Last Exam · IFEval · SimpleQA · MMLU-Pro · MMLU · Released · Country · Type · Access · Params · Cutoff · Context · Speed · Latency · In $/M · Out $/M
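DeepSeek · 91.1 · 19.8 · 97.1 · 85 · 2025 · llm · Open weights · 2025 · 164K · 100 · 0.70 · $0.27 · $0.41
xAI · 90 · 20 · 95 · 85 · 2025 · llm · API only · 2M · 90 · $0.20 · $0.50
Google · 89.8 · 37.5 · 89.8 · 2025 · multimodal · API only · 1M · 141 · 27.49 · $2.00 · $12.00
Anthropic · 89.5 · 28.4 · 89.5 · 2025 · llm · API only · 200K · 58 · 1.50 · $5.00 · $25.00
Google · 89 · 34.7 · 89 · 2025 · multimodal · API only · 1M · 191 · 1.05 · $0.50 · $3.00
#6 DeepSeek · 88.7 · 17.7 · 92.3 · 85 · 2025 · llm · Open weights · 671B · 131K · 45 · 0.30 · $0.55 · $2.19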
Index 46.6 = (73.5 + 49.4 + 40.6 + 22.7) / 4, the equal-weighted mean of 4 components.
General (25%): 73.5
  • SimpleQA: 92.3
  • AA-LCR: 54.7
  • LongBench-v2
  • IFBench
Reasoning (25%): 49.4
  • GPQA Diamond: 81
  • Humanity’s Last Exam: 17.7
  • FrontierMath
  • ARC-AGI-2
Coding (25%): 40.6
  • SWE-bench Verified: 44.6
  • Terminal-Bench: 5.7
  • Aider Polyglot: 71.6
  • SciCode: 40.3
Tool use & agents (25%): 22.7
  • TAU-bench Retail
  • τ²-bench: 36.5
  • BFCL
  • BrowseComp: 8.9
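DeepSeek · 88.6 · 15.9 · 93.4 · 83.7 · 2025 · llm · Open weights · 671B (37B active) · 2025 · 164K · $0.21 · $0.79
Anthropic · 88.5 · 10.3 · 93.2 · 83.7 · 86.1 · 2025 · llm · API only · 200K · 101 · 0.40 · $3.00 · $15.00
Anthropic · 88 · 11.9 · 88 · 2025 · llm · API only · 2025 · 200K · 120 · 0.40 · $15.00 · $75.00
DeepSeek · 87.5 · 35.9 · 87.5 · 2026 · llm · Open weights · 1.6T (49B active) · 1M · 30 · 1.16 · $0.44 · $0.87
Anthropic · 87.5 · 17.3 · 87.5 · 2025 · llm · API only · 2025 · 1M · 42 · 0.40 · $3.00 · $15.00
MiniMax · 87.5 · 22.2 · 87.5 · 2025 · llm · Open weights · 205K · 92 · 1.14 · $0.29 · $0.95
OpenAI · 87.4 · 35.4 · 87.4 · 2025 · llm · API only · 400K · 73 · 0.69 · $1.75 · $14.00
Anthropic · 87.3 · 11.7 · 87.3 · 88.8 · 2025 · llm · API only · 2025 · 200K · 120 · 0.40 · $15.00 · $75.00
#15 OpenAI · 87.1 · 24.8 · 87.1 · 92.5 · 2025 · llm · API only · 2024 · 400K · 100 · 2.00 · $1.25 · $10.00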
Index 63.3 = (75.6 + 46.1 + 60.9 + 70.7) / 4, the equal-weighted mean of 4 components.
General (25%): 75.6
  • SimpleQA
  • AA-LCR: 75.6
  • LongBench-v2
  • IFBench
Reasoning (25%): 46.1
  • GPQA Diamond: 87.3
  • Humanity’s Last Exam: 24.8
  • FrontierMath: 26.3
  • ARC-AGI-2
Coding (25%): 60.9
  • SWE-bench Verified: 74.9
  • Terminal-Bench: 37.9
  • Aider Polyglot: 88
  • SciCode: 42.9
Tool use & agents (25%): 70.7
  • TAU-bench Retail
  • τ²-bench: 86.5
  • BFCL
  • BrowseComp: 54.9
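OpenAI · 87 · 26.5 · 87 · 2025 · llm · API only · 400K · 115 · 0.77 · $1.25 · $10.00
xAI · 86.6 · 40 · 86.6 · 2025 · llm · API only · 2024 · 256K · 100 · 0.70 · $3.00 · $15.00
OpenAI · 86.5 · 25.6 · 86.5 · 2025 · multimodal · API only · 2024 · 400K · 180 · 6.64 · $1.25 · $10.00
DeepSeek · 86.3 · 26.1 · 86.3 · 2025 · llm · Open weights · 164K · $0.29 · $0.43
DeepSeek · 86.2 · 22.2 · 86.2 · 2025 · llm · Open weights · 671B (37B active) · 131K · $0.25 · $0.38
OpenAI · 86 · 23.4 · 86 · 2025 · multimodal · API only · 400K · 188 · 4.16 · $1.25 · $10.00
NVIDIA · 86 · 8.1 · 89.5 · 82.5 · 2025 · llm · Open weights · 253B · 2023 · 42 · 0.72 · $0.60 · $1.80
Zhipu AI · 85.6 · 25.1 · 85.6 · 2025 · llm · Open weights · 203K · 98 · 0.83 · $0.40 · $1.75
xAI · 85.4 · 17.6 · 85.4 · 2025 · llm · $0.00 · $0.00
ByteDance · 85.4 · 13.3 · 85.4 · 2025 · llm · $0.00 · $0.00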

Ranked on General. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations — see each model for sources.
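
The two expanded rows (#6 and #15) describe their index as the equal-weighted mean of four component scores. The sketch below reproduces that arithmetic from the benchmark values shown in those rows. Two things are assumptions inferred from the displayed numbers, not stated by the site: each component is taken as the mean of whichever of its benchmarks have a value (blank ones are skipped), and rounding is half-up to one decimal. The names `component_score`, `composite_index`, `ROW_6`, and `ROW_15` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round1(x):
    """Round a Decimal to one decimal place, with halves rounding up."""
    return x.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def component_score(benchmarks):
    """Mean of the benchmarks that have a value; None marks a blank cell (assumption)."""
    reported = [Decimal(v) for v in benchmarks if v is not None]
    return round1(sum(reported) / len(reported))


def composite_index(components):
    """Equal-weighted mean of the rounded component scores (25% each for 4 components)."""
    scores = [component_score(b) for b in components.values()]
    return round1(sum(scores) / len(scores))


# Benchmark values copied from the expanded rows; None = no value shown.
ROW_6 = {
    "General": ["92.3", "54.7", None, None],
    "Reasoning": ["81", "17.7", None, None],
    "Coding": ["44.6", "5.7", "71.6", "40.3"],
    "Tool use & agents": [None, "36.5", None, "8.9"],
}
ROW_15 = {
    "General": [None, "75.6", None, None],
    "Reasoning": ["87.3", "24.8", "26.3", None],
    "Coding": ["74.9", "37.9", "88", "42.9"],
    "Tool use & agents": [None, "86.5", None, "54.9"],
}

print(composite_index(ROW_6))   # 46.6
print(composite_index(ROW_15))  # 63.3
```

`Decimal` is used instead of floats so that borderline values such as 49.35 and 46.55 round the same way the page displays them. Under these assumptions a component with a single reported benchmark (General in row #15) simply takes that benchmark's score.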