AI War Tracker
298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Sorted by newest by default; head to the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter

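Rank · Model · Coding idx · SciCode · Aider Polyglot Edit · MultiPL-E · MBPP · SWE-bench Pro · Aider Polyglot · LiveCodeBench · Terminal-Bench · SWE-bench Verified · HumanEval · Released · Country · Type · Access · Params · Cutoff · Context · Speed · Latency · In $/M (USD per 1M input tokens) · Out $/M (USD per 1M output tokens)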
DeepSeek67.65093.546.280.62026llmOpen weights1.6T (49B active)1M301.16$0.44$0.87
OpenAI67.152.189.447802025llmAPI only400K730.69$1.75$14.00
Google66.456.191.741.776.22025multimodalAPI only1M14127.49$2.00$12.00
Anthropic66.149.587.14780.92025llmAPI only200K581.50$5.00$25.00
OpenAI65.742.98884.637.974.993.42025llmAPI only2024400K1002.00$1.25$10.00
Anthropic65.554.554.587.62026llmAPI only1M491.42$5.00$25.00
Google64.550.690.838.6782025multimodalAPI only1M1911.05$0.50$3.00
Google64.458.953.880.62026multimodalAPI only1M14226.02$2.00$12.00
#9OpenAI61.94181.380.837.169.12025llmAPI only2024200K5020.00$2.00$8.00
Index 56.3 = (69.3 + 33.6 + 57.1 + 65.2) / 4, the equal-weighted mean of 4 components.
General (25%): 69.3
  • SimpleQA: no score
  • AA-LCR: 69.3
  • LongBench-v2: no score
  • IFBench: no score
Reasoning (25%): 33.6
  • GPQA Diamond: 87.7
  • Humanity’s Last Exam: 24.3
  • FrontierMath: 15.8
  • ARC-AGI-2: 6.5
Coding (25%): 57.1
  • SWE-bench Verified: 69.1
  • Terminal-Bench: 37.1
  • Aider Polyglot: 81.3
  • SciCode: 41
Tool use & agents (25%): 65.2
  • TAU-bench Retail: no score
  • τ²-bench: 80.7
  • BFCL: no score
  • BrowseComp: 49.7
Anthropic60.844.771.45077.22025llmAPI only20251M420.40$3.00$15.00
xAI60.645.779.67937.92025llmAPI only2024256K1000.70$3.00$15.00
Anthropic60.451.948.580.8952026llmAPI only1M481.65$5.00$25.00
Google60.442.872.776.580.126.563.82025multimodalAPI only20251M850.70$1.25$10.00
Anthropic59.846.95379.62026llmAPI only1M751.13$3.00$15.00
OpenAI59.340.98437.974.52025multimodalAPI only2024400K1806.64$1.25$10.00
DeepSeek58.839.974.574.137.767.82025llmOpen weights2025164K1000.70$0.27$0.41
MiniMax58.636.182.646.369.42025llmOpen weights230B (10B active)205K911.19$0.26$1.00
#18OpenAI58.543.386.845.52025llmAPI only400K1150.77$1.25$10.00
Index 64.7 = (75.0 + 57.3 + 44.4 + 81.9) / 4, the equal-weighted mean of 4 components.
General (25%): 75
  • SimpleQA: no score
  • AA-LCR: 75
  • LongBench-v2: no score
  • IFBench: no score
Reasoning (25%): 57.3
  • GPQA Diamond: 88.1
  • Humanity’s Last Exam: 26.5
  • FrontierMath: no score
  • ARC-AGI-2: no score
Coding (25%): 44.4
  • SWE-bench Verified: no score
  • Terminal-Bench: 45.5
  • Aider Polyglot: no score
  • SciCode: 43.3
Tool use & agents (25%): 81.9
  • TAU-bench Retail: no score
  • τ²-bench: 81.9
  • BFCL: no score
  • BrowseComp: no score
OpenAI58.456.158.660.62026llmAPI only20251.1M670.97$5.00$30.00
DeepSeek57.738.970.286.235.62025llmOpen weights671B (37B active)131K$0.25$0.38
Anthropic57.640.97263.639.272.52025llmAPI only2025200K1200.40$15.00$75.00
Moonshot AI57.542.485.331.171.32025llmOpen weights1T (32B active)262K1001.00$0.60$2.50
OpenAI57.356.657.757.62026llmAPI only1.1M840.63$2.50$15.00
OpenAI57.146.558.268.985.915.268.12025multimodalAPI only2024200K1155.20$1.10$4.40
DeepSeek56.14489.634.82025llmOpen weights164K$0.29$0.43

Ranked by Coding index. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations; see each model's page for sources.
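The two expanded breakdowns above are consistent with a simple two-level average. Each component is the mean of whichever sub-benchmarks have a score, and the index is the equal-weighted mean of the four components. For example, Coding 57.1 = (69.1 + 37.1 + 81.3 + 41) / 4, and Tool use & agents 65.2 = (80.7 + 49.7) / 2. The sketch below reproduces the #9 entry's 56.3 under that assumption. The function names, the `None`-for-missing convention, and the choice to reject an empty component are all hypothetical; this is not the site's published method.

```python
from statistics import mean
from typing import Dict, Optional, Tuple

# Hypothetical encoding of the #9 entry's breakdown; None marks a
# sub-benchmark the page lists without a score.
Breakdown = Dict[str, Dict[str, Optional[float]]]

BREAKDOWN_9: Breakdown = {
    "General": {"SimpleQA": None, "AA-LCR": 69.3, "LongBench-v2": None, "IFBench": None},
    "Reasoning": {"GPQA Diamond": 87.7, "Humanity's Last Exam": 24.3,
                  "FrontierMath": 15.8, "ARC-AGI-2": 6.5},
    "Coding": {"SWE-bench Verified": 69.1, "Terminal-Bench": 37.1,
               "Aider Polyglot": 81.3, "SciCode": 41.0},
    "Tool use & agents": {"TAU-bench Retail": None, "τ²-bench": 80.7,
                          "BFCL": None, "BrowseComp": 49.7},
}


def component_score(scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Mean of the sub-benchmarks that have a score, or None if none do."""
    present = [s for s in scores.values() if s is not None]
    return mean(present) if present else None


def composite_index(breakdown: Breakdown) -> Tuple[Dict[str, float], float]:
    """Equal-weighted (25% each) mean of the component scores."""
    components: Dict[str, float] = {}
    for name, scores in breakdown.items():
        value = component_score(scores)
        if value is None:
            # Assumption: the page never shows an empty component, so refuse to guess.
            raise ValueError(f"component {name!r} has no scored sub-benchmarks")
        components[name] = value
    return components, mean(list(components.values()))


components, index = composite_index(BREAKDOWN_9)
for name, value in components.items():
    print(f"{name}: {value:.1f}")
print(f"Index: {index:.1f}")  # Index: 56.3
```

Averaging only the scored sub-benchmarks keeps a model from being penalized for a benchmark nobody has run on it. The trade-off is that a component backed by one score (General, here) counts as much as one backed by four.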
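The page does not say how "relative standing" within a column is measured. One plausible reading is sketched here as an assumption. It min-max scales each score against the column's non-empty cells and then interpolates along a red → yellow → green ramp. The pure-RGB endpoints, the helper names, and the `higher_is_better` flag for columns where lower is better (such as price or latency) are all hypothetical. The sample values are a few Coding idx figures read from the table.

```python
from typing import Optional, Sequence, Tuple

Color = Tuple[int, int, int]

# Hypothetical ramp endpoints; the page's actual palette is not documented.
RED: Color = (255, 0, 0)
YELLOW: Color = (255, 255, 0)
GREEN: Color = (0, 255, 0)


def lerp_color(a: Color, b: Color, t: float) -> Color:
    """Linearly interpolate between two RGB colors for t in [0, 1]."""
    r, g, bl = (round(x + (y - x) * t) for x, y in zip(a, b))
    return (r, g, bl)


def standing(value: float, column: Sequence[Optional[float]]) -> float:
    """Min-max position of value among the column's non-empty cells (0 = min, 1 = max)."""
    present = [v for v in column if v is not None]
    lo, hi = min(present), max(present)
    if hi == lo:
        return 0.5  # every cell ties, so treat all of them as mid-table
    return (value - lo) / (hi - lo)


def cell_color(value: Optional[float], column: Sequence[Optional[float]],
               higher_is_better: bool = True) -> Optional[Color]:
    """Map a cell onto red -> yellow -> green; None for an empty cell."""
    if value is None:
        return None
    t = standing(value, column)
    if not higher_is_better:  # assumption: e.g. price or latency columns
        t = 1.0 - t
    if t <= 0.5:
        return lerp_color(RED, YELLOW, t / 0.5)
    return lerp_color(YELLOW, GREEN, (t - 0.5) / 0.5)


# A few Coding idx values from the table above.
coding_idx = [67.6, 67.1, 66.4, 61.9, 58.5, 56.1]
for v in coding_idx:
    print(v, cell_color(v, coding_idx))
```

A rank- or percentile-based scale would be an equally reasonable choice. Min-max is used here only because it is the simplest way to stretch each column across the full ramp.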