AI War Tracker
298 models in catalog

AI models

Every model we track — frontier flagships, open-weights specialists, narrow benchmarks-only releases. Filter by lab, country, access, modality, or release window. Sorted by newest by default; head to the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter

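| Rank | Model | Agents idx | τ²-bench | BFCL | τ²-bench Airline | τ²-bench Retail | BrowseComp | TAU-bench Airline | TAU-bench Retail | Released | Country | Type | Access | Params | Cutoff | Context | Speed | Latency | In $/M | Out $/M |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | China Mobile | 99.1 | 99.1 | | | | | | | 2026 | | llm | | | | | | | $0.00 | $0.00 |
| | Zhipu AI | 98.8 | 98.8 | | | | | | | 2026 | | llm | Open weights | | | 203K | 113 | 1.00 | $0.06 | $0.40 |
| | Zhipu AI | 98.5 | 98.5 | | | | | | | 2026 | | llm | API only | | | 203K | | | $1.20 | $4.00 |
| | Zhipu AI | 98.5 | 98.5 | | | | | | | 2026 | | multimodal | API only | | | 203K | | | $1.20 | $4.00 |
| | Zhipu AI | 98.2 | 98.2 | | | | | | | 2026 | | llm | Open weights | 744B (44B active) | | 203K | 67 | 0.77 | $0.60 | $1.92 |
| | xAI | 97.7 | 97.7 | | | | | | | 2026 | | llm | API only | | | 1M | 88 | 0.52 | $1.25 | $2.50 |
| | Alibaba | 97.7 | 97.7 | | | | | | | 2026 | | multimodal | API only | | | 1M | 52 | 1.73 | $0.33 | $1.95 |
| | Zhipu AI | 97.7 | 97.7 | | | | | | | 2026 | | llm | Open weights | | | 203K | 53 | 0.78 | $0.98 | $3.08 |
| | xAI | 96.5 | 96.5 | | | | | | | 2026 | | llm | | | | | 97 | 0.62 | $2.00 | $6.00 |
| | DeepSeek | 96.2 | 96.2 | | | | | | | 2026 | | llm | Open weights | 1.6T (49B active) | | 1M | 30 | 1.16 | $0.44 | $0.87 |
| | Moonshot AI | 95.9 | 95.9 | | | | | | | 2026 | | llm | Open weights | 1T (32B active) | | 262K | 57 | 1.20 | $0.68 | $3.42 |
| | Alibaba | 95.9 | 95.9 | | | | | | | 2026 | | llm | API only | | | 262K | 36 | 2.79 | $1.04 | $6.24 |
| | Moonshot AI | 95.9 | 95.9 | | | | | | | 2026 | | multimodal | Open weights | 1T (32B active) | | 262K | 35 | 1.33 | $0.40 | $1.90 |
| | Zhipu AI | 95.9 | 95.9 | | | | | | | 2025 | | llm | Open weights | | | 203K | 98 | 0.83 | $0.40 | $1.75 |
| | Google | 95.6 | 95.6 | | | | | | | 2026 | | multimodal | API only | | | 1M | 142 | 26.02 | $2.00 | $12.00 |
| | Google | 95.6 | 95.6 | | | | | | | 2026 | | multimodal | API only | | 2025 | 1M | 221 | 9.75 | $1.50 | $9.00 |
| | Alibaba | 95.6 | 95.6 | | | | | | | 2026 | | multimodal | Open weights | | | 262K | 53 | 1.82 | $0.39 | $2.34 |
| | DeepSeek | 95.6 | 95.6 | | | | | | | 2026 | | llm | Open weights | 284B (13B active) | | 1M | 109 | 0.76 | $0.10 | $0.20 |
| | MiniMax | 95.3 | 95.3 | | | | | | | 2026 | | llm | Open weights | | | 205K | 87 | 1.16 | $0.15 | $1.15 |
| | Alibaba | 95.3 | 95.3 | | | | | | | 2026 | | multimodal | Open weights | | | 262K | 169 | 1.47 | $0.14 | $1.00 |
| | Xiaomi | 95 | 95 | | | | | | | 2026 | | llm | API only | | | 1M | 60 | 2.01 | $1.00 | $3.00 |
| #22 | Xiaomi | 95 | 95 | | | | | | | 2025 | | llm | Open weights | | | 262K | 145 | 1.34 | $0.10 | $0.30 |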
Index 61.9 = (64.3 + 52.9 + 35.3 + 95.0) / 4, the equal-weighted mean of 4 components.
General (25%): 64.3
  • SimpleQA
  • AA-LCR: 64.3
  • LongBench-v2
  • IFBench
Reasoning (25%): 52.9
  • GPQA Diamond: 84.6
  • Humanity’s Last Exam: 21.1
  • FrontierMath
  • ARC-AGI-2
Coding (25%): 35.3
  • SWE-bench Verified
  • Terminal-Bench: 31.1
  • Aider Polyglot
  • SciCode: 39.4
Tool use & agents (25%): 95
  • TAU-bench Retail
  • τ²-bench: 95
  • BFCL
  • BrowseComp
| #23 | Alibaba | 94.7 | 94.7 | | | | | | | 2026 | | llm | API only | | | 1M | 203 | 1.59 | $1.25 | $3.75 |
Index 69.7 = (69.0 + 65.2 + 49.8 + 94.7) / 4, the equal-weighted mean of 4 components.
General (25%): 69
  • SimpleQA
  • AA-LCR: 69
  • LongBench-v2
  • IFBench
Reasoning (25%): 65.2
  • GPQA Diamond: 92.3
  • Humanity’s Last Exam: 38.1
  • FrontierMath
  • ARC-AGI-2
Coding (25%): 49.8
  • SWE-bench Verified
  • Terminal-Bench: 50.8
  • Aider Polyglot
  • SciCode: 48.8
Tool use & agents (25%): 94.7
  • TAU-bench Retail
  • τ²-bench: 94.7
  • BFCL
  • BrowseComp
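| | Anthropic | 94.4 | 94.4 | | | | | | | 2026 | | llm | API only | | | 1M | 66 | 6.54 | $5.00 | $25.00 |
| | StepFun | 94.4 | 94.4 | | | | | | | 2026 | | llm | Open weights | | | 262K | 194 | 0.85 | $0.09 | $0.30 |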

Ranked on Agents. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations — see each model for sources.
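
The index breakdowns shown for the #22 and #23 rows can be reproduced with a few lines of arithmetic. The Python sketch below is illustrative only: the function names `component_score` and `overall_index` are hypothetical, and the rule that a component is the mean of whichever benchmarks have a score (skipping the ones listed without a value) is an assumption inferred from the listed numbers, not something the page states. Only the equal 25% weighting comes from the breakdown itself.

```python
from statistics import fmean
from typing import Optional

# Equal weights, as stated in the breakdown (25% per component).
WEIGHTS = {"general": 0.25, "reasoning": 0.25, "coding": 0.25, "agents": 0.25}


def component_score(benchmarks: dict[str, Optional[float]]) -> float:
    """Mean of the benchmarks that have a score; None marks an unreported one.

    Assumption: skipping unreported benchmarks is inferred from the listed
    numbers, not documented by the site.
    """
    reported = [score for score in benchmarks.values() if score is not None]
    return fmean(reported)


def overall_index(components: dict[str, float]) -> float:
    """Weighted mean of the four component scores (equal weights here)."""
    return sum(WEIGHTS[name] * score for name, score in components.items())


# Benchmark scores as listed for the #23 row; None = listed without a score.
rank_23 = {
    "general": {"SimpleQA": None, "AA-LCR": 69.0,
                "LongBench-v2": None, "IFBench": None},
    "reasoning": {"GPQA Diamond": 92.3, "Humanity's Last Exam": 38.1,
                  "FrontierMath": None, "ARC-AGI-2": None},
    "coding": {"SWE-bench Verified": None, "Terminal-Bench": 50.8,
               "Aider Polyglot": None, "SciCode": 48.8},
    "agents": {"TAU-bench Retail": None, "τ²-bench": 94.7,
               "BFCL": None, "BrowseComp": None},
}

components_23 = {name: component_score(b) for name, b in rank_23.items()}
index_23 = overall_index(components_23)

# Component scores as listed for the #22 row, fed straight into the index.
index_22 = overall_index(
    {"general": 64.3, "reasoning": 52.9, "coding": 35.3, "agents": 95.0}
)

print({k: round(v, 1) for k, v in components_23.items()})
print(round(index_23, 1), round(index_22, 1))
```

Under these assumptions the sketch yields the component and index values shown in the two breakdowns after rounding to one decimal. Because the page describes its scores as curated approximations, small differences from any upstream source are to be expected.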