AI War Tracker
298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Sorted newest-first by default; head to the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter · Methodology · Spotted an error?

Rank · Model · Index · General · Reason · Coding · Agents · Math · Multi · Long ctx · GPQA Diamond · DROP · ARC-AGI-2 · BIG-Bench Hard · SciCode · Terminal-Bench · LiveCodeBench · SWE-bench Verified · Aider Polyglot · HumanEval · Aider Polyglot Edit · MBPP · MultiPL-E · SWE-bench Pro · AIME 2025 · MATH-500 · AIME 2024 · MATH · GSM8K · MGSM · HMMT 2025 · FrontierMath · τ²-bench · TAU-bench Retail · TAU-bench Airline · BFCL · BrowseComp · τ²-bench Airline · τ²-bench Retail · MMMU · MathVista · ChartQA · DocVQA · MMMU-Pro · AI2D · Humanity’s Last Exam · MMLU-Pro · MMLU · IFEval · SimpleQA · Multi-IF · LiveBench · Arena Hard · AA-LCR · LongBench-v2 · Released · Country · Type · Access · Params · Cutoff · Context · Speed · Latency · In $/M · Out $/M
InclusionAI34.445.743.821.826.389.345.777.436.76.864.389.326.310.280.645.72025llm$0.00$0.00
InclusionAI32.534.739.622.932.771.334.771.935.210.667.771.332.77.282.234.72025llm$0.00$0.00
AI21 Labs11.37193.415.810.7733.35.90.82110.715.84.657.772025llm$0.00$0.00
Liquid AI8.4019.73.410.525.3034.46.8015.125.310.54.950.502025llm$0.00$0.00
Alibaba24.823.73818.51972.323.769.530.86.147.672.3196.476.423.72025multimodalOpen weights2025262K1230.98$0.13$0.52
Alibaba29.540.740.417.119.982.340.77228.85.369.782.319.98.780.740.72025llm1221.14$0.20$0.80
Zhipu AI53.454.349.1496193.954.38138.440.569.56893.976.945.117.282.954.32025llmOpen weights357B (MoE)2025203K850.70$0.43$1.74
ServiceNow38.22041.722.768.487.52071.334.810.672.887.568.41277.3202025llm$0.00$0.00
Anthropic63.965.750.457.382.28765.783.444.75071.477.28778.186.27017.387.565.72025llmAPI only20251M420.40$3.00$15.00
#160DeepSeek56.383.149.9553786.46979.939.937.774.167.874.589.383.633.940.119.88597.1692025llmOpen weights2025164K1000.70$0.27$0.41
Index 56.3 ≈ (83.1 + 49.9 + 55.0 + 37.0) / 4 = 56.25, the equal-weighted mean of the 4 components.
General (25%): 83.1
  • SimpleQA: 97.1
  • AA-LCR: 69
  • LongBench-v2
  • IFBench
Reasoning (25%): 49.9
  • GPQA Diamond: 79.9
  • Humanity’s Last Exam: 19.8
  • FrontierMath
  • ARC-AGI-2
Coding (25%): 55
  • SWE-bench Verified: 67.8
  • Terminal-Bench: 37.7
  • Aider Polyglot: 74.5
  • SciCode: 39.9
Tool use & agents (25%): 37
  • TAU-bench Retail
  • τ²-bench: 33.9
  • BFCL
  • BrowseComp: 40.1
Google41.944.347.843.831.688.179.761.782.839.413.671.360.461.956.778.398.18831.679.712.784.226.961.72025multimodalAPI only20251M850.70$0.30$2.50
OpenAI65.46954.751.186.898.76983.740.937.98474.598.786.825.686.5692025multimodalAPI only2024400K1806.64$1.25$10.00
Alibaba48.646.743.829.474.380.746.776.438.320.576.780.774.311.184.146.72025llmAPI only2025262K451.71$0.78$3.90
Alibaba45.658.743.725.754.188.358.777.239.911.464.688.354.110.183.658.72025llm341.75$0.80$6.20
Alibaba31.731.738.821.335.170.731.771.235.96.859.470.735.16.382.331.72025multimodalOpen weights2025262K511.20$0.20$0.88
Liquid AI8.3017.91.713.58.3030.62.50.88.18.313.55.229.802025llm$0.00$0.00
DeepSeek46.46547.236.237.189.76579.240.631.879.889.737.115.285.1652025llmOpen weights2025164K$0.27$0.95
Alibaba19.6039.917.221.374072.630.63.867.97421.37.379.202025llm1021.05$0.30$1.00
IBM15.2922.711.617.313.7941.620.92.325.113.717.33.762.492025llm5248.71$0.10$0.30
Alibaba15033.610.116.452.306218.61.542.252.316.45.172.502025llm1031.04$0.30$1.00
xAI5579.952.931.655.492.764.785.744.218.9809293.365.844.920859564.72025llmAPI only2M90$0.20$0.50
InclusionAI18.52140.712.2083.72172.516.87.662.883.708.979.3212025llm$0.10$0.60
Mistral AI42.851.341.826.1528251.373.939.212.97582529.681.551.32025llm420.50$2.00$5.00
Mistral AI25.116.336.219.927.880.316.366.335.24.572.380.327.86.176.816.32025llm1060.38$0.50$1.50
InclusionAI22.9153619.820.865.31565.728.910.658.965.320.86.377.7152025llm911.61$0.10$0.60
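
The expanded breakdown for the DeepSeek row (#160) can be checked by hand. The sketch below is one way to do that in Python. It assumes that each component is the plain mean of the benchmarks that have a score, which matches the numbers shown for this row but is an inference from the data, not something the page states. The names `breakdown`, `component_score`, and `index_score` are hypothetical and exist only for this illustration.

```python
from statistics import mean

# Scores copied from the expanded DeepSeek row (#160).
# None marks a benchmark that is listed without a score.
breakdown = {
    "General": {"SimpleQA": 97.1, "AA-LCR": 69, "LongBench-v2": None, "IFBench": None},
    "Reasoning": {"GPQA Diamond": 79.9, "Humanity's Last Exam": 19.8,
                  "FrontierMath": None, "ARC-AGI-2": None},
    "Coding": {"SWE-bench Verified": 67.8, "Terminal-Bench": 37.7,
               "Aider Polyglot": 74.5, "SciCode": 39.9},
    "Tool use & agents": {"TAU-bench Retail": None, "τ²-bench": 33.9,
                          "BFCL": None, "BrowseComp": 40.1},
}

# Component values as displayed on the page (one decimal place).
displayed = {"General": 83.1, "Reasoning": 49.9, "Coding": 55.0, "Tool use & agents": 37.0}


def component_score(benchmarks):
    """Mean of the benchmarks that have a score (assumed rule, see lead-in)."""
    scored = [v for v in benchmarks.values() if v is not None]
    return mean(scored) if scored else None


def index_score(components):
    """Equal-weighted mean of the components (25% each for four components)."""
    return mean(components)


if __name__ == "__main__":
    for name, benchmarks in breakdown.items():
        print(name, component_score(benchmarks), "displayed:", displayed[name])
    print("Index", index_score(list(displayed.values())), "displayed: 56.3")
```

The computed values agree with the displayed ones to within one-decimal rounding. Other rows may not follow the same pattern, because the page describes its scores as curated approximations.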

Score columns under Index are the v1.2 weighted components (25% each) that feed it. Reference per-category averages (not in the index) follow. Every individual benchmark in our catalog is also shown — grouped by category, ordered by coverage. Hover any header for details — click to sort. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations — see each model for sources.
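
The note above says cell colors show relative standing within each column on a red → yellow → green scale. The page does not say how that standing is computed, so the following is only a sketch of one plausible reading: min-max normalization within the column, followed by linear interpolation between three anchor colors. The RGB values and the function name `cell_color` are assumptions made for this example.

```python
# Hypothetical anchor colors; the site's real palette is not given in the text.
RED = (220, 53, 69)
YELLOW = (255, 193, 7)
GREEN = (40, 167, 69)


def _lerp(a, b, t):
    """Linear interpolation between two integers, rounded to an integer."""
    return round(a + (b - a) * t)


def cell_color(value, column):
    """Map a score to an RGB color by its position between column min and max.

    Assumes min-max normalization; a rank- or percentile-based scale would be
    an equally valid reading of "relative standing".
    """
    scored = [v for v in column if v is not None]
    if value is None or not scored:
        return None  # empty cells get no color
    lo, hi = min(scored), max(scored)
    t = 0.5 if hi == lo else (value - lo) / (hi - lo)
    if t <= 0.5:
        start, end, local = RED, YELLOW, t / 0.5
    else:
        start, end, local = YELLOW, GREEN, (t - 0.5) / 0.5
    return tuple(_lerp(s, e, local) for s, e in zip(start, end))


if __name__ == "__main__":
    # Index values read from a few rows of the table, plus one empty cell.
    index_column = [34.4, 32.5, 11.3, 8.4, 65.4, None]
    for score in index_column:
        print(score, cell_color(score, index_column))
```

With this scheme the lowest score in a column is pure red and the highest is pure green, so colors are comparable only within a column, never across columns.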