AI War Tracker
298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmarks-only releases. Filter by lab, country, access, modality, or release window. Models are sorted newest first by default; head to the leaderboard for the ranked view.


Updated May 29, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter

Rank | Model | Index | General | Reason | Coding | Agents | Math | Multi | Long ctx | GPQA Diamond | DROP | ARC-AGI-2 | BIG-Bench Hard | SciCode | Terminal-Bench | LiveCodeBench | SWE-bench Verified | Aider Polyglot | HumanEval | Aider Polyglot Edit | MBPP | MultiPL-E | SWE-bench Pro | AIME 2025 | MATH-500 | AIME 2024 | MATH | GSM8K | MGSM | HMMT 2025 | FrontierMath | τ²-bench | TAU-bench Retail | TAU-bench Airline | BFCL | BrowseComp | τ²-bench Airline | τ²-bench Retail | MMMU | MathVista | ChartQA | DocVQA | MMMU-Pro | AI2D | Humanity’s Last Exam | MMLU-Pro | MMLU | IFEval | SimpleQA | Multi-IF | LiveBench | Arena Hard | AA-LCR | LongBench-v2 | Released | Country | Type | Access | Params | Cutoff | Context | Speed | Latency | In $/M | Out $/M
Anthropic71.767.768.955.994.467.79253.558.394.445.767.72026llmAPI only1M666.54$5.00$25.00
OpenBMB25.94.715.80.782.54.726.91.4082.54.64.72026llm$0.00$0.00
Alibaba69.76965.249.894.76992.348.850.894.738.1692026llmAPI only1M2031.59$1.25$3.75
Google70.77166.649.795.67192.253.146.295.641712026multimodalAPI only20251M2219.75$1.50$9.00
China Mobile5755.344.52999.155.382.929.128.899.16.155.32026llm$0.00$0.00
OpenBMB28.26.317.71.187.76.330.52.1087.74.96.32026llm$0.00$0.00
InclusionAI61.164.35235.692.464.385.742.428.892.418.364.32026llmAPI only262K1201.88$0.08$0.63
#8Google44.765.349.233.131.365.382.241.924.231.316.265.32026multimodalAPI only1M3425.35$0.25$1.50
Index 44.7 = (65.3 + 49.2 + 33.1 + 31.3) / 4, the equal-weighted mean of 4 components.
General (25%): 65.3
  • SimpleQA
  • AA-LCR: 65.3
  • LongBench-v2
  • IFBench
Reasoning (25%): 49.2
  • GPQA Diamond: 82.2
  • Humanity’s Last Exam: 16.2
  • FrontierMath
  • ARC-AGI-2
Coding (25%): 33.1
  • SWE-bench Verified
  • Terminal-Bench: 24.2
  • Aider Polyglot
  • SciCode: 41.9
Tool use & agents (25%): 31.3
  • TAU-bench Retail
  • τ²-bench: 31.3
  • BFCL
  • BrowseComp
xAI676562.642.697.76590.147.337.997.735652026llmAPI only1M880.52$1.25$2.50
OpenAI5155.752.546.349.455.784.650.342.449.420.355.72026llm$5.00$30.00
Mistral AI58.96143.836.594.26174.839.633.394.212.8612026multimodalAPI only262K1400.58$1.50$7.50
IBM18.61223.510.927.81243.321.8027.83.8122026llmOpen weights131K1330.47$0.05$0.10
NVIDIA31.335.726.118.145.335.746.927.88.345.35.335.72026llm3010.58$0.10$0.30
IBM25.318.726.214.142.118.748.125.82.342.14.218.72026llm$0.00$0.00
IBM11.8317.47.119.6331.411.92.319.63.432026llm$0.00$0.00
Alibaba67.569.758.945.495.969.788.846.943.995.928.969.72026llmAPI only262K362.79$1.04$6.24
Alibaba63.368.752.937.394.268.784.239.834.894.221.668.72026multimodalOpen weights262K641.40$0.29$3.20
Alibaba61.663.752.235.395.363.784.135.834.895.320.263.72026multimodalOpen weights262K1691.47$0.14$1.00
DeepSeek71.166.36358.996.266.390.15046.293.580.696.235.987.566.32026llmOpen weights1.6T (49B active)1M301.16$0.44$0.87
DeepSeek65.36360.841.895.66389.444.938.695.632.1632026llmOpen weights284B (13B active)1M1090.76$0.10$0.20
#21OpenAI73.974.368.958.493.974.393.556.160.658.693.944.374.32026llmAPI only20251.1M670.97$5.00$30.00
Index 73.9 = (74.3 + 68.9 + 58.4 + 93.9) / 4, the equal-weighted mean of 4 components.
General (25%): 74.3
  • SimpleQA
  • AA-LCR: 74.3
  • LongBench-v2
  • IFBench
Reasoning (25%): 68.9
  • GPQA Diamond: 93.5
  • Humanity’s Last Exam: 44.3
  • FrontierMath
  • ARC-AGI-2
Coding (25%): 58.4
  • SWE-bench Verified
  • Terminal-Bench: 60.6
  • Aider Polyglot
  • SciCode: 56.1
Tool use & agents (25%): 93.9
  • TAU-bench Retail
  • τ²-bench: 93.9
  • BFCL
  • BrowseComp
InclusionAI50.134.741.734.189.834.775.23731.189.88.234.72026llmAPI only262K$0.08$0.63
Xiaomi68.673.360.246.794.273.386.650.243.294.233.873.32026llmOpen weights1M582.08$0.44$0.87
Xiaomi62.762.755.142.490.662.784.943.141.790.625.262.72026multimodalOpen weights1M922.67$0.14$0.28
Tencent60.354.756.137.792.754.786.741.234.192.725.554.72026llmOpen weights262K1002.53$0.06$0.21

Score columns under Index are the v1.2 weighted components (25% each) that feed it. Reference per-category averages (not in the index) follow. Every individual benchmark in our catalog is also shown — grouped by category, ordered by coverage. Hover any header for details — click to sort. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations — see each model for sources.
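
The index arithmetic can be reproduced from the two expanded rows (#8 and #21). The sketch below is illustrative only: it assumes each category score is the plain mean of whichever benchmarks in that category have a reported score, and that values are rounded half-up to one decimal. Both assumptions match the figures displayed on this page but are not stated in the source; the function names `mean_1dp` and `index_score` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def mean_1dp(scores):
    """Equal-weighted mean of score strings, rounded half-up to one decimal."""
    values = [Decimal(s) for s in scores]
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def index_score(categories):
    """Return (index, component scores) for a dict of category -> benchmark scores.

    Assumption: benchmarks without a reported score are simply left out,
    and the four categories are weighted 25% each.
    """
    components = [str(mean_1dp(scores)) for scores in categories.values()]
    return mean_1dp(components), components


# Reported benchmark scores copied from the expanded rows on this page.
row_8 = {
    "General": ["65.3"],            # AA-LCR
    "Reasoning": ["82.2", "16.2"],  # GPQA Diamond, Humanity's Last Exam
    "Coding": ["24.2", "41.9"],     # Terminal-Bench, SciCode
    "Tool use & agents": ["31.3"],  # τ²-bench
}
row_21 = {
    "General": ["74.3"],
    "Reasoning": ["93.5", "44.3"],
    "Coding": ["60.6", "56.1"],
    "Tool use & agents": ["93.9"],
}

print(index_score(row_8)[0])   # 44.7
print(index_score(row_21)[0])  # 73.9
```

Decimal arithmetic is used instead of floats so that halves such as 33.05 round to 33.1 predictably, which is what the Coding score for row #8 shows.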