AI War Tracker
298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow, benchmarks-only releases. Filter by lab, country, access, modality, or release window. Sorted newest first by default; see the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis; specs and pricing via OpenRouter

Columns: Rank · Model · Index · General · Reason · Coding · Agents · Math · Multi · Long ctx · GPQA Diamond · DROP · ARC-AGI-2 · BIG-Bench Hard · SciCode · Terminal-Bench · LiveCodeBench · SWE-bench Verified · Aider Polyglot · HumanEval · Aider Polyglot Edit · MBPP · MultiPL-E · SWE-bench Pro · AIME 2025 · MATH-500 · AIME 2024 · MATH · GSM8K · MGSM · HMMT 2025 · FrontierMath · τ²-bench · TAU-bench Retail · TAU-bench Airline · BFCL · BrowseComp · τ²-bench Airline · τ²-bench Retail · MMMU · MathVista · ChartQA · DocVQA · MMMU-Pro · AI2D · Humanity’s Last Exam · MMLU-Pro · MMLU · IFEval · SimpleQA · Multi-IF · LiveBench · Arena Hard · AA-LCR · LongBench-v2 · Released · Country · Type · Access · Params · Cutoff · Context · Speed · Latency · In $/M · Out $/M
xAI61.86851.534.293.389.36885.344.224.282.289.393.317.685.4682025llm$0.00$0.00
Google67.370.753.55887.195.770.791.931.156.141.791.776.295.787.137.589.870.72025multimodalAPI only1M14127.49$2.00$12.00
OpenAI60.667.354.737.58395.767.38640.234.884.995.78323.48667.32025multimodalAPI only400K1884.16$1.25$10.00
OpenAI53.262.749.13862.991.762.781.342.633.383.691.762.916.98262.72025multimodalAPI only400K1759.50$0.25$2.00
Baidu41.86.745.231.383.9856.777.737.52581.28583.912.7836.72025llm$0.00$0.00
OpenAI64.77557.344.481.9947588.143.345.586.89481.926.587752025llmAPI only400K1150.77$1.25$10.00
Kuaishou60.17454.922.988.694.77476.436.69.174.794.788.633.481.3742025llm1082.19$0.30$1.20
ByteDance50.565.344.933.658.279.365.376.440.726.576.679.358.213.385.465.32025llm$0.00$0.00
#134Moonshot AI65.366.353.448.39394.766.384.542.431.185.371.394.79322.384.866.32025llmOpen weights1T (32B active)262K1001.00$0.60$2.50
Index 65.3 = (66.3 + 53.4 + 48.3 + 93.0) / 4 = 65.25, shown rounded to one decimal: the equal-weighted mean of the 4 components.
General (weight 25%): 66.3
  • SimpleQA
  • AA-LCR: 66.3
  • LongBench-v2
  • IFBench
Reasoning (weight 25%): 53.4
  • GPQA Diamond: 84.5
  • Humanity’s Last Exam: 22.3
  • FrontierMath
  • ARC-AGI-2
Coding (weight 25%): 48.3
  • SWE-bench Verified: 71.3
  • Terminal-Bench: 31.1
  • Aider Polyglot
  • SciCode: 42.4
Tool use & agents (weight 25%): 93.0
  • TAU-bench Retail
  • τ²-bench: 93.0
  • BFCL
  • BrowseComp
Moonshot AI15.925.72215.7036.325.741.219.911.437.836.302.758.525.72025llm$0.00$0.00
NVIDIA274031.315.421.3754057.226.24.569.47521.35.375.9402025llm2440.74$0.20$0.60
IBM12416.64.422.86.3428.18.704.76.322.85.132.542025llm$0.00$0.00
IBM11.46.315.74.119.66.36.326.38.2011.56.319.6527.76.32025llm$0.00$0.00
IBM7.9016.10.914.61.3025.71.701.91.314.66.412.702025llm$0.00$0.00
IBM7.4015.90.513.20026.10.902.4013.25.712.402025llm$0.00$0.00
MiniMax60.96145.150.686.878.36177.736.146.382.669.478.386.812.582612025llmOpen weights230B (10B active)205K911.19$0.26$1.00
Alibaba29.131.336.719.229.268.331.367.130.18.351.468.329.26.379.131.32025multimodalOpen weights262K761.16$0.10$0.42
Alibaba40.155.341.418.145.684.755.373.328.57.673.884.745.69.681.855.32025llm931.26$0.70$8.40
IBM10.7419.46.712.66433.611.91.518612.65.144.742025llmOpen weights131K$0.02$0.11
Microsoft11.513.718.75.48.238.213.733.110.8012.66.769.68.24.246.513.72025llmOpen weights131K$0.08$0.35
Anthropic54.770.341.452.554.796.370.37343.34161.573.339.596.354.763.683.29.78070.32025llmAPI only2025200K1000.30$1.00$5.00
Alibaba24.33130.612.922.530.73157.921.93.835.330.722.53.374.9312025llm1201.15$0.20$2.10
Alibaba19.315.322.89.929.227.315.342.717.42.333.227.329.22.968.615.32025multimodalOpen weights256K1451.05$0.08$0.50
Alibaba18.321.326.99.315.525.721.349.417.11.53225.715.54.47021.32025llm$0.00$0.00
Alibaba15.91320.46.923.4371337.113.70293723.43.763.4132025llm$0.00$0.00
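The index arithmetic in the Moonshot AI breakdown can be reproduced with a short Python sketch. The page states only that the index is an equal-weighted mean of four components; the rule used here for each component (a plain mean of the benchmarks that have a score, rounded to one decimal) is an assumption inferred from the figures shown, since it reproduces 53.4 = (84.5 + 22.3) / 2 and 48.3 ≈ (71.3 + 31.1 + 42.4) / 3. The names `BREAKDOWN`, `WEIGHTS`, `component_score`, and `index_score` are hypothetical and illustrative only.

```python
from __future__ import annotations

from decimal import Decimal, ROUND_HALF_UP

ONE_DECIMAL = Decimal("0.1")

# Moonshot AI breakdown from the table above; None = no score shown.
# Scores are kept as strings so Decimal represents them exactly.
BREAKDOWN: dict[str, dict[str, str | None]] = {
    "General": {"SimpleQA": None, "AA-LCR": "66.3", "LongBench-v2": None, "IFBench": None},
    "Reasoning": {"GPQA Diamond": "84.5", "Humanity's Last Exam": "22.3",
                  "FrontierMath": None, "ARC-AGI-2": None},
    "Coding": {"SWE-bench Verified": "71.3", "Terminal-Bench": "31.1",
               "Aider Polyglot": None, "SciCode": "42.4"},
    "Tool use & agents": {"TAU-bench Retail": None, "τ²-bench": "93",
                          "BFCL": None, "BrowseComp": None},
}

# v1.2 weighting: 25% per component.
WEIGHTS = {name: Decimal("0.25") for name in BREAKDOWN}


def round1(x: Decimal) -> Decimal:
    """Round to one decimal place with halves rounded up (65.25 -> 65.3)."""
    return x.quantize(ONE_DECIMAL, rounding=ROUND_HALF_UP)


def component_score(benchmarks: dict[str, str | None]) -> Decimal | None:
    """Assumed rule: mean of the benchmarks that have a score, rounded to 0.1."""
    scores = [Decimal(s) for s in benchmarks.values() if s is not None]
    if not scores:
        return None
    return round1(sum(scores) / len(scores))


def index_score(breakdown: dict[str, dict[str, str | None]],
                weights: dict[str, Decimal]) -> Decimal:
    """Weighted mean of the rounded component scores, rounded to 0.1."""
    components = {name: component_score(b) for name, b in breakdown.items()}
    missing = [name for name, score in components.items() if score is None]
    if missing:
        raise ValueError(f"no scored benchmarks for: {missing}")
    total = sum(weights[name] * score for name, score in components.items())
    return round1(total / sum(weights.values()))


if __name__ == "__main__":
    for name, benchmarks in BREAKDOWN.items():
        print(name, component_score(benchmarks))
    print("Index", index_score(BREAKDOWN, WEIGHTS))  # prints: Index 65.3
```

`Decimal` with `ROUND_HALF_UP` is used because the displayed 65.3 comes from 65.25, and Python's built-in `round(65.25, 1)` returns 65.2 (it rounds ties to even). Averaging the unrounded components (about 48.27 for Coding) would give roughly 65.24, so the sketch rounds each component before combining them; whether the site does the same is an assumption.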

The four score columns after Index (General, Reason, Coding, Agents) are the v1.2 weighted components, 25% each, that feed it. The next columns (Math, Multi, Long ctx) are reference per-category averages that do not count toward the index. Every individual benchmark in our catalog follows, grouped by category and ordered by coverage. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations; see each model's page for sources.
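The legend's two column rules can be sketched in Python as follows. How the site actually orders columns and picks colors is not documented here, so this is a guess: "coverage" is taken to mean the number of models with a score for a benchmark, and "relative standing" is taken as a min-max position within the column, mapped linearly from red through yellow to green. `TOY_CATALOG`, its model names, and its scores are invented for illustration and are not catalog data; the function names are hypothetical.

```python
from __future__ import annotations

# Invented toy data (not catalog values): model -> {benchmark: score}.
# A benchmark missing from a model's dict means "no score".
TOY_CATALOG: dict[str, dict[str, float]] = {
    "model-a": {"GPQA Diamond": 80.0, "SciCode": 40.0},
    "model-b": {"GPQA Diamond": 60.0},
    "model-c": {"GPQA Diamond": 70.0, "SciCode": 30.0, "BFCL": 50.0},
}


def order_by_coverage(catalog: dict[str, dict[str, float]],
                      benchmarks: list[str]) -> list[str]:
    """Sort benchmarks by how many models have a score (most first), ties by name."""
    coverage = {b: sum(1 for scores in catalog.values() if b in scores) for b in benchmarks}
    return sorted(benchmarks, key=lambda b: (-coverage[b], b))


def relative_standing(catalog: dict[str, dict[str, float]], benchmark: str,
                      higher_is_better: bool = True) -> dict[str, float]:
    """Min-max position of each scored model in one column: 0.0 = worst, 1.0 = best."""
    column = {m: s[benchmark] for m, s in catalog.items() if benchmark in s}
    if not column:
        return {}
    lo, hi = min(column.values()), max(column.values())
    if hi == lo:
        # Every model ties, so all of them get the midpoint.
        return {m: 0.5 for m in column}
    pos = {m: (v - lo) / (hi - lo) for m, v in column.items()}
    return pos if higher_is_better else {m: 1.0 - t for m, t in pos.items()}


def standing_to_hex(t: float) -> str:
    """Linear red (0.0) -> yellow (0.5) -> green (1.0) ramp as a hex RGB string."""
    t = min(max(t, 0.0), 1.0)
    if t < 0.5:
        red, green = 255, round(255 * t / 0.5)
    else:
        red, green = round(255 * (1.0 - t) / 0.5), 255
    return f"#{red:02x}{green:02x}00"


if __name__ == "__main__":
    # prints: ['GPQA Diamond', 'SciCode', 'BFCL']
    print(order_by_coverage(TOY_CATALOG, ["BFCL", "SciCode", "GPQA Diamond"]))
    # prints: model-a 1.0 #00ff00 / model-b 0.0 #ff0000 / model-c 0.5 #ffff00
    for model, t in relative_standing(TOY_CATALOG, "GPQA Diamond").items():
        print(model, t, standing_to_hex(t))
```

For columns where lower is better, such as price or latency, passing `higher_is_better=False` flips the scale so the cheapest or fastest model lands at the green end.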