AI War Tracker
298 models in catalog

AI models

Every model we track — frontier flagships, open-weights specialists, narrow benchmarks-only releases. Filter by lab, country, access, modality, or release window. Sorted by newest by default; head to the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter

Rank | Model | Index | General | Reason | Coding | Agents | Math | Multi | Long ctx | GPQA Diamond | DROP | ARC-AGI-2 | BIG-Bench Hard | SciCode | Terminal-Bench | LiveCodeBench | SWE-bench Verified | Aider Polyglot | HumanEval | Aider Polyglot Edit | MBPP | MultiPL-E | SWE-bench Pro | AIME 2025 | MATH-500 | AIME 2024 | MATH | GSM8K | MGSM | HMMT 2025 | FrontierMath | τ²-bench | TAU-bench Retail | TAU-bench Airline | BFCL | BrowseComp | τ²-bench Airline | τ²-bench Retail | MMMU | MathVista | ChartQA | DocVQA | MMMU-Pro | AI2D | Humanity’s Last Exam | MMLU-Pro | MMLU | IFEval | SimpleQA | Multi-IF | LiveBench | Arena Hard | AA-LCR | LongBench-v2 | Released | Country | Type | Access | Params | Cutoff | Context | Speed | Latency | In $/M | Out $/M
Alibaba36.75940.319.328.176.95970.733.35.370.756.397.628.19.880.5592025llm1511.18$0.30$1.90
Alibaba21.922.736.418.310.281.922.765.930.46.151.566.397.510.26.877.722.72025llm1221.25$0.20$0.40
Zhipu AI47.648.346.845.549.787.648.379.134.837.572.964.273.798.2914379.760.426.414.484.648.32025llmOpen weights355B (32B active)2024131K850.70$0.60$2.20
Alibaba48.867472853.294.7677942.413.678.89198.453.21584.3672025llm591.21$0.40$2.20
Zhipu AI43.643.742.839.448.689.443.77530.63070.757.680.798.189.446.577.960.821.310.681.443.72025llmOpen weights2024131K631.68$0.13$0.85
NVIDIA30.73440.82028.187.53474.834.85.373.776.798.328.16.881.4342025llm510.29$0.10$0.40
Alibaba39.142.844.136.233.384.231.277.53615.252.457.387.970.39833.34471.310.68388.754.377.531.22025llmOpen weights235B131K631.18$0.15$0.80
Alibaba36.642.333.127.443.666.842.361.835.918.958.539.394.243.64.478.842.32025llm691.68$0.30$1.80
Google26.33134.820.51973.472.951.364.619.34.533.731.626.749.896.91972.95.175.910.751.32025multimodalAPI only20251M60.44$0.10$0.40
LG AI Research23.21442.219.117.388.91473.934.43.874.78097.717.310.581.8142025llm$0.00$0.00
LG AI Research13.5028.74.720.550.3051.59.3051.650.320.55.858.802025llm$0.00$0.00
Moonshot AI49.45141.843.861.174.65176.634.515.955.665.859.15797.169.661.1782.489.5512025llmOpen weights1T (32B active)2024131K261.51$0.57$2.30
#213Mistral AI23.628.726.519.319.937.728.749.229.49.133.74.770.719.93.870.828.72025llmAPI only2025131K720.49$0.40$2.00
Index 23.6 = (28.7 + 26.5 + 19.3 + 19.9) / 4: the equal-weighted mean of the 4 components.
General (25%): 28.7
  • SimpleQA
  • AA-LCR: 28.7
  • LongBench-v2
  • IFBench
Reasoning (25%): 26.5
  • GPQA Diamond: 49.2
  • Humanity’s Last Exam: 3.8
  • FrontierMath
  • ARC-AGI-2
Coding (25%): 19.3
  • SWE-bench Verified
  • Terminal-Bench: 9.1
  • Aider Polyglot
  • SciCode: 29.4
Tool use & agents (25%): 19.9
  • TAU-bench Retail
  • τ²-bench: 19.9
  • BFCL
  • BrowseComp
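The breakdown above can be reproduced with a few lines of arithmetic. The sketch below is an illustration, not the tracker's own code. It assumes that each category score is the plain mean of whichever benchmarks have a listed score (this is inferred from the numbers shown, not stated on the page) and that displayed values are rounded half-up to one decimal. The names `breakdown`, `mean_available`, and `round1` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the breakdown above; None marks a benchmark with no listed score.
breakdown = {
    "General": {"SimpleQA": None, "AA-LCR": "28.7", "LongBench-v2": None, "IFBench": None},
    "Reasoning": {"GPQA Diamond": "49.2", "Humanity’s Last Exam": "3.8",
                  "FrontierMath": None, "ARC-AGI-2": None},
    "Coding": {"SWE-bench Verified": None, "Terminal-Bench": "9.1",
               "Aider Polyglot": None, "SciCode": "29.4"},
    "Tool use & agents": {"TAU-bench Retail": None, "τ²-bench": "19.9",
                          "BFCL": None, "BrowseComp": None},
}


def mean_available(values):
    """Mean of the scores that are present (assumption: missing scores are skipped)."""
    present = [Decimal(v) for v in values if v is not None]
    return sum(present) / len(present) if present else None


def round1(x):
    """Round to one decimal place, half-up (assumed display rounding)."""
    return x.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Category scores, each weighted 25% in the index.
categories = {name: mean_available(scores.values()) for name, scores in breakdown.items()}

# Equal-weighted mean of the four category scores.
index = sum(categories.values()) / len(categories)

for name, score in categories.items():
    print(f"{name}: {round1(score)}")
print(f"Index: {round1(index)}")
```

Decimal arithmetic is used here because binary floats would round the Coding mean of 19.25 down to 19.2 rather than the displayed 19.3.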
Liquid AI7.1014.31.312.63.3022.82.5023.312.65.725.702025llm$0.00$0.00
xAI61.36847.854.474.995.46887.515.945.737.97979.691.79974.94086.6682025llmAPI only2024256K1000.70$3.00$15.00
AI21 Labs12.112.718.44.712.613.112.732.29.306.10.325.812.64.538.812.72025llm$0.00$0.00
Baidu15.92.342.318.8067.22.381.131.56.146.741.393.103.577.62.32025llmOpen weights2025131K241.53$0.28$1.10
Google4.1013.53039.7022.95.20.89.510.369.10437.802025llm$0.00$0.00
Mistral AI22.717.327.416.629.557.717.350.526.46.827.52788.329.54.368.117.32025llm1000.40$0.10$0.30
MiniMax36.954.33920.234.279.554.369.737.4371.1619834.28.281.654.32025llm$0.60$2.20
MiniMax35.351.737.92031.655.551.768.237.82.365.713.797.231.67.580.851.72025llm$0.00$0.00
Mistral AI20.3038.719.423.166067.929.79.152.740.391.723.19.575.302025llm$0.00$0.00
Mistral AI19.2035.714.326.668.8064.124.14.551.441.396.326.67.274.602025llm$0.00$0.00
DeepSeek14.41333.411078.51361.220.41.551.363.793.205.673.9132025llm$0.00$0.00
DeepSeek46.673.549.440.622.789.254.78140.35.773.344.671.687.598.391.479.436.58.917.78592.354.72025llmOpen weights671B131K450.30$0.55$2.19

Score columns under Index are the v1.2 weighted components (25% each) that feed it. Reference per-category averages (not in the index) follow. Every individual benchmark in our catalog is also shown — grouped by category, ordered by coverage. Hover any header for details — click to sort. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations — see each model for sources.
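As a rough illustration of the per-column coloring described above, the sketch below maps each score to a red → yellow → green color by its position between the column's minimum and maximum. The page does not say how relative standing is computed, so linear min-max scaling is an assumption (a rank-based percentile would be an equally plausible reading). The function names and the sample column are hypothetical.

```python
def relative_standing(column):
    """Scale each score to [0, 1] within its column; None (no score) stays None.

    Assumption: linear min-max scaling. A column whose scores are all equal maps to 0.5.
    """
    present = [v for v in column if v is not None]
    if not present:
        return [None for _ in column]
    lo, hi = min(present), max(present)
    span = hi - lo
    return [
        None if v is None else (0.5 if span == 0 else (v - lo) / span)
        for v in column
    ]


def red_yellow_green(t):
    """Map t in [0, 1] to an (R, G, B) tuple: 0 is red, 0.5 is yellow, 1 is green."""
    if t <= 0.5:
        return (255, round(255 * t * 2), 0)
    return (round(255 * (1 - t) * 2), 255, 0)


# Hypothetical column of scores; None marks a model with no result.
sample_column = [10.0, None, 20.0, 30.0]
for score, t in zip(sample_column, relative_standing(sample_column)):
    color = None if t is None else red_yellow_green(t)
    print(score, color)
```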