AI War Tracker
298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Models are sorted newest first by default; head to the leaderboard for the ranked view.


Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter

Rank | Model | Index | General | Reason | Coding | Agents | Math | Multi | Long ctx | GPQA Diamond | DROP | ARC-AGI-2 | BIG-Bench Hard | SciCode | Terminal-Bench | LiveCodeBench | SWE-bench Verified | Aider Polyglot | HumanEval | Aider Polyglot Edit | MBPP | MultiPL-E | SWE-bench Pro | AIME 2025 | MATH-500 | AIME 2024 | MATH | GSM8K | MGSM | HMMT 2025 | FrontierMath | τ²-bench | TAU-bench Retail | TAU-bench Airline | BFCL | BrowseComp | τ²-bench Airline | τ²-bench Retail | MMMU | MathVista | ChartQA | DocVQA | MMMU-Pro | AI2D | Humanity’s Last Exam | MMLU-Pro | MMLU | IFEval | SimpleQA | Multi-IF | LiveBench | Arena Hard | AA-LCR | LongBench-v2 | Released | Country | Type | Access | Params | Cutoff | Context | Speed | Latency | In $/M | Out $/M
Nanbeige20.6047.513.321.6084.926.6021.61002026llm$0.00$0.00
Trillion Labs37.814.733.110.193.314.760.117.82.393.36.114.72026llm$0.00$0.00
Alibaba59.96656.233.783.682.36686.143.124.253.582.383.626.282.4662026llmAPI only262K451.47$0.78$3.90
Anthropic72.270.765.660.492.177.370.791.368.851.948.580.89592.177.336.770.72026llmAPI only1M481.65$5.00$25.00
Alibaba46.64041.525.379.54073.732.318.279.59.3402026llmOpen weights262K921.14$0.11$0.80
StepFun55.64351.133.994.44383.140.427.394.419.1432026llmOpen weights262K1940.85$0.09$0.30
LongCat39.925.734.819.579.525.763.628.410.679.5625.72026llm1105.59$0.00$0.00
Moonshot AI65.565.358.741.995.965.387.94934.895.929.465.32026multimodalOpen weights1T (32B active)262K351.33$0.40$1.90
Upstage42.72741.316.286.32772.424.77.686.310.1272026llmAPI only128K$0.15$0.60
StepFun18.5039.618.216.106931.15.316.110.202026llm$0.00$0.00
Liquid AI10.40202.119.6033.94.2019.66.102026llm$0.00$0.00
Zhipu AI48.63532.627.998.83558.133.72298.87.1352026llmOpen weights203K1131.00$0.06$0.40
OpenAI68.975.761.745.992.175.789.954.637.192.133.575.72026multimodalAPI only400K1062.08$1.75$14.00
Allen Institute for AI14.8029.48.421.3053.916.7021.34.902026llm$0.00$0.00
Liquid AI7.9019.71.210.8032.62.3010.86.802026llm$0.00$0.00
Liquid AI6.80171.58.5028.9308.55.102026llm$0.00$0.00
TII UAE22.18.738.413.627.8808.766.124.92.372.48027.810.872.58.72026llm$0.00$0.00
LG AI Research51.255.745.729.274.390.355.778.335.622.776.890.374.313.183.855.72025llm$0.00$0.00
Naver38.211.733.520.387.45911.761.528.412.162.95987.45.578.511.72025llm$0.00$0.00
MiniMax585952.634.885.482.7598340.728.88182.785.422.287.5592025llmOpen weights205K921.14$0.29$0.95
#96Zhipu AI63.56455.538.595.9956485.945.131.889.49595.925.185.6642025llmOpen weights203K980.83$0.40$1.75
Index 63.5 = (64.0 + 55.5 + 38.5 + 95.9) / 4, the equal-weighted mean of 4 components.
General (25%): 64
  • SimpleQA
  • AA-LCR: 64
  • LongBench-v2
  • IFBench
Reasoning (25%): 55.5
  • GPQA Diamond: 85.9
  • Humanity’s Last Exam: 25.1
  • FrontierMath
  • ARC-AGI-2
Coding (25%): 38.5
  • SWE-bench Verified
  • Terminal-Bench: 31.8
  • Aider Polyglot
  • SciCode: 45.1
Tool use & agents (25%): 95.9
  • TAU-bench Retail
  • τ²-bench: 95.9
  • BFCL
  • BrowseComp
Google66.366.362.655.780.49766.390.450.638.690.8789780.434.78966.32025multimodalAPI only1M1911.05$0.50$3.00
#98Upstage34.13637.514.648.23665.726.92.348.29.2362025llm$0.00$0.00
Index 34.1 = (36.0 + 37.5 + 14.6 + 48.2) / 4, the equal-weighted mean of 4 components.
General (25%): 36
  • SimpleQA
  • AA-LCR: 36
  • LongBench-v2
  • IFBench
Reasoning (25%): 37.5
  • GPQA Diamond: 65.7
  • Humanity’s Last Exam: 9.2
  • FrontierMath
  • ARC-AGI-2
Coding (25%): 14.6
  • SWE-bench Verified
  • Terminal-Bench: 2.3
  • Aider Polyglot
  • SciCode: 26.9
Tool use & agents (25%): 48.2
  • TAU-bench Retail
  • τ²-bench: 48.2
  • BFCL
  • BrowseComp
NVIDIA34.833.74321.640.99133.775.729.613.674.19140.910.279.433.72025llm1480.30$0.10$0.20
MBZUAI Institute of Foundation Models34.652.740.419.925.452.771.3336.825.49.552.72025llm$0.00$0.00

Score columns under Index are the v1.2 weighted components (25% each) that feed it. Reference per-category averages (not in the index) follow. Every individual benchmark in our catalog is also shown, grouped by category and ordered by coverage. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations; see each model for sources.
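The index arithmetic can be reproduced with a short sketch. The function names `index_score` and `category_score` are hypothetical and not part of the site. The equal 25% weighting comes from the page itself. The rule that a category score is the plain mean of whichever sub-benchmarks have a score is an assumption: it matches the two expanded rows above, but the page does not state it.

```python
from statistics import mean
from typing import Optional, Sequence


def category_score(benchmarks: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the sub-benchmarks that have a score, or None if none do.

    ASSUMPTION: inferred from the displayed breakdowns, not documented.
    """
    scored = [s for s in benchmarks if s is not None]
    return mean(scored) if scored else None


def index_score(general: float, reasoning: float,
                coding: float, agents: float) -> float:
    """Equal-weighted (25% each) mean of the four index components."""
    return (general + reasoning + coding + agents) / 4


# Component scores taken from the two expanded rows (#96 and #98).
zhipu = index_score(64.0, 55.5, 38.5, 95.9)
upstage = index_score(36.0, 37.5, 14.6, 48.2)
print(f"{zhipu:.3f} {upstage:.3f}")

# Reasoning for #96: GPQA Diamond and Humanity's Last Exam are scored,
# FrontierMath and ARC-AGI-2 are not.
print(category_score([85.9, 25.1, None, None]))
```

The unrounded results land within rounding distance of the 63.5 and 34.1 shown in the table, which suggests the page rounds to one decimal place after averaging.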