AI War Tracker
298 models in catalog

AI models

Every model we track: frontier flagships, open-weights specialists, and narrow benchmark-only releases. Filter by lab, country, access, modality, or release window. Models are sorted newest first by default; head to the leaderboard for the ranked view.

Updated May 30, 2026 · Benchmarks via Artificial Analysis, specs & pricing via OpenRouter

Rank | Model | Index | General | Reason | Coding | Agents | Math | Multi | Long ctx | GPQA Diamond | DROP | ARC-AGI-2 | BIG-Bench Hard | SciCode | Terminal-Bench | LiveCodeBench | SWE-bench Verified | Aider Polyglot | HumanEval | Aider Polyglot Edit | MBPP | MultiPL-E | SWE-bench Pro | AIME 2025 | MATH-500 | AIME 2024 | MATH | GSM8K | MGSM | HMMT 2025 | FrontierMath | τ²-bench | TAU-bench Retail | TAU-bench Airline | BFCL | BrowseComp | τ²-bench Airline | τ²-bench Retail | MMMU | MathVista | ChartQA | DocVQA | MMMU-Pro | AI2D | Humanity’s Last Exam | MMLU-Pro | MMLU | IFEval | SimpleQA | Multi-IF | LiveBench | Arena Hard | AA-LCR | LongBench-v2 | Released | Country | Type | Access | Params | Cutoff | Context | Speed | Latency | In $/M | Out $/M
Xiaomi61.964.352.935.39596.364.384.639.431.186.896.39521.184.364.32025llmOpen weights262K1451.34$0.10$0.30
#102Allen Institute for AI11.8032.614.7077.3059.129.3069.577.30676.302025llm$0.00$0.00
Index 11.8 = (0.0 + 32.6 + 14.7 + 0.0) / 4, the equal-weighted mean of the 4 components.
General (25%): 0
  • SimpleQA
  • AA-LCR: 0
  • LongBench-v2
  • IFBench
Reasoning (25%): 32.6
  • GPQA Diamond: 59.1
  • Humanity’s Last Exam: 6
  • FrontierMath
  • ARC-AGI-2
Coding (25%): 14.7
  • SWE-bench Verified
  • Terminal-Bench: 0
  • Aider Polyglot
  • SciCode: 29.3
Tool use & agents (25%): 0
  • TAU-bench Retail
  • τ²-bench: 0
  • BFCL
  • BrowseComp
OpenAI69.472.760.259.784.810072.792.452.952.14789.48010084.835.487.472.72025llmAPI only400K730.69$1.75$14.00
Korea Telecom391140.518.186.578.71172.233.2365.678.786.58.881.3112025llm$0.00$0.00
Allen Institute for AI7.6023.56.70042.513.3004.402025llm$0.00$0.00
Mistral AI28.13031.52624.936.73059.433.118.944.836.724.93.676.2302025llmOpen weights262K510.64$0.40$2.00
Mistral AI24.62428.322.823.434.32453.228.816.734.834.323.43.467.8242025llm620.75$0.00$0.00
Zhipu AI33.740.340.422.431.685.340.371.930.414.441.185.331.68.979.940.32025multimodalOpen weights131K441.31$0.30$0.90
MBZUAI Institute of Foundation Models29.833.338.919.227.878.333.368.128.69.869.478.327.89.878.633.32025llm$0.00$0.00
Motif Technologies28.61338.91646.580.31369.528.23.865.180.346.58.279.6132025llm$0.00$0.00
Amazon51.858.34627.275.794.358.381.136.917.471.194.375.710.981.858.32025multimodalAPI only1M2290.89$0.30$2.50
Mistral AI30.434.736.126.124.63834.76836.215.946.53824.64.180.734.72025llmOpen weights675B (41B active)262K540.64$0.50$1.50
Mistral AI23.62230.914.127.2302257.223.64.535.13027.24.669.3222025multimodalOpen weights262K670.41$0.20$0.20
Mistral AI22.32425.712.726.631.72447.120.84.530.331.726.64.364.2242025multimodalOpen weights262K860.38$0.15$0.15
Mistral AI16.111.720.57.224.92211.735.814.4024.72224.95.352.411.72025multimodalOpen weights131K1540.34$0.10$0.10
DeepSeek64.26553.148.290.692658438.935.686.270.29290.622.286.2652025llmOpen weights671B (37B active)131K$0.25$0.38
DeepSeek38.859.356.639.4096.759.387.14434.889.696.7026.186.359.32025llmOpen weights164K$0.29$0.43
Amazon57.961.743.733.592.78961.778.542.724.2738992.78.98361.72025llm1490.81$1.30$10.00
Prime Intellect31.832.344.124.126.68832.376.139.19.177.78826.612.182.232.32025llmOpen weights131K$0.20$1.10
Amazon49.353.741.421.580.489.753.77636.26.86689.780.46.880.953.72025llm$0.30$2.50
ServiceNow46.850.341.625.969.38850.373.337.314.480.78869.39.87950.32025llm$0.00$0.00
Anthropic70.17457.759.189.591.3748749.54787.180.991.389.528.489.5742025llmAPI only200K581.50$5.00$25.00
Allen Institute for AI12.2033.515.1073.706128.61.567.273.705.975.902025llmOpen weights66K$0.15$0.50
Allen Institute for AI10.2022.95.212.641.304010.3026.641.312.65.852.202025llm$0.10$0.20
Allen Institute for AI9.9028.711070.7051.621.20.861.770.705.765.502025llm$0.00$0.00

Score columns under Index are the v1.2 weighted components (25% each) that feed it. Reference per-category averages (not in the index) follow. Every individual benchmark in our catalog is also shown — grouped by category, ordered by coverage. Hover any header for details — click to sort. Cell colors show relative standing within each column (red → yellow → green). Scores are curated approximations — see each model for sources.
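
For readers who want to check the Index arithmetic themselves, here is a minimal Python sketch using the breakdown shown for the Allen Institute for AI row with Index 11.8. It assumes each category score is the plain mean of the benchmarks that have a listed score, with unscored benchmarks skipped rather than counted as zero. That rule is inferred from the displayed numbers (for example, Reasoning 32.6 from 59.1 and 6) and is not a published formula. The names `row`, `category_score`, and `index_score` are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional

# Hypothetical structure: category -> {benchmark: score, or None when no score is listed}.
# Values are copied from the expanded breakdown for the row with Index 11.8.
Scores = Dict[str, Dict[str, Optional[float]]]

row: Scores = {
    "General": {
        "SimpleQA": None, "AA-LCR": 0.0, "LongBench-v2": None, "IFBench": None,
    },
    "Reasoning": {
        "GPQA Diamond": 59.1, "Humanity's Last Exam": 6.0,
        "FrontierMath": None, "ARC-AGI-2": None,
    },
    "Coding": {
        "SWE-bench Verified": None, "Terminal-Bench": 0.0,
        "Aider Polyglot": None, "SciCode": 29.3,
    },
    "Tool use & agents": {
        "TAU-bench Retail": None, "τ²-bench": 0.0, "BFCL": None, "BrowseComp": None,
    },
}


def category_score(benchmarks: Dict[str, Optional[float]]) -> float:
    """Mean of the benchmarks that have a score.

    Assumption: unscored benchmarks are skipped, and a category with no
    scores at all counts as 0.0.
    """
    available = [score for score in benchmarks.values() if score is not None]
    return mean(available) if available else 0.0


def index_score(categories: Scores) -> float:
    """Equal-weighted mean of the category scores (25% each for four categories)."""
    return mean([category_score(b) for b in categories.values()])


print(round(index_score(row), 1))  # 11.8, matching the Index shown for this row
```

The page's own formula averages the already-rounded components (32.6 and 14.7), which gives 11.825, while this sketch averages the unrounded ones. Both round to 11.8.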