
SWE-bench Pro

A harder, contamination-resistant successor to SWE-bench, built from real GitHub issues.

Models: 4
Top score: 58.6
Median: 57.7

State of the art over time

Each point is a model at its release date; the line traces the best score to date.

Ranking

1. Kimi K2.6: 58.6
2. GPT-5.5: 58.6
3. GPT-5.4: 57.7
4. Claude Haiku 4.5: 39.5
