SWE-bench Pro
A harder, contamination-resistant evolution of SWE-bench on real GitHub issues.
Models: 4
Top score: 58.6
Median: 57.7
State of the art over time
Each point is a model at its release date; the line traces the best score to date.
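The "best score to date" line described above is a running maximum over models ordered by release date. The following is a minimal sketch of that calculation; the page does not publish release dates in this section, so the dates and scores in `points` are placeholders, and the function name `best_to_date` is hypothetical.

```python
from datetime import date
from itertools import accumulate

# Hypothetical (release_date, score) points; placeholders, not leaderboard data.
points = [
    (date(2025, 1, 10), 40.0),
    (date(2025, 3, 5), 38.0),
    (date(2025, 6, 20), 55.0),
    (date(2025, 9, 1), 52.5),
]


def best_to_date(points):
    """Return (release_date, best score seen so far) pairs in date order."""
    ordered = sorted(points)  # tuples sort by release date first
    running = accumulate((score for _, score in ordered), max)
    return [(d, best) for (d, _), best in zip(ordered, running)]


frontier = best_to_date(points)
for d, best in frontier:
    print(d.isoformat(), best)
```

A model that scores below the current best leaves the line flat, so the line only ever steps upward.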
Ranking
| Rank | Model | Score |
| --- | --- | --- |
| 1 | Kimi K2.6 | 58.6 |
| 2 | GPT-5.5 | 58.6 |
| 3 | GPT-5.4 | 57.7 |
| 4 | Claude Haiku 4.5 | 39.5 |
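The summary figures (model count, top score, median) can be recomputed from the four ranked scores. The page does not say how it defines the median for an even number of entries; the sketch below assumes nothing and computes both conventions. With these four scores, the stated 57.7 matches the lower median, while averaging the two middle values would give roughly 58.15.

```python
import statistics

# Scores copied from the ranking table.
scores = {
    "Kimi K2.6": 58.6,
    "GPT-5.5": 58.6,
    "GPT-5.4": 57.7,
    "Claude Haiku 4.5": 39.5,
}

values = list(scores.values())

n_models = len(values)
top_score = max(values)

# Lower median: the smaller of the two middle values when the count is even.
median_low = statistics.median_low(values)

# Conventional median: the mean of the two middle values when the count is even.
median_mean = statistics.median(values)

print(n_models, top_score, median_low, round(median_mean, 2))
```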