Coding

SWE-bench Pro

A harder, contamination-resistant successor to SWE-bench, built on real GitHub issues.

Models: 4

Top score: 58.6

Median: 57.7

State of the art over time

Each point is a model at its release date; the line traces the best score to date.
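The "best score to date" line is a running maximum over models ordered by release date. Below is a minimal Python sketch, assuming each model is a hypothetical (release_date, score) pair. The function name `sota_frontier` and the sample values are illustrative and are not taken from the SWE-bench Pro leaderboard.

```python
from datetime import date


def sota_frontier(points):
    """Return (release_date, best_score_so_far) pairs in release order.

    `points` is an iterable of hypothetical (release_date, score) pairs.
    Models released on the same date keep their input order (sorted is stable).
    """
    frontier = []
    best = float("-inf")
    for released, score in sorted(points, key=lambda p: p[0]):
        # The frontier only rises when a model beats every earlier release.
        best = max(best, score)
        frontier.append((released, best))
    return frontier


# Hypothetical example data, not actual leaderboard entries.
example = [
    (date(2025, 3, 1), 40.0),
    (date(2025, 1, 15), 35.0),
    (date(2025, 6, 10), 38.0),
]

for released, best in sota_frontier(example):
    print(released, best)
```

A model that scores below an earlier release (like the June entry here) still appears as a point on the chart, but the line stays flat at the previous best.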