Reasoning Models

Models that spend extra compute "thinking" before they answer.

Reasoning models generate a long internal chain of thought at inference time, exploring and checking steps before producing a final answer. Trained heavily with reinforcement learning, they accept higher latency and cost in exchange for large gains on math, science, and code.

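OpenAI’s o-series, DeepSeek-R1, Claude’s extended thinking, and Gemini’s thinking modes are all examples. These systems scale along the "test-time compute" axis: they improve results by spending more computation at inference time, rather than only through larger models or more training data.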
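
The trade of inference compute for accuracy can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it uses majority voting over several sampled answers (often called self-consistency), which is a simpler technique than the long chain of thought the models above produce, and `noisy_solver` is a hypothetical stand-in for a single sampled answer, not any real model API. The numbers it uses (a 60% per-sample success rate, a target answer of 42) are made up for illustration.

```python
import random
from collections import Counter

CORRECT = 42  # hypothetical ground-truth answer for the toy problem


def noisy_solver(rng, p_correct=0.6):
    """Hypothetical stand-in for one sampled answer.

    Returns the correct answer with probability p_correct,
    otherwise a wrong answer drawn uniformly from nearby values.
    """
    if rng.random() < p_correct:
        return CORRECT
    wrong = [w for w in range(38, 47) if w != CORRECT]
    return rng.choice(wrong)


def majority_vote(rng, n_samples):
    """Spend more inference compute: sample n answers, keep the most common."""
    answers = [noisy_solver(rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of trials in which the voted answer is correct."""
    rng = random.Random(seed)
    hits = sum(majority_vote(rng, n_samples) == CORRECT for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(f"samples={n:2d}  accuracy={accuracy(n):.3f}")
```

Running the script prints the accuracy at 1, 5, and 15 samples per question. In this toy setup, accuracy rises as more samples are drawn, while cost rises linearly with the number of samples, which mirrors the latency-and-cost trade described above.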

Related papers