Reinforcement LearningEvaluation

Olympiad-level Formal Mathematical Reasoning with Reinforcement Learning (AlphaProof)

Google DeepMind·November 12, 2025

AlphaProof Team, Google DeepMind

TL;DR

Describes AlphaProof, an AlphaZero-style RL agent that proves competition math in the Lean theorem prover, trained on millions of auto-formalized problems with test-time RL — part of DeepMind’s silver-medal IMO result.

Why it matters

The first peer-reviewed (Nature) account of an RL system reaching near-medalist formal-math performance; a landmark for verifiable, neuro-symbolic reasoning.

Related terms

Test-Time Compute Scaling