Reinforcement LearningEvaluation
Olympiad-level Formal Mathematical Reasoning with Reinforcement Learning (AlphaProof)
Google DeepMind·November 12, 2025
AlphaProof Team, Google DeepMind
View on arXivTL;DR
Describes AlphaProof, an AlphaZero-style RL agent that proves competition math in the Lean theorem prover, trained on millions of auto-formalized problems with test-time RL — part of DeepMind’s silver-medal IMO result.
Why it matters
The first peer-reviewed (Nature) account of an RL system reaching near-medalist formal-math performance; a landmark for verifiable, neuro-symbolic reasoning.