Reinforcement Learning

Learning by trial and error to maximize reward.

In reinforcement learning, an agent takes actions in an environment and learns a policy that maximizes cumulative reward through feedback. It underpins game-playing systems like AlphaGo and, via RLHF, the alignment of modern chat models.

Related papers