Reinforcement Learning
Learning by trial and error to maximize reward.
In reinforcement learning, an agent takes actions in an environment and learns a policy that maximizes cumulative reward through feedback. It underpins game-playing systems like AlphaGo and, via RLHF, the alignment of modern chat models.