Training · Reinforcement Learning · Safety

Training language models to follow instructions with human feedback

OpenAI · March 4, 2022

Long Ouyang, Jeff Wu, Xu Jiang, et al.

TL;DR

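InstructGPT fine-tunes GPT-3 on human-written demonstrations and then with reinforcement learning from human feedback (RLHF). Human labelers preferred the outputs of the 1.3B-parameter InstructGPT model over those of the 175B-parameter GPT-3, even though it has over 100x fewer parameters.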
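The reward-modeling step of RLHF can be sketched as a pairwise ranking loss. A labeler ranks K responses to one prompt, and the reward model is trained so that, for every pair, the preferred response scores higher. The InstructGPT paper uses a loss of the form -log σ(r(x, y_w) - r(x, y_l)), averaged over all pairs. This is a minimal sketch under stated assumptions: the function name and input format (scalar rewards listed best-first) are hypothetical, and real training would compute the rewards with a neural network and backpropagate through this loss.

```python
import math
from itertools import combinations


def _neg_log_sigmoid(diff: float) -> float:
    """Numerically stable -log(sigmoid(diff)), i.e. softplus(-diff)."""
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def pairwise_ranking_loss(rewards_in_rank_order: list[float]) -> float:
    """Hypothetical helper: average pairwise ranking loss for one prompt.

    `rewards_in_rank_order` holds the reward model's scalar scores for K
    responses, listed best-first according to a labeler's ranking (an
    assumed input format). Every pair (i, j) with i < j is treated as
    (preferred, dispreferred).
    """
    if len(rewards_in_rank_order) < 2:
        raise ValueError("need at least two ranked responses")
    # combinations() preserves list order, so each pair is (better, worse).
    pairs = list(combinations(rewards_in_rank_order, 2))
    total = sum(_neg_log_sigmoid(r_better - r_worse) for r_better, r_worse in pairs)
    # Average over all K-choose-2 comparisons from this prompt.
    return total / len(pairs)
```

For example, `pairwise_ranking_loss([0.0, 0.0])` returns log 2 ≈ 0.693, the loss when the reward model cannot tell the two responses apart; the loss approaches 0 as the preferred response's reward pulls ahead, and grows when the ranking is reversed.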

Why it matters

The blueprint for ChatGPT. RLHF turned capable but hard-to-steer base models into helpful assistants and became a widely adopted alignment recipe across the industry.

Related terms

RLHF