Training, Reinforcement Learning, Safety
Training language models to follow instructions with human feedback
OpenAI·March 4, 2022
Long Ouyang, Jeff Wu, Xu Jiang
TL;DR
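InstructGPT fine-tunes GPT-3 with reinforcement learning from human feedback (RLHF). In human evaluations, labelers preferred outputs from the 1.3B-parameter InstructGPT model over those from the raw 175B GPT-3 base, even though it has over 100x fewer parameters.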
Why it matters
The blueprint for ChatGPT. RLHF turned powerful but unruly base models into helpful assistants and became a standard alignment recipe across the industry.