TrainingEvaluation
Language Models are Unsupervised Multitask Learners
OpenAI·February 14, 2019
Alec Radford, Jeffrey Wu, Rewon Child
TL;DR
GPT-2 shows that a large language model trained only to predict the next word can perform many tasks zero-shot, without task-specific training.
Why it matters
Introduced the scaling-and-prompting paradigm that GPT-3 would later cement, and was famous for OpenAI’s staged release over misuse concerns.