Training · Evaluation

Language Models are Unsupervised Multitask Learners

OpenAI · February 14, 2019

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever

TL;DR

GPT-2 shows that a large language model trained only to predict the next word can perform many tasks zero-shot, without task-specific training.

The paper introduced the scaling-and-prompting paradigm that GPT-3 would later cement, and it became famous for OpenAI’s staged release of the model over misuse concerns.
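
To make "zero-shot" concrete, here is a minimal sketch of how a task can be posed as plain text continuation. The paper induces summarization by appending "TL;DR:" to an article and letting the model continue. Everything else below is an assumption for illustration: the helper names `summarization_prompt` and `zero_shot` are hypothetical, and `complete` is a stand-in for whatever next-word predictor is available, not an actual GPT-2 API.

```python
from typing import Callable


def summarization_prompt(article: str) -> str:
    """Frame summarization as text continuation by appending the 'TL;DR:' cue."""
    return f"{article.strip()}\nTL;DR:"


def zero_shot(prompt: str, complete: Callable[[str], str]) -> str:
    """Run a task with no task-specific training: the prompt alone defines it.

    `complete` is a hypothetical callable that returns the model's
    continuation of `prompt`. Any next-word predictor could be plugged in.
    """
    return complete(prompt).strip()


if __name__ == "__main__":
    # Placeholder model so the sketch runs without downloading any weights.
    def fake_model(prompt: str) -> str:
        return " (model continuation would appear here)"

    prompt = summarization_prompt("A long news article about language models.")
    print(prompt)
    print(zero_shot(prompt, fake_model))
```

The point of the design is that nothing task-specific lives in the model call: swapping the cue text changes the task, while the predictor stays the same.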