Training
Scaling Laws for Neural Language Models
OpenAI·January 23, 2020
Jared Kaplan, Sam McCandlish, Tom Henighan, et al.
TL;DR
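Shows that language-model cross-entropy loss falls as a smooth power law in model size, dataset size, and training compute, as long as the other two factors are not the bottleneck. This makes the loss reachable at a larger scale predictable from smaller runs.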
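To make the power-law claim concrete, here is a minimal Python sketch that fits a curve of the form loss = a * N^(-alpha) to a few (model size, loss) points and extrapolates to a larger model. The constants `TRUE_A` and `TRUE_ALPHA`, the sample sizes, and the helper names are hypothetical and chosen only for illustration. They are not the paper's fitted values, and this is not necessarily the fitting procedure the authors used.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-alpha) in log-log space.

    A power law is a straight line on log-log axes, so ordinary linear
    regression on (log x, log y) recovers the exponent and prefactor.
    Returns (a, alpha).
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Hypothetical constants, for illustration only (NOT values from the paper).
TRUE_A = 12.0
TRUE_ALPHA = 0.08

# Synthetic "small-scale runs": parameter counts and their noiseless losses.
model_sizes = [1e6, 1e7, 1e8, 1e9]
losses = [TRUE_A * n ** (-TRUE_ALPHA) for n in model_sizes]

a_hat, alpha_hat = fit_power_law(model_sizes, losses)


def predict_loss(n_params):
    """Extrapolate the fitted power law to a model size not in the data."""
    return a_hat * n_params ** (-alpha_hat)


predicted_at_1e10 = predict_loss(1e10)
print(f"fitted alpha = {alpha_hat:.4f}, predicted loss at 1e10 params = {predicted_at_1e10:.4f}")
```

Because the synthetic data is noiseless, the fit recovers the assumed exponent almost exactly. With real training runs the points would scatter around the line, and the extrapolation would carry corresponding uncertainty.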
Why it matters
Made the payoff of scale quantitative and predictable, which gave labs the confidence to invest in ever-larger models. The Chinchilla work (Hoffmann et al., 2022) later revised the compute-optimal balance between model size and training data.