Training

Scaling Laws for Neural Language Models

OpenAI · January 23, 2020

Jared Kaplan, Sam McCandlish, Tom Henighan, et al.

TL;DR

Shows that language-model test loss falls as a smooth power law in model size, dataset size, and training compute, so performance becomes predictable from scale.
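
As a rough illustration of what "power law" means here, the sketch below fits the form L(N) = (N_c / N)^alpha to a handful of (model size, loss) points by linear regression in log-log space, then extrapolates to a larger model. The data is synthetic and the constants (alpha = 0.08, N_c = 1e14) are illustrative assumptions, not values taken from the paper; the function names are hypothetical.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = (scale / size) ** alpha via least squares in log-log space.

    Taking logs gives log(loss) = alpha * log(scale) - alpha * log(size),
    which is a straight line with slope -alpha.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    alpha = -slope
    scale = math.exp(intercept / alpha)  # intercept = alpha * log(scale)
    return alpha, scale


def predict_loss(size, alpha, scale):
    """Evaluate the fitted power law at a given model size."""
    return (scale / size) ** alpha


# Synthetic measurements generated from an assumed power law (illustrative only).
TRUE_ALPHA = 0.08
TRUE_SCALE = 1e14
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [(TRUE_SCALE / n) ** TRUE_ALPHA for n in sizes]

alpha, scale = fit_power_law(sizes, losses)
extrapolated = predict_loss(1e10, alpha, scale)
print(f"alpha={alpha:.4f}, scale={scale:.3e}, predicted loss at 1e10 params={extrapolated:.4f}")
```

On a log-log plot such data lies on a straight line, which is why a fit on small models can be extended to estimate the loss of a larger one. Real measurements are noisy and the trend eventually bends, so an extrapolation like this is only a first estimate.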

Why it matters

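Gave labs the confidence to invest in ever-larger models by making the payoff of scale quantitative and predictable. Chinchilla (Hoffmann et al., 2022) later revised the compute-optimal balance between model size and training data, finding that the two should grow roughly in proportion.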

Related terms

Scaling Laws