Perplexity

A measure of how well a language model predicts a sample of text; lower is better.

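Perplexity is the exponential of the average negative log-likelihood that a model assigns to held-out text. Intuitively, it measures how "surprised" the model is by each next token: a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k tokens. It was the dominant language-modeling metric before downstream task benchmarks became standard, and it remains useful for tracking pretraining progress.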
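The definition can be written as PPL = exp(-(1/N) * sum_i log p(x_i | x_<i)) over N tokens. The minimal sketch below assumes you already have the per-token natural-log probabilities from some model. The function name `perplexity` is hypothetical, and if your log-probabilities use base 2, exponentiate with 2 instead of e.

```python
import math

def perplexity(token_log_probs):
    """Hypothetical helper: perplexity from per-token natural-log probabilities.

    token_log_probs[i] is assumed to be log p(x_i | x_<i) from some model.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    # Average negative log-likelihood per token.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    # Exponentiate to undo the log and get perplexity.
    return math.exp(avg_nll)

# Sanity check: a model that gives every token probability 1/4 behaves like
# a uniform choice among 4 tokens, so its perplexity is approximately 4.0.
uniform = [math.log(0.25)] * 10
print(perplexity(uniform))
```

A model that is always certain of the correct token (log-probability 0) reaches the minimum perplexity of 1.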