Perplexity
A measure of how well a language model predicts a sample of text; lower is better.
Perplexity is the exponential of the average per-token negative log-likelihood that the model assigns to held-out text; intuitively, it measures how "surprised" the model is by each next token. It was the dominant LM metric before task benchmarks, and is still useful for tracking pretraining progress.
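The definition can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: the function name `perplexity` and the input `token_log_probs` are hypothetical, and the sketch assumes the model's natural-log probability for each held-out token has already been computed.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over the given tokens.

    token_log_probs: natural-log probabilities the model assigned to each
    observed token (hypothetical input, assumed precomputed elsewhere).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    # Exponentiate to get perplexity.
    return math.exp(avg_nll)


# Made-up example: the model gives probability 0.25 to each of four tokens.
log_probs = [math.log(0.25)] * 4
ppl = perplexity(log_probs)
```

In this made-up case the result is approximately 4: the model is as uncertain as if it were picking uniformly among four tokens at every step. A model that assigns probability 1 to every token would reach the minimum perplexity of 1.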