Pretraining

The large-scale self-supervised stage that builds a model’s base knowledge.

Pretraining trains a model on vast amounts of unlabeled data with a self-supervised objective — typically next-token prediction. This is where a model acquires its broad knowledge and language ability, before any task-specific fine-tuning or alignment.

It is by far the most compute-intensive phase of building a foundation model.

Related papers