Supervised Fine-Tuning (SFT)

Training a pretrained model on labeled input→output examples to teach a specific behavior, format, or task.

SFT is typically the first post-training step after pretraining: the base model is shown high-quality demonstrations (e.g. instruction → ideal response) and learns to imitate them. It is what turns a raw next-token predictor into a helpful instruction-following assistant.

SFT is often followed by preference-based methods like RLHF or DPO that further align the model with human preferences.