Knowledge Distillation
Training a small "student" model to imitate a larger "teacher".
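Distillation transfers capability from a large, expensive model into a smaller, cheaper one by training the student to match the teacher’s output distribution (its "soft targets") or internal signals such as intermediate activations, rather than only the hard labels. Soft targets carry more information than a single correct label, because they also show how the teacher ranks the wrong answers. The student typically retains much of the teacher’s quality at a fraction of the inference cost.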
It underpins many of the fast, small models shipped alongside frontier flagships.
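To make "matching the teacher's outputs" concrete, here is a minimal sketch of one common formulation: a temperature-softened KL term between teacher and student distributions, blended with ordinary cross-entropy on the hard label. The function names, the `temperature` and `alpha` parameters, and the example logits are all illustrative assumptions rather than part of any particular library, and real training code would compute this over batches in a tensor framework.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target term (imitate the teacher) with a hard-label term.

    `alpha` weights the soft term; `temperature` softens both distributions.
    Both are hypothetical hyperparameters chosen for illustration.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # Scaling by temperature**2 is a common convention that keeps the soft
    # term's gradient magnitude comparable as the temperature changes.
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    # Standard cross-entropy against the ground-truth class at temperature 1.
    hard_loss = -math.log(softmax(student_logits)[label])
    return alpha * soft_loss + (1 - alpha) * hard_loss


# Made-up logits for a 3-class problem where class 0 is correct.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

print(distillation_loss(close_student, teacher, label=0))
print(distillation_loss(far_student, teacher, label=0))
```

A student whose logits resemble the teacher's gets a lower loss than one that disagrees, which is the signal that gradient descent would follow. Setting `alpha=1.0` trains purely on the teacher's soft targets, while `alpha=0.0` reduces to ordinary supervised training.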