Knowledge Distillation

Training a small "student" model to imitate a larger "teacher".

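Distillation transfers capability from a large, expensive model into a smaller, cheaper one by training the student to match the teacher’s outputs, typically its full probability distribution over classes or tokens (or internal signals such as intermediate activations), rather than only hard labels. The teacher’s distribution carries more information than a single correct answer, because it also shows which wrong answers the teacher considers plausible. The student retains much of the teacher’s quality at a fraction of the inference cost.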
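A minimal sketch of the idea for a single classification example, assuming the common soft-target formulation: both models' logits are softened with a temperature, the student is penalized by the KL divergence from the teacher's softened distribution, and that term is mixed with ordinary cross-entropy on the hard label. The function names, the `temperature` and `alpha` parameters, and their default values are illustrative assumptions, not something this entry specifies.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    temperature and alpha are hypothetical hyperparameters for this sketch:
    a higher temperature flattens both distributions, and alpha sets how
    much weight the teacher's soft targets get relative to the hard label.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL(teacher || student) over the temperature-softened distributions.
    soft_term = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )

    # Standard cross-entropy against the hard label, at temperature 1.
    hard_term = -log_softmax(student_logits)[label]

    # The T^2 factor is the usual rescaling that keeps the soft term's
    # gradient magnitude comparable as the temperature changes.
    return alpha * temperature ** 2 * soft_term + (1 - alpha) * hard_term
```

With `alpha=1.0` the loss depends only on how closely the student's softened distribution matches the teacher's, and it is zero when the two sets of logits are identical. With `alpha=0.0` it reduces to plain supervised training on hard labels. In practice this would be computed over batches with an autodiff framework rather than in pure Python.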

It underpins many of the fast, small models shipped alongside frontier flagships.
