Distilling the Knowledge in a Neural Network

Google · March 9, 2015

Geoffrey Hinton, Oriol Vinyals, Jeff Dean

TL;DR

Introduces knowledge distillation: a small student model is trained to match the temperature-softened output probabilities ("soft targets") of a large teacher model or ensemble, retaining much of the teacher's accuracy.
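
To make the idea concrete, here is a minimal sketch of a distillation loss for a single example, using only the Python standard library. It combines a soft-target cross-entropy term (student vs. teacher, both softened with the same temperature) with an ordinary cross-entropy term on the true label. The function names, the temperature of 2.0, and the 0.5 weighting are illustrative assumptions, not values taken from the paper. The soft term is scaled by the squared temperature because its gradients otherwise shrink as the temperature grows.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of soft-target and hard-label cross-entropy.

    temperature and alpha are hypothetical defaults chosen for illustration.
    """
    # Soft targets: teacher distribution at raised temperature.
    soft_targets = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_loss = -sum(p * math.log(q) for p, q in zip(soft_targets, student_soft))

    # Hard loss: standard cross-entropy against the true label at temperature 1.
    hard_loss = -math.log(softmax(student_logits)[label])

    # Scale the soft term by T^2 so its gradient magnitude stays comparable.
    return alpha * temperature ** 2 * soft_loss + (1 - alpha) * hard_loss

teacher = [5.0, 1.0, 0.0]
loss_close = distillation_loss([4.0, 1.5, 0.2], teacher, label=0)
loss_far = distillation_loss([0.0, 1.0, 5.0], teacher, label=0)
print(loss_close < loss_far)
```

In practice the same loss would be averaged over a batch and minimized with respect to the student's parameters, with the teacher's logits held fixed.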

Why it matters

A foundational technique now used to ship fast, cheap versions of large models across the industry.

Related terms