Training
Distilling the Knowledge in a Neural Network
Google·March 9, 2015
Geoffrey Hinton, Oriol Vinyals, Jeff Dean
View on arXivTL;DR
Introduces knowledge distillation: training a small student model to match a large teacher’s soft predictions, retaining much of its accuracy.
Why it matters
A foundational technique now used to ship fast, cheap versions of large models across the industry.