Training

Distilling the Knowledge in a Neural Network

Google · March 9, 2015

Geoffrey Hinton, Oriol Vinyals, Jeff Dean

TL;DR

Introduces knowledge distillation: training a small student model to match the softened output probabilities ("soft targets") of a large teacher model, so that the student retains much of the teacher’s accuracy.

A foundational technique now used to ship fast, cheap versions of large models across the industry.
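
To make the idea of matching a teacher’s soft predictions concrete, here is a minimal sketch of a distillation loss in plain Python. It assumes the common formulation: a temperature-scaled softmax on both models, a cross-entropy term against the teacher’s soft targets (scaled by the square of the temperature), and a weighted cross-entropy term against the true label. The function names, the `alpha` weighting, and the default values are illustrative assumptions, not settings taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    """Cross-entropy H(target, predicted) between two discrete distributions."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target loss and a hard-label loss.

    temperature and alpha are hypothetical defaults chosen for illustration.
    """
    # Soft term: the student matches the teacher's softened distribution.
    soft_targets = softmax(teacher_logits, temperature)
    soft_preds = softmax(student_logits, temperature)
    # Multiply by T^2 so the soft term's gradient scale stays comparable
    # to the hard term's as the temperature changes.
    soft_loss = cross_entropy(soft_targets, soft_preds) * temperature ** 2

    # Hard term: ordinary cross-entropy against the true label at T = 1.
    hard_preds = softmax(student_logits, 1.0)
    hard_loss = -math.log(hard_preds[true_label])

    return alpha * soft_loss + (1 - alpha) * hard_loss

teacher = [5.0, 2.0, 0.5]
close_student = [4.5, 2.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 2.0, 5.0]     # ranks the classes in the opposite order
loss_close = distillation_loss(close_student, teacher, true_label=0)
loss_far = distillation_loss(far_student, teacher, true_label=0)
```

In this sketch the student that agrees with the teacher receives the lower loss. In practice the same loss would be computed on batches inside an automatic-differentiation framework so that the student’s parameters can be trained by gradient descent.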