Training
Adam: A Method for Stochastic Optimization
University of Toronto · December 22, 2014
Diederik P. Kingma, Jimmy Ba
TL;DR
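Introduces Adam, a first-order gradient-based optimizer that adapts a separate step size for each parameter using running estimates of the first and second moments of the gradient. It became the default optimizer for training deep networks.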
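A minimal sketch of the Adam update in plain Python. It follows the paper's Algorithm 1 and uses the paper's suggested defaults (α = 0.001, β₁ = 0.9, β₂ = 0.999, ε = 1e-8). The class name `Adam`, the `step` method, and storing parameters as a flat list of floats are illustrative assumptions. They are not the API of any particular framework.

```python
import math


class Adam:
    """Illustrative Adam optimizer over a flat list of float parameters.

    Hypothetical interface (not a real library API). The defaults match the
    paper's suggested settings: lr (alpha) = 0.001, beta1 = 0.9,
    beta2 = 0.999, eps = 1e-8.
    """

    def __init__(self, n_params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # running mean of gradients (first moment)
        self.v = [0.0] * n_params  # running mean of squared gradients (second raw moment)
        self.t = 0                 # timestep, used for bias correction

    def step(self, params, grads):
        """Return new parameters after one Adam update."""
        self.t += 1
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            # Exponential moving averages of the gradient and the squared gradient.
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Undo the bias toward zero caused by initializing m and v at 0.
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            # Per-parameter step: the gradient mean scaled by its RMS.
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


if __name__ == "__main__":
    # Toy problem: minimize f(x, y) = (x - 3)^2 + (y + 1)^2.
    def grad(params):
        x, y = params
        return [2 * (x - 3), 2 * (y + 1)]

    opt = Adam(n_params=2, lr=0.1)
    params = [0.0, 0.0]
    for _ in range(500):
        params = opt.step(params, grad(params))
    print(params)  # should be close to the minimizer (3, -1)
```

Dividing the bias-corrected mean by the square root of the bias-corrected second moment makes each step roughly scale-invariant. On the very first step, `m_hat / sqrt(v_hat)` equals the sign of the gradient, up to ε. Each parameter therefore moves by about `lr`, whether its gradient is large or small.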
Why it matters
One of the most-cited papers in all of science, and the optimizer that most deep networks are trained with.