Training

Adam: A Method for Stochastic Optimization

University of Toronto · December 22, 2014

Diederik P. Kingma, Jimmy Ba

TL;DR

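Introduces Adam, a first-order optimizer that adapts each parameter's step size using running estimates of the gradient's first and second moments. It became the default for training deep networks.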

One of the most-cited papers in all of science. Adam, directly or through variants such as AdamW, is the optimizer behind most modern deep-learning training.
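
For readers who want the mechanics behind the summary, the Adam update can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the default hyperparameters (step size 0.001, beta1 0.9, beta2 0.999, epsilon 1e-8) follow the paper, while the names adam_step and minimize_quadratic and the toy quadratic objective are assumptions made up for this example.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of floats.

    t is the 1-based step count. Returns the new (params, m, v).
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and the squared gradient.
        m_i = beta1 * m_i + (1 - beta1) * g
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction: both averages start at zero and are biased toward it.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Per-parameter step: scaled by the root of the second-moment estimate.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_quadratic(steps=2000, lr=0.01):
    """Hypothetical toy problem: minimize f(x, y) = (x - 3)^2 + 10 * (y + 1)^2."""
    params = [0.0, 0.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        grads = [2 * (params[0] - 3.0), 20 * (params[1] + 1.0)]
        params, m, v = adam_step(params, grads, m, v, t, lr=lr)
    return params


if __name__ == "__main__":
    print(minimize_quadratic())
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so every parameter moves by roughly the step size regardless of how large its gradient is. That scale invariance is a large part of why the same default settings work across very different models.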