Gradient Descent

Iteratively nudging parameters downhill to minimize a loss.

Gradient descent improves a model by repeatedly taking small steps in the direction that most reduces its error. Stochastic gradient descent (SGD) does this on small random batches of data, making it scalable to huge datasets. It is the basic optimization loop behind nearly all deep learning.

Related papers