Table of Contents
0:00 Recap
0:31 Plan
1:14 Optimization in deep learning
3:44 Gradient descent variants
7:58 Setup for the Jupyter notebook
9:49 Vanilla gradient descent
12:14 Momentum
15:38 Nesterov accelerated gradient descent
18:00 Adagrad
20:06 RMSProp
22:11 Adam
24:39 AMSGrad
27:09 PyTorch optimizers
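The chapters above cover a family of gradient descent update rules. As a rough sketch of the first two and of Adam (hyperparameter values and the toy quadratic objective below are illustrative defaults, not taken from the video), the core steps look like:

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    # Vanilla gradient descent: move against the gradient.
    return w - lr * grad

def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    # Momentum: accumulate an exponentially decaying velocity.
    v = beta * v + grad
    return w - lr * v, v

def adam_step(w, m, s, grad, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: bias-corrected first- and second-moment estimates.
    m = beta1 * m + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Toy objective f(w) = ||w||^2 / 2, so grad f(w) = w.
w = np.array([3.0, -2.0])
for _ in range(100):
    w = sgd_step(w, w)
print(np.linalg.norm(w))  # shrinks toward the minimum at the origin
```

Nesterov, Adagrad, RMSProp, and AMSGrad are variations on the same pattern; the PyTorch chapter shows the corresponding ready-made `torch.optim` classes.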
Further reading
An overview of gradient descent optimization algorithms, by Sebastian Ruder
Gradient-based optimization: A short introduction to optimization in Deep Learning, by Christian S. Perone