Table of Contents
  0:00 Recap 
  0:31 Plan 
  1:14 Optimization in deep learning 
  3:44 Gradient descent variants 
  7:58 Setup for the Jupyter notebook 
  9:49 Vanilla gradient descent 
  12:14 Momentum 
  15:38 Nesterov accelerated gradient descent 
  18:00 Adagrad 
  20:06 RMSProp 
  22:11 Adam 
  24:39 AMSGrad 
  27:09 PyTorch optimizers 
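The update rules covered in the timestamps above can be sketched in plain Python. This is a minimal illustration on the toy objective f(x) = x², not the notebook's actual code; the loss, learning rates, and other hyperparameters are illustrative assumptions.

```python
# Three of the update rules from the video, applied to f(x) = x**2,
# whose gradient is 2*x. Hyperparameters are assumed for illustration.

def grad(x):
    return 2.0 * x

def vanilla_gd(x, lr=0.1, steps=100):
    # Plain gradient descent: step against the gradient.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def momentum(x, lr=0.1, beta=0.9, steps=100):
    # Accumulate a velocity term that smooths successive gradients.
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(x)
        x -= lr * v
    return x

def adam(x, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    # Track bias-corrected first- and second-moment estimates of the
    # gradient and scale each step by them.
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g        # first moment (mean)
        v = b2 * v + (1 - b2) * g * g    # second moment (uncentered variance)
        m_hat = m / (1 - b1 ** t)        # bias correction
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (v_hat ** 0.5 + eps)
    return x

for name, fn in [("vanilla", vanilla_gd), ("momentum", momentum), ("adam", adam)]:
    print(name, fn(5.0))
```

All three drive the iterate from x = 5 toward the minimum at 0; the point of the sketch is only to show how the three update rules differ, not to compare their speed on a real model.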
References
  An overview of gradient descent optimization algorithms, by Sebastian Ruder
  Gradient-based optimization: A short introduction to optimization in Deep Learning, by Christian S. Perone