Adaptive Gradient Algorithm (AdaGrad)

Equation:

$\mathbf{w}^{(t+1)}=\mathbf{w}^{(t)}-\frac{\eta}{\sqrt{\sum\limits_{i=1}^t\mathbf{g}^{(i)T}\mathbf{g}^{(i)}+\varepsilon}}\mathbf{g}^{(t)}$

$\mathbf{g}^{(t)}=\nabla_\mathbf{w}L(\mathbf{w}^{(t)})$
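
A minimal NumPy sketch of this update, using the scalar accumulator $\sum_{i\le t}\mathbf{g}^{(i)T}\mathbf{g}^{(i)}$ from the formula above (library implementations such as Keras accumulate squared gradients per coordinate instead); the function name adagrad_step, the quadratic toy loss, and the hyperparameter values are illustrative:

import numpy as np

def adagrad_step(w, grad, accum, eta=0.5, eps=1e-8):
    # accum holds the running sum of g^(i)T g^(i) over all past steps
    accum = accum + grad @ grad
    # scale the step by eta / sqrt(accum + eps), as in the update rule above
    return w - eta / np.sqrt(accum + eps) * grad, accum

# Toy run on L(w) = ||w||^2, whose gradient is 2w.
w, accum = np.array([1.0, -2.0]), 0.0
for _ in range(200):
    w, accum = adagrad_step(w, 2 * w, accum)
print(w)  # moves toward the minimizer [0., 0.]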

Parameters:

$\eta$: learning rate (global step size); $\varepsilon$: small constant added for numerical stability.

Properties:

The denominator accumulates squared gradients over all past steps, so the effective learning rate decreases monotonically during training and can eventually become very small.

API:

import tensorflow as tf

opt = tf.keras.optimizers.Adagrad(learning_rate)  # learning_rate corresponds to η above
opt.minimize(loss, var_list=[w])  # in TF 2, loss is typically a zero-argument callable returning the loss value
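
For context, a small end-to-end run of this API, assuming the TF 2.x behaviour where minimize accepts a zero-argument callable loss; the variable w, the quadratic loss, and the learning rate are illustrative choices:

import tensorflow as tf

w = tf.Variable([1.0, -2.0])
loss = lambda: tf.reduce_sum(w ** 2)  # L(w) = ||w||^2 as a zero-argument callable

opt = tf.keras.optimizers.Adagrad(learning_rate=0.5)
for _ in range(200):
    opt.minimize(loss, var_list=[w])
print(w.numpy())  # values move toward the minimizer [0., 0.]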

RMSprop

Equation:

$\mathbf{w}^{(t+1)}=\mathbf{w}^{(t)}-\frac{\eta}{\sqrt{G^{(t)}} + \varepsilon}\mathbf{g}^{(t)}$

$\mathbf{g}^{(t)}=\nabla_\mathbf{w}L(\mathbf{w}^{(t)})$
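
A matching NumPy sketch of this rule; the excerpt above does not define $G^{(t)}$, so the code assumes the usual exponential moving average $G^{(t)}=\rho\,G^{(t-1)}+(1-\rho)\,\mathbf{g}^{(t)T}\mathbf{g}^{(t)}$ with decay $\rho$ (an assumption, as are the toy loss and hyperparameters); practical implementations such as tf.keras.optimizers.RMSprop keep a per-coordinate average rather than this single scalar:

import numpy as np

def rmsprop_step(w, grad, G, eta=0.05, rho=0.9, eps=1e-8):
    # G: assumed exponential moving average of squared gradient norms
    G = rho * G + (1 - rho) * (grad @ grad)
    # scale the step by eta / (sqrt(G) + eps), as in the update rule above
    return w - eta / (np.sqrt(G) + eps) * grad, G

# Toy run on L(w) = ||w||^2, whose gradient is 2w.
w, G = np.array([1.0, -2.0]), 0.0
for _ in range(200):
    w, G = rmsprop_step(w, 2 * w, G)
print(w)  # moves toward the minimizer [0., 0.]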