Maximum Likelihood Estimation

$$ \hat{\theta} \in \underset{\theta}{\arg\max}\ \ell(\theta; x) $$
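
As an illustration, here is a minimal sketch of maximum likelihood estimation for one concrete case. It assumes i.i.d. samples from a Gaussian, so that $\theta = (\mu, \sigma)$ and the maximizer of $\ell(\theta; x)$ has a closed form. The Gaussian model, the sample values, and the function names are assumptions chosen for the example, not part of these notes.

```python
import math

def gaussian_log_likelihood(mu, sigma, xs):
    """Log-likelihood l(theta; x) of i.i.d. samples xs under N(mu, sigma^2)."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

def gaussian_mle(xs):
    """Closed-form maximizer: sample mean and the biased (1/n) sample std."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
    return mu_hat, sigma_hat

# Hypothetical data, used only to exercise the functions.
xs = [2.1, 1.9, 2.4, 2.0, 1.6]
mu_hat, sigma_hat = gaussian_mle(xs)
```

Evaluating `gaussian_log_likelihood` at `(mu_hat, sigma_hat)` and at nearby parameter values is a quick way to check that the estimate really is a maximizer.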

Batch (full-dataset) Gradient Descent

$$ w := w - \mathrm{lr} \cdot \frac{\partial L}{\partial w} $$
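
The update rule above can be sketched for a one-parameter model. This example assumes a mean squared error loss for fitting `y ≈ w * x`. The loss, the toy data, and the function name are assumptions made for illustration. Each step uses the gradient averaged over every sample, which is what makes it "batch" gradient descent.

```python
def batch_gradient_descent(xs, ys, lr=0.05, epochs=200):
    """Fit y ~ w * x by minimizing mean squared error over the full dataset."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # dL/dw for L = (1/n) * sum((w*x - y)^2), computed from all samples
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w = w - lr * grad  # w := w - lr * dL/dw
    return w

# Hypothetical noiseless data generated by y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = batch_gradient_descent(xs, ys)
```

On this data the fitted `w` approaches 2. A learning rate that is too large for the data scale would make the iteration diverge instead.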

Stochastic Gradient Descent
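
A minimal sketch of stochastic gradient descent, assuming a one-parameter model `y ≈ w * x` with a squared error loss. The model, data, and names are illustrative assumptions. Unlike the full-dataset variant, each update uses the gradient of a single randomly ordered sample.

```python
import random

def stochastic_gradient_descent(xs, ys, lr=0.01, epochs=100, seed=0):
    """Fit y ~ w * x, updating w after every individual sample."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit samples in a fresh random order
        for i in indices:
            # Gradient of the single-sample loss (w*x_i - y_i)^2
            grad = 2 * (w * xs[i] - ys[i]) * xs[i]
            w = w - lr * grad
    return w

# Hypothetical noiseless data generated by y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = stochastic_gradient_descent(xs, ys)
```

With noisy data the per-sample updates would fluctuate around the optimum rather than settle exactly, which is one common motivation for shrinking the learning rate over time.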

Learning rate decay
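
These notes do not say which decay schedule is meant, so the sketch below shows two common choices as assumptions: exponential decay and step decay. The function and parameter names are hypothetical.

```python
def exponential_decay(lr0, decay_rate, step):
    """Exponential schedule: lr_t = lr0 * decay_rate ** step."""
    return lr0 * decay_rate ** step

def step_decay(lr0, drop, every, step):
    """Step schedule: multiply the rate by `drop` once every `every` steps."""
    return lr0 * drop ** (step // every)

# Learning rates for the first four steps of an exponential schedule.
schedule = [exponential_decay(0.1, 0.5, t) for t in range(4)]
```

Either function can supply the `lr` used in a gradient update at a given step, starting large for fast progress and shrinking for stability near the optimum.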

Batch Normalization
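
A minimal sketch of the training-time forward pass of batch normalization for a single feature. It normalizes a mini-batch to zero mean and unit variance, then applies a learnable scale `gamma` and shift `beta`. The function name and default values are assumptions for illustration. The running statistics used at inference time and the backward pass are omitted.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased (1/n) variance
    # eps keeps the division stable when the batch variance is near zero
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# Hypothetical mini-batch of activations for one feature.
out = batch_norm([1.0, 2.0, 3.0, 4.0])
```

With the default `gamma=1.0` and `beta=0.0`, the output has mean approximately 0 and variance approximately 1.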