The Impact of Learning Rate Choice

The learning rate α has a large impact on gradient descent's performance: a poor choice can make convergence extremely slow or cause the algorithm to fail entirely.
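Both failure modes follow from the update rule θ ← θ − α∇J(θ). A minimal sketch, assuming a one-dimensional quadratic objective J(w) = w² (a hypothetical example chosen here for illustration):

```python
def gradient_descent(w0, alpha, steps):
    """Run gradient descent on J(w) = w^2, whose gradient is dJ/dw = 2w."""
    w = w0
    for _ in range(steps):
        grad = 2 * w          # gradient of J(w) = w^2
        w = w - alpha * grad  # update rule: w <- w - alpha * grad
    return w

# A moderate learning rate converges toward the minimum at w = 0.
w_final = gradient_descent(w0=10.0, alpha=0.1, steps=50)
```

Each step multiplies w by (1 − 2α), which is why the value of α alone decides whether the iterates shrink toward the minimum or blow up.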

Case 1: Learning Rate Too Small

What happens: each update step is proportional to α, so a very small α moves the parameters only slightly per iteration, and a very large number of iterations is needed to reach the minimum.

Visualization:

Start → tiny step → tiny step → tiny step → ... → (eventually) minimum

Result: Gradient descent works, but is painfully slow.
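Continuing the hypothetical quadratic example J(w) = w² (gradient 2w), a tiny learning rate makes the slow crawl concrete:

```python
def run(alpha, steps, w0=10.0):
    """Gradient descent on J(w) = w^2; each step multiplies w by (1 - 2*alpha)."""
    w = w0
    for _ in range(steps):
        w -= alpha * 2 * w
    return w

# With alpha = 0.001, 50 steps only shrink w from 10 to about 9.05,
# still far from the minimum at w = 0; with alpha = 0.1 the same 50
# steps would already land within 0.001 of the minimum.
```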

Case 2: Learning Rate Too Large

What happens: each step is so large that it jumps past the minimum and lands on the other side; if the overshoot grows from step to step, the iterates oscillate with increasing amplitude instead of settling.

Visualization:

Start (near minimum) → overshoot right → overshoot left → overshoot further right → ...

Result: Gradient descent fails to converge and may diverge (get worse and worse).
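The same hypothetical quadratic J(w) = w² makes the oscillating divergence visible: since the gradient is 2w, each step multiplies w by (1 − 2α), so any α > 1 flips the sign of w and grows its magnitude.

```python
def trajectory(alpha, steps, w0=1.0):
    """Record the iterates of gradient descent on J(w) = w^2."""
    w = w0
    history = [w]
    for _ in range(steps):
        w -= alpha * 2 * w  # multiplies w by (1 - 2*alpha)
        history.append(w)
    return history

# alpha = 1.1 gives a step factor of -1.2: each iterate overshoots to
# the opposite side of the minimum, and |w| grows by 20% per step.
hist = trajectory(alpha=1.1, steps=10)
```

The alternating signs in `hist` match the overshoot-left, overshoot-right picture above, while the growing magnitudes are the divergence.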

Summary: Learning Rate Effects