The learning rate α has a huge impact on gradient descent performance. A poor choice can make gradient descent extremely slow or fail entirely.
If α is too small:

What happens: Each update moves the parameters only a tiny distance, so gradient descent needs a very large number of iterations to approach the minimum.
Visualization:
Start → tiny step → tiny step → tiny step → ... → (eventually) minimum
Result: Gradient descent works, but is painfully slow.
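To make this concrete, here is a minimal sketch of the too-small case. The toy objective f(x) = x² (gradient 2x) and the value α = 0.001 are illustrative assumptions, not taken from the original:

```python
# Gradient descent on the toy objective f(x) = x^2, whose gradient is 2x.
# (Illustrative assumption: the objective and alpha are chosen for demonstration.)
def gradient_descent(x, alpha, steps):
    for _ in range(steps):
        x = x - alpha * 2 * x  # update rule: x := x - alpha * f'(x)
    return x

# With alpha = 0.001, each step shrinks x by only 0.2%.
x = gradient_descent(x=10.0, alpha=0.001, steps=1000)
print(x)  # ~1.35: still far from the minimum at x = 0 after 1000 steps
```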
If α is too large:

What happens: Each update overshoots the minimum, and the overshoots can grow larger with every step.
Visualization:
Start (near minimum) → overshoot right → overshoot left → overshoot further right → ...
Result: Gradient descent fails to converge and may even diverge (the cost gets worse with every iteration).
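The same toy setup shows the overshooting behavior. The value α = 1.1 is again an illustrative assumption; on f(x) = x², any α > 1 makes the iterates diverge:

```python
# Same toy objective f(x) = x^2. With alpha = 1.1 the update becomes
# x := x - 1.1 * 2x = -1.2x, so x flips sign (overshoot) and |x| grows 20% per step.
x = 1.0
for step in range(1, 6):
    x = x - 1.1 * 2 * x
    print(step, x)
# prints (approximately): -1.2, 1.44, -1.73, 2.07, -2.49 -- moving away from 0
```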