What is Gradient Descent?
Gradient descent is a general algorithm for minimizing functions; it is not limited to the linear regression cost function. It's one of the most important building blocks in machine learning, used to train models ranging from simple linear regression to deep neural networks.
The Goal
Minimize the cost function $J(w, b)$ by finding the optimal values of parameters $w$ and $b$.
More generally, gradient descent can minimize functions with many parameters: $J(w_1, w_2, \dots, w_n, b)$.
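For reference, the standard gradient descent update rule in this general case looks as follows; the learning rate $\alpha$ is conventional notation that this section has not yet introduced:

$$
w_j \leftarrow w_j - \alpha \frac{\partial}{\partial w_j} J(w_1, \dots, w_n, b), \qquad
b \leftarrow b - \alpha \frac{\partial}{\partial b} J(w_1, \dots, w_n, b)
$$

All parameters are updated simultaneously on each step, using the partial derivatives evaluated at the current parameter values.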
How Gradient Descent Works
Starting Point
- Initialize parameters (commonly $w = 0$ and $b = 0$).
- For linear regression, the starting values do not matter much, because the squared-error cost function is convex (bowl-shaped) and has a single global minimum.
The Process
- Look around — Evaluate which direction decreases $J$ the most.
- Take a small step — Move parameters slightly in that direction.
- Repeat — Keep adjusting until $J$ settles at or near a minimum (a code sketch of this loop follows the list).
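The loop above can be made concrete with a short sketch. This is a minimal illustration, not a definitive implementation: it assumes the squared-error cost $J(w, b) = \frac{1}{2m}\sum_{i}(w x^{(i)} + b - y^{(i)})^2$ for linear regression, a learning rate `alpha`, a fixed iteration count as the stopping rule, and toy data; none of these specifics come from the section above.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for one-feature linear regression (sketch)."""
    w, b = 0.0, 0.0                     # start at w = 0, b = 0
    for _ in range(num_iters):
        preds = w * x + b               # current predictions
        # Partial derivatives of the squared-error cost J(w, b)
        dj_dw = np.mean((preds - y) * x)
        dj_db = np.mean(preds - y)
        # Take a small step in the direction that decreases J
        w -= alpha * dj_dw
        b -= alpha * dj_db
    return w, b

# Hypothetical example data that roughly follows y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.1, 4.9, 7.2, 9.0])
w, b = gradient_descent(x, y)
print(w, b)  # drifts toward roughly w ≈ 2, b ≈ 1
```

Each iteration performs the "look around" step (computing the partial derivatives, which point in the direction of steepest increase of $J$) and the "small step" (moving $w$ and $b$ a little in the opposite direction).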
The Hill Analogy
Imagine standing on a hilly landscape where:
- Your position = current values of $w$ and $b$
- Height = value of cost function $J$
- Goal = reach the lowest valley
At each step:
- Spin 360° and look around