For example, if the goal is to predict the price of a house given its size, and you are given a training set of existing house sizes and prices, the basic flow looks something like the image.
The training set is the existing input data used to determine the best parameters for the hypothesis.
The hypothesis is the function used to fit a straight line through the data:
$$ h_\theta(x) = \theta_0 + \theta_1x $$
Sometimes it will just be $h(x)$ for short.
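As a minimal sketch, the hypothesis can be written as a plain Python function (the parameter values below are made up for illustration):

```python
def h(theta0, theta1, x):
    """Univariate linear hypothesis: h_theta(x) = theta_0 + theta_1 * x."""
    return theta0 + theta1 * x

# Hypothetical parameters: base price 50, plus 0.1 per unit of size.
price = h(50, 0.1, 2000)
print(price)  # 50 + 0.1 * 2000 = 250.0
```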
$$ J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 $$
The cost function measures the error between the hypothesis's predictions (for a given set of parameters) and the actual results from the Training Set.
It is equal to $\frac{1}{2}\bar{x}$ where $\bar{x}$ is the mean of the squares of $h_\theta(x^{(i)}) - y^{(i)}$, the difference between the predicted and actual values. It is halved for convenience when computing Gradient Descent (the derivative of the square cancels out the half).
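A direct translation of the cost function into Python might look like this (the sample data is made up; `xs` are input sizes, `ys` the actual prices):

```python
def cost(theta0, theta1, xs, ys):
    """J(theta_0, theta_1): half the mean of squared prediction errors."""
    m = len(xs)
    squared_errors = ((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)

# A perfect fit (y = x) gives zero cost; a bad fit gives a larger cost.
print(cost(0, 1, [1, 2, 3], [1, 2, 3]))  # 0.0
print(cost(0, 0, [1, 2, 3], [1, 2, 3]))  # (1 + 4 + 9) / 6 ≈ 2.33
```

Gradient Descent will then search for the $(\theta_0, \theta_1)$ pair that minimizes this value.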