Linear regression is a supervised learning model that fits a straight line to data. It is probably one of the most widely used learning algorithms today, and its concepts carry over to many other ML models.

Example: Predicting House Prices

The example uses a dataset of house sizes and prices from Portland.

If a client's house is 1,250 sq ft, you can:

  1. Fit a straight line to the training data
  2. Find the point on the line where the size is 1,250 sq ft
  3. Read off the predicted price at that point (≈ $220,000)
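
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used on the actual Portland data: the four data points are made up, and the line is fitted by ordinary least squares, which is one common way to "fit a straight line" and is an assumption here. The names `fit_line`, `sizes_sqft`, and `prices_k` are hypothetical.

```python
# Hypothetical toy data, made up for illustration (not the Portland dataset).
sizes_sqft = [1000, 1500, 2000, 2500]   # input feature x: house size in sq ft
prices_k = [185, 255, 345, 415]         # target y: price in thousands of dollars


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Step 1: fit a straight line to the training data.
slope, intercept = fit_line(sizes_sqft, prices_k)

# Step 2: locate the client's house size on the line.
client_size = 1250

# Step 3: read off the predicted price (in thousands of dollars).
predicted_k = slope * client_size + intercept
print(f"Predicted price: about ${predicted_k:.0f}k")
```

With these made-up numbers the prediction lands near $222k, in the same range as the ≈ $220,000 read off the plot in the example.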

Data Representation

The same data can be viewed in two ways: as a table or as a plot.

Each row in the table corresponds to one data point (a cross) on the plot.

Standard ML Notation

Symbol          Meaning                             Example
x               Input variable / feature            Size of house (2,104 sq ft)
y               Output variable / target            Price ($400,000)
m               Total number of training examples   47
(x, y)          Single training example             (2104, 400), with price in $1000s
(x⁽ⁱ⁾, y⁽ⁱ⁾)    The iᵗʰ training example            x⁽¹⁾ = 2104, y⁽¹⁾ = 400

Note: The superscript ⁽ⁱ⁾ is an index into the training set, not an exponent. x⁽²⁾ means "the input of the second training example," not "x squared."
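
The notation maps onto code fairly directly. The sketch below assumes the training set is stored as a list of (x, y) pairs. Only the first pair comes from the table above; the other two are hypothetical, so m is 3 here rather than 47. The helper name `example` is also hypothetical. Note that the notation counts from 1 while Python lists count from 0.

```python
# Training set as a list of (x, y) pairs: size in sq ft, price in $1000s.
# The first pair is the example from the table; the others are hypothetical.
training_set = [
    (2104, 400),
    (1416, 232),
    (1534, 315),
]

# m: total number of training examples.
m = len(training_set)


def example(i):
    """Return (x^(i), y^(i)), where i is the 1-based index used in the notation."""
    return training_set[i - 1]


x_1, y_1 = example(1)   # x^(1) = 2104, y^(1) = 400
x_2, y_2 = example(2)   # x^(2): input of the second example, not x squared

print(f"m = {m}")
print(f"x^(1) = {x_1}, y^(1) = {y_1}")
print(f"x^(2) = {x_2}, y^(2) = {y_2}")
```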

Training Set

The dataset used to train the model is called the training set. The model learns patterns from this data, then makes predictions on new, unseen examples (like your client's house that hasn't sold yet).