Linear Regression

Linear regression is a supervised learning model that fits a straight line to data. It is probably the most widely used learning algorithm today, and many of its concepts carry over to other machine learning models.

Example: Predicting House Prices

Using a dataset of house sizes and prices from Portland:

Input (x): Size of house in square feet
Output (y): Price in thousands of dollars

If a client's house is 1,250 sq ft, you can:

Fit a straight line to the training data
Find the point on the line where the size is 1,250 sq ft
Read off the predicted price at that point (≈ $220,000)
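
The three steps above can be sketched in a few lines of Python. This is a minimal sketch, not the method these notes prescribe: it assumes the line is fit by ordinary least squares, the four data points are made up for illustration (they are not the Portland dataset), and the names fit_line and predict are hypothetical helpers. Because the data is invented, the result will not match the ≈ $220,000 figure above.

```python
# Hypothetical toy data, invented for illustration; NOT the Portland dataset.
sizes_sqft = [1000.0, 1500.0, 2000.0, 2500.0]  # x: size in square feet
prices_k = [200.0, 280.0, 370.0, 450.0]        # y: price in thousands of dollars


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points.

    Assumption: "fit a straight line" is taken to mean ordinary least squares.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(size_sqft, slope, intercept):
    """Read the price (in thousands of dollars) off the fitted line."""
    return slope * size_sqft + intercept


# Step 1: fit a straight line to the training data.
slope, intercept = fit_line(sizes_sqft, prices_k)
# Steps 2 and 3: evaluate the line at 1,250 sq ft and read off the price.
predicted_k = predict(1250.0, slope, intercept)
print(f"Predicted price: about ${predicted_k:.0f}k")
```

Evaluating the line at a given size is the code equivalent of finding that size on the horizontal axis and reading the price off the vertical axis.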

Linear regression is one type of regression model: a model that predicts a number, such as a price, as its output.

Data Representation

Data can be viewed as:

A scatter plot (visual)
A data table (rows and columns)

Each row in the table = one data point (cross) on the plot.

Standard ML Notation

Symbol	Meaning	Example
x	Input variable / feature	Size of house (2,104 sq ft)
y	Output variable / target	Price in thousands of dollars (400, i.e. $400,000)
m	Total number of training examples	47
(x, y)	Single training example	(2104, 400)
(x⁽ⁱ⁾, y⁽ⁱ⁾)	The iᵗʰ training example	x⁽¹⁾ = 2104, y⁽¹⁾ = 400

Note: The superscript ⁽ⁱ⁾ is an index into the training set, not an exponent. x⁽²⁾ means "the input of the second training example," not "x squared."
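
To make the notation concrete, here is a small sketch of how it might map onto Python. It assumes the training set is stored as a list of (x, y) pairs. Only the first pair comes from the table above; the second pair is a made-up placeholder, and example is a hypothetical helper. Python lists are 0-based while the notes' index i starts at 1, so the helper shifts the index by one.

```python
# Training set as a list of (x, y) pairs:
# x = size in square feet, y = price in thousands of dollars.
training_set = [
    (2104, 400),  # (x^(1), y^(1)), taken from the notation table
    (1416, 232),  # hypothetical second example, invented for illustration
]

# m is the number of training examples. It is 2 for this toy list;
# the table's example value of 47 refers to the full dataset.
m = len(training_set)


def example(i):
    """Return (x^(i), y^(i)) for the notes' 1-based index i."""
    if not 1 <= i <= len(training_set):
        raise IndexError(f"i must be between 1 and {len(training_set)}")
    return training_set[i - 1]  # shift to Python's 0-based indexing


x1, y1 = example(1)
x2, y2 = example(2)

# x^(2) is the input of the second example, not x squared.
print(f"m = {m}, x^(1) = {x1}, y^(1) = {y1}, x^(2) = {x2}")
```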

Training Set

The dataset used to train the model is called the training set. The model learns patterns from this data, then makes predictions on new, unseen examples (like your client's house that hasn't sold yet).