Linear regression is a supervised learning model that fits a straight line to data. It's likely the most widely used learning algorithm in the world today, and the concepts apply to many other ML models.
Suppose you have a dataset of house sizes and prices from Portland, Oregon. If a client's house is 1,250 sq ft, you can fit a straight line to the data and read a predicted selling price off that line.
The data can be viewed in two ways:
- As a plot, with house size on the horizontal axis and price on the vertical axis
- As a table of numbers, with one column for size and one for price

Each row in the table = one data point (cross) on the plot, as in the sketch below.
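As a quick illustration, here is a minimal sketch that stores the training data in two parallel lists and plots it. Apart from the (2104, 400) example used in the notation table, the numbers are made up for illustration, not taken from the actual Portland dataset:

```python
import matplotlib.pyplot as plt

# Training data: size in sq ft, price in $1000s.
# Only (2104, 400) comes from the notes; the rest are made-up values.
sizes = [2104, 1416, 1534, 852]
prices = [400, 232, 315, 178]

# Each (size, price) row of the table is one cross on the plot.
plt.scatter(sizes, prices, marker="x")
plt.xlabel("Size (sq ft)")
plt.ylabel("Price ($1000s)")
plt.title("Housing prices")
plt.show()
```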
| Symbol | Meaning | Example |
|---|---|---|
| x | Input variable / feature | Size of house (2,104 sq ft) |
| y | Output variable / target | Price in thousands of dollars (400, i.e., $400,000) |
| m | Total number of training examples | 47 |
| (x, y) | Single training example | (2104, 400) |
| x⁽ⁱ⁾, y⁽ⁱ⁾ | The iᵗʰ training example | x⁽¹⁾ = 2104, y⁽¹⁾ = 400 |
Note: The superscript ⁽ⁱ⁾ is an index, not an exponent. x⁽²⁾ means "the second training example," not "x squared."
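To make the notation concrete, here is a small sketch (again with made-up numbers apart from the first example) showing how m, x⁽ⁱ⁾, and y⁽ⁱ⁾ map onto ordinary Python lists. Note that the notation counts from 1 while Python indexes from 0:

```python
# Training set: x = size in sq ft, y = price in $1000s.
# Only the first pair (2104, 400) comes from the table above;
# the other values are made up for illustration.
x = [2104, 1416, 1534, 852]
y = [400, 232, 315, 178]

m = len(x)  # m = total number of training examples (4 here, 47 in the notes)

# The notation is 1-indexed, Python lists are 0-indexed:
# x^(1) is x[0], y^(1) is y[0].
x_1, y_1 = x[0], y[0]
print(f"m = {m}")
print(f"(x^(1), y^(1)) = ({x_1}, {y_1})")  # (2104, 400)
```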
The dataset used to train the model is called the training set. The model learns patterns from this data and then makes predictions on new, unseen examples, such as your client's house, which hasn't sold yet.
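As a sketch of that workflow, using the same made-up training set as above (so the predicted price is only illustrative), you could fit a straight line by least squares with numpy and then predict a price for the client's unseen 1,250 sq ft house:

```python
import numpy as np

# Made-up training set: size in sq ft, price in $1000s.
x_train = np.array([2104, 1416, 1534, 852], dtype=float)
y_train = np.array([400, 232, 315, 178], dtype=float)

# Fit a straight line y = w*x + b by least squares.
w, b = np.polyfit(x_train, y_train, deg=1)

# Predict for the client's house, which is not in the training set.
x_new = 1250.0
predicted_price = w * x_new + b
print(f"Predicted price for {x_new:.0f} sq ft: ${predicted_price * 1000:,.0f}")
```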