Last time, we successfully derived the OLS formula for estimating $\beta$. Next, we are going to
So far, I have avoided getting into too many details about the assumptions. They are very important, but it would not make sense to introduce them before you understand why you need them. This time, I will go through one of the key assumptions in OLS: $\mathbb E [u|x] = 0$.
Recall from our formulation that $\beta$ is the slope of the best-fitting line. Starting again with high-school math, the equation for the slope is
$$ \beta = \frac{y_2 - y_1}{x_2 - x_1} $$
In case you have not seen it framed this way, division can be interpreted as a rescaling operation: we rescale the numerator so that it is expressed per 1 unit of the denominator.
For example, "\$30 off every \$600 of purchase" really means that, on average, you get \$0.05 off for every dollar spent.
Next, let’s rewrite the equation a bit:
$$ \begin{aligned} \beta & = \frac{y_2 - y_1}{x_2 - x_1} \\ & = \frac{\Delta y}{\Delta x} \end{aligned} $$
‼️ This means that $\beta$ captures the change in $y$ for a 1-unit change in $x$.
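To make the "change in $y$ per 1 unit of $x$" reading concrete, here is a minimal Python sketch of the two-point slope formula. The function name `slope` and the sample points are my own illustration, not part of the OLS derivation; the discount example is encoded as the two points (0, 0) and (600, 30).

```python
def slope(x1, y1, x2, y2):
    """Two-point slope: change in y per 1-unit change in x."""
    if x2 == x1:
        # The formula divides by (x2 - x1), so a vertical line has no slope.
        raise ValueError("x1 and x2 must differ")
    return (y2 - y1) / (x2 - x1)


# Discount example: spending 600 dollars earns 30 dollars off,
# which is rescaled to dollars off per dollar spent.
discount_per_dollar = slope(0.0, 0.0, 600.0, 30.0)
print(discount_per_dollar)

# A line through (1, 2) and (3, 8): y rises by 3 for each 1-unit step in x.
print(slope(1.0, 2.0, 3.0, 8.0))
```

The division is doing exactly the rescaling described above: the raw change in $y$ is restated per single unit of $x$.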
Now let’s consider the two points $(x_1, y_1)$ and $(x_2, y_2)$ getting closer and closer together. In the limit, the slope becomes the instantaneous rate of change. This is precisely what the derivative does:
$$ \frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} $$
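A quick numerical sketch can show the limit at work. The example below is only an illustration under my own assumptions: I picked $f(x) = x^2$ at $x = 3$ (where the derivative is $2x = 6$), and the helper names `secant_slope` and `square` are hypothetical.

```python
def secant_slope(f, x, h):
    """Slope between (x, f(x)) and (x + h, f(x + h)), i.e. delta y / delta x."""
    return (f(x + h) - f(x)) / h


def square(x):
    # Illustrative curve; its exact derivative is 2 * x.
    return x * x


# Shrink the gap between the two points and watch the slope settle near 6.
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, secant_slope(square, 3.0, h))
```

For this particular curve the two-point slope works out to $6 + \Delta x$, so it approaches 6 as $\Delta x$ shrinks, which is the derivative at $x = 3$.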
If we go back to our regression equation and differentiate both sides with respect to $x$, we get:
$$ \begin{aligned} y &= \beta_0 + \beta_1 x +\varepsilon\\ \frac{d y}{d x} &= \frac{d \beta_0}{dx} + \beta_1 \frac{dx}{dx} + \frac{d \varepsilon}{dx} \end{aligned} $$
Let’s look at each component.