We have decided on a model (Formulation I) *and* estimated a value for our parameters $\beta$ (Estimation I).

The next question we want to ask is: how good is our model?

Essentially, inference is the process of evaluating and drawing conclusions from our statistical model.

In this post, Inference I, I will focus on $R^2$.

Afterwards, I will explain the FWL (Frisch-Waugh-Lovell) Theorem in Estimation II, which will give us the foundations for evaluating $\hat \beta_1$ in Inference II.


The first evaluation/inference we conduct is the “goodness-of-fit”, or $R^2$. The idea is to compare the performance of our model against that of a naive model, which simply predicts the mean.

Recall from Estimation I that economists evaluate the fit of a model using errors, i.e. the difference between the observed value $y$ and the predicted value $\hat y_{ols} = \hat \beta_0 + \hat \beta_1 x$.

For us to evaluate the performance, we also need a benchmark model. For the simplest benchmark, let’s use the mean as our prediction: $\hat y = \bar y$.

Applying the same procedure of adding up the squared errors (we square because errors come in both positive and negative signs), we can construct three new statistics.

$$
\begin{aligned}
\text{Total Sum of Squared Residuals} &= \text{SST} &&= \sum_{i=1}^N (y_i - \bar y)^2 \\
\text{Explained Sum of Squared Residuals} &= \text{SSE} &&= \sum_{i=1}^N (\hat y_i - \bar y)^2 \\
\text{Sum of Squared Residuals} &= \text{SSR} &&= \sum_{i=1}^N (y_i - \hat y_i)^2 = \sum_{i=1}^N \hat u_i^2
\end{aligned}
$$
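
To make these three statistics concrete, here is a minimal numpy sketch on made-up data (the data-generating process, sample size, and variable names are my own illustration, not something from this series):

```python
import numpy as np

# Made-up data purely for illustration; the "true" coefficients are hypothetical.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=100)

# Fit a simple OLS line y_hat = b0 + b1 * x (np.polyfit returns [slope, intercept])
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

# The three sums of squares defined above
sst = np.sum((y - y.mean()) ** 2)      # total: distance from the mean benchmark
sse = np.sum((y_hat - y.mean()) ** 2)  # explained: improvement of the fit over the mean
ssr = np.sum((y - y_hat) ** 2)         # residual: what the model still misses

print(sst, sse, ssr)  # for OLS with an intercept, sse + ssr adds up to sst
```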

See below for an illustration of the three residuals:

  1. Total Residual: The difference between the observation $y_i$ and the baseline prediction $\bar y$. You can think of this as the worst any informative model can perform; if a model performs worse than the mean prediction, we can always fall back to the mean model. In other words, if we think of residuals as errors, the $\hat y = \bar y$ baseline model represents the maximum/total error you can make.
  2. Explained Residual: The blue regression line is much closer to the observation. We call this additional predictive power the explained residual.
  3. Residual: Unless the model is 100% accurate*, there will still be a difference between the observed outcome $y_i$ and the predicted outcome $\hat y_i$. Following our naming convention, we simply call it the residual.

*[Figure: an observation $y_i$, the mean benchmark $\bar y$, and the fitted (blue) regression line, illustrating the total, explained, and remaining residuals.]*

I have seen online resources that claim $\text{SSE} = \sum_{i=1}^N (y_i - \hat y_i)^2$. In my humble opinion, that is wrong: $\sum_{i=1}^N (y_i - \hat y_i)^2$ is the (unexplained) residual, while the difference between the predicted value and the naive mean, $\hat y_i - \bar y$, i.e. the improvement over the naive model, is the part of the residuals that is “explained” by the model.

Next, let’s express the sums of residuals as fractions, which gives us $R^2$: the proportion of the total residuals that is explained by the model, a.k.a. the goodness of fit.

$$
\begin{aligned}
R^2 &= \text{Proportion of Residuals Explained} &&= \frac{\text{SSE}}{\text{SST}} \\
R^2 &= 1 - \text{Proportion of Residuals Unexplained} &&= 1 - \frac{\text{SSR}}{\text{SST}}
\end{aligned}
$$
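
The reason these two expressions give the same number is the sum-of-squares decomposition: for OLS with an intercept, the residuals sum to zero and are uncorrelated with the fitted values, so the cross term below vanishes and $\text{SST} = \text{SSE} + \text{SSR}$.

$$
\begin{aligned}
\text{SST} = \sum_{i=1}^N (y_i - \bar y)^2
&= \sum_{i=1}^N \big[(\hat y_i - \bar y) + \hat u_i\big]^2 \\
&= \underbrace{\sum_{i=1}^N (\hat y_i - \bar y)^2}_{\text{SSE}}
 + \underbrace{\sum_{i=1}^N \hat u_i^2}_{\text{SSR}}
 + 2\underbrace{\sum_{i=1}^N (\hat y_i - \bar y)\,\hat u_i}_{=\,0}
\end{aligned}
$$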

A high $R^2$ indicates that the model explains a large share of the variation in $y$ and thus has stronger predictive power; a low $R^2$ indicates the opposite.
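
As a quick sanity check, here is a sketch (reusing the made-up data from the earlier block; statsmodels is my choice of library, not something prescribed in this series) that computes $R^2$ both ways and compares it with the value statsmodels reports:

```python
import numpy as np
import statsmodels.api as sm

# Same illustrative data as in the earlier sketch
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=100)

# OLS with an intercept
results = sm.OLS(y, sm.add_constant(x)).fit()
y_hat = results.fittedvalues

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum((y - y_hat) ** 2)

# Both definitions agree, and match statsmodels' reported R^2
print(sse / sst, 1 - ssr / sst, results.rsquared)
```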