We have decided on a model (Formulation I) *and* estimated values for our parameters $\beta$ (Estimation I).
The next question we want to ask is:
Essentially, inference is the process of evaluating and drawing conclusions from our statistical model.
In this Inference I, I will focus on $R^2$.
Afterwards, I will explain the FWL Theorem in Estimation II, which will give us the foundations for evaluating $\hat \beta_1$ in Inference II.
The first evaluation/inference we conduct is the “goodness of fit”, or $R^2$. The idea is to compare the performance of our model with that of a naive model that simply predicts the mean.
Recall from Estimation I that economists evaluate the fit of a model using errors, i.e. the difference between the real value $y$ and the predicted value $\hat y_{ols} = \hat \beta_0 + \hat \beta_1 x$.
To evaluate the performance, we also need a benchmark model. As the simplest possible model, let’s use the mean as our prediction: $\hat y = \bar y$.
Following the same procedure of adding up the squared errors (squared because errors come in both positive and negative values and would otherwise cancel out), we can construct three new statistics.
$$ \begin{aligned} \text{Total Sum of Squares} &= \text{SST} = \sum_{i=1}^N (y_i - \bar y)^2 \\ \text{Explained Sum of Squares} &= \text{SSE} = \sum_{i=1}^N (\hat y_i - \bar y)^2 \\ \text{Sum of Squared Residuals} &= \text{SSR} = \sum_{i=1}^N (y_i - \hat y_i)^2 = \sum_{i=1}^N \hat u_i^2
\end{aligned} $$
See below for an illustration of the residuals

I have seen online resources that claim $\text{SSE} = \sum_{i=1}^N (y_i - \hat y_i)^2$. In my humble opinion, that is wrong. $\sum_{i=1}^N (y_i - \hat y_i)^2$ is the sum of squared residuals, i.e. what the model fails to explain. The difference between the predicted value and the naive mean, $\hat y_i - \bar y$, is the improvement over the naive model, i.e. the part of the variation that is “explained” by the model.
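To make the three sums concrete, here is a minimal sketch in Python with NumPy. The five data points are made up purely for illustration (they do not come from the earlier parts of this series), and the OLS estimates use the closed-form formulas for a single regressor.

```python
import numpy as np

# Hypothetical toy data, invented for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# OLS estimates for y = beta_0 + beta_1 * x (closed form, one regressor).
x_bar, y_bar = x.mean(), y.mean()
beta_1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta_0 = y_bar - beta_1 * x_bar

y_hat = beta_0 + beta_1 * x  # OLS predictions
u_hat = y - y_hat            # residuals

sst = np.sum((y - y_bar) ** 2)      # squared errors of the naive (mean) model
sse = np.sum((y_hat - y_bar) ** 2)  # improvement of OLS over the naive model
ssr = np.sum(u_hat ** 2)            # what the OLS model still fails to explain

print(f"SST = {sst:.3f}, SSE = {sse:.3f}, SSR = {ssr:.3f}")
print("SST equals SSE + SSR:", np.isclose(sst, sse + ssr))
```

On this toy data the sums work out to SST = 6, SSE = 3.6 and SSR = 2.4, so the total splits exactly into an explained and an unexplained part. That clean split relies on the model being fitted by OLS with an intercept.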
Next, let’s put these sums into fractions to construct $R^2$, which measures the proportion of the total variation in $y$ that is explained by the model, a.k.a. the goodness of fit.
$$ \begin{aligned} R^2 &= \text{Proportion of Variation Explained} = \frac{\text{SSE}}{\text{SST}} \\ R^2 &= 1 - \text{Proportion of Variation Left Unexplained} = 1 - \frac{\text{SSR}}{\text{SST}} \end{aligned} $$
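The two expressions for $R^2$ can be checked numerically. The sketch below is only an illustration: `r_squared` is a hypothetical helper name, the data are made up, and `np.polyfit` is used as a convenient stand-in for the OLS fit from Estimation I. The two forms coincide here because the line is fitted by least squares with an intercept.

```python
import numpy as np

def r_squared(y, y_hat):
    """Return R^2 computed both ways: (SSE / SST, 1 - SSR / SST)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    y_bar = y.mean()
    sst = np.sum((y - y_bar) ** 2)      # total sum of squares
    sse = np.sum((y_hat - y_bar) ** 2)  # explained sum of squares
    ssr = np.sum((y - y_hat) ** 2)      # sum of squared residuals
    return sse / sst, 1.0 - ssr / sst

# Hypothetical toy data, invented for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# With deg=1, np.polyfit returns [slope, intercept] of the least-squares line.
beta_1, beta_0 = np.polyfit(x, y, deg=1)
y_hat = beta_0 + beta_1 * x

r2_explained, r2_unexplained = r_squared(y, y_hat)
print(f"SSE / SST     = {r2_explained:.3f}")
print(f"1 - SSR / SST = {r2_unexplained:.3f}")

# The naive model predicts the mean for every observation,
# so it explains none of the variation and both forms give 0.
r2_naive = r_squared(y, np.full_like(y, y.mean()))
print("Naive model:", r2_naive)
```

Both forms give $R^2 = 0.6$ on this toy data, while the naive mean model scores 0 by construction, which is what makes it a natural benchmark.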
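A high $R^2$ indicates that the model has stronger explanatory power in the sample, i.e. its predictions sit much closer to the data than the naive mean does, and a low $R^2$ indicates the contrary.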