• Formalisation
    • Start with:

      a real-world question, e.g. ‘Do people who have higher education earn higher wages?’

    • Method:

      There are many possible ways to describe the relationship; economists often choose a linear model (basically, $y = a + bx$).

      If we are willing to model the relationship as linear, we can write:

      $$ wage_i = \beta_0 + \beta_1 educ_i +u_i $$

      [Figure: output-2.png]

      Where

      • $\beta_0$ is the intercept
      • $\beta_1$ is the slope, i.e. the strength of the relationship
      • $u_i$ is the error term: everything else that the linear relationship does not explain

      ⭐️ More generally

      $$ y_i = \beta_0 + \beta_1 x_i + u_i $$
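To make the model concrete, here is a minimal sketch that simulates data from $wage_i = \beta_0 + \beta_1 educ_i + u_i$ and recovers the two parameters. It assumes the standard closed-form OLS formulas (slope = sample covariance over sample variance), which this section does not derive. The function name `simple_ols`, the true values 5 and 2, and all the data are made up for illustration.

```python
import numpy as np

def simple_ols(x, y):
    """Return (b0_hat, b1_hat) for y = b0 + b1*x + u using closed-form OLS."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: co-movement of x and y divided by the variation in x
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: forces the fitted line through the point (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Simulated (hypothetical) data with true beta_0 = 5 and true beta_1 = 2
rng = np.random.default_rng(0)
educ = rng.uniform(8, 20, size=1000)   # years of education
u = rng.normal(0, 3, size=1000)        # "everything else"
wage = 5.0 + 2.0 * educ + u

b0_hat, b1_hat = simple_ols(educ, wage)
print(b0_hat, b1_hat)  # estimates should land near 5 and 2, not exactly on them
```

The estimates differ from the true values only because of $u_i$; with noise-free data the formulas return the parameters exactly.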

    • Why linear?

      • Interpretability: Each β tells us the change in y for a one-unit change in x.
      • Approximation: Even if the true relationship isn’t perfectly linear, linear models often approximate it well locally.
      • Practicality: Linear models are mathematically tractable and can be estimated with data.
    • What linear doesn’t mean:

      • It doesn’t mean the world is actually linear.

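      • The variables can enter NON-linearly (logs, squares, etc.); the model only has to be linear in the parameters ($\beta$)

        Example:

          $\log(wage_i) = \beta_0 + \beta_1 educ_i + \beta_2 educ_i^2 + u_i$

        where $\log(wage_i)$ and $educ_i^2$ are clearly non-linear transformations of the variables, but the equation is still linear in $\beta_0, \beta_1, \beta_2$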
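The log-wage example can be sketched in code: transform the variables first, then run an ordinary linear least-squares fit on the transformed columns. This is an illustration only; `fit_log_wage_quadratic` is a hypothetical helper, and the data are noise-free and generated from made-up coefficients (1.0, 0.10, -0.002) so the fit can be checked by eye.

```python
import numpy as np

def fit_log_wage_quadratic(educ, wage):
    """Least-squares fit of log(wage) = b0 + b1*educ + b2*educ^2 + u.

    Returns the array [b0_hat, b1_hat, b2_hat].
    """
    educ = np.asarray(educ, dtype=float)
    # Non-linear transform of the outcome
    y = np.log(np.asarray(wage, dtype=float))
    # Design matrix: constant, educ, and educ^2 (non-linear transform of the regressor)
    X = np.column_stack([np.ones_like(educ), educ, educ ** 2])
    # b0, b1, b2 enter linearly, so plain linear least squares applies
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

# Made-up, noise-free data generated from b0 = 1.0, b1 = 0.10, b2 = -0.002
educ = np.linspace(8, 20, 50)
wage = np.exp(1.0 + 0.10 * educ - 0.002 * educ ** 2)

print(fit_log_wage_quadratic(educ, wage))
```

Nothing about the fitting step changes when the columns are logs or squares; only the meaning of the variables does.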

    • Assumptions (keep in mind for now)

      • Observations are independent and identically distributed (i.i.d.)
      • $u_i \sim N(0, \sigma^2 )$
        • For now we assume normal errors, but we can relax this later and only assume $E[u_i] = 0$
      • $E[u_i \mid x_i] = 0$ (zero conditional mean)
      • No Perfect Collinearity
      • Homoskedasticity
      • Linearity in Parameters
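Of the assumptions listed, "no perfect collinearity" is the easiest to check mechanically. A minimal sketch, assuming the regressors (including the constant) are stacked as columns of a matrix: if the matrix rank is smaller than the number of columns, one regressor is an exact linear combination of the others. The helper `has_perfect_collinearity` and the simulated variables are hypothetical.

```python
import numpy as np

def has_perfect_collinearity(X):
    """True if some column of X is an exact linear combination of the others."""
    X = np.asarray(X, dtype=float)
    return bool(np.linalg.matrix_rank(X) < X.shape[1])

# Simulated (made-up) regressors
rng = np.random.default_rng(1)
n = 200
const = np.ones(n)
educ_years = rng.uniform(8, 20, size=n)
exper = rng.uniform(0, 30, size=n)

# Fine: education and experience carry different information
X_ok = np.column_stack([const, educ_years, exper])
# Problem: education in months is exactly 12 * education in years
X_bad = np.column_stack([const, educ_years, 12 * educ_years])

print(has_perfect_collinearity(X_ok))   # False
print(has_perfect_collinearity(X_bad))  # True
```

In the second case the data cannot separate the effect of years from the effect of months, so the individual coefficients are not pinned down.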
    • Generalisation:

      • ‼️ OLS is ONE framework, not “simple” vs. “multiple.”

        With more variables, just add more $x$ terms:

        • e.g. $wage_i = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + u_i$

        👉 Learn the simple regression logic first, then apply it to every OLS model

        In more advanced econometrics, everything is written in linear algebra and the model becomes

        $$ y = \mathbf{X}\beta + \varepsilon $$

        → note that it looks almost identical to the simple regression equation, with $\varepsilon$ playing the role of $u_i$
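A short sketch of the "one framework" point: the same function estimates a model with one regressor or with two, and only the columns of $\mathbf{X}$ change. It assumes the standard normal-equations solution $\hat\beta = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'y$, which is not derived in this section. The name `ols_matrix` and the noise-free data are made up for illustration.

```python
import numpy as np

def ols_matrix(X, y):
    """Solve the normal equations (X'X) b = X'y for the coefficient vector b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # solve() avoids forming the explicit inverse of X'X
    return np.linalg.solve(X.T @ X, X.T @ y)

# Made-up, noise-free data generated from wage = 2 + 1.5*educ + 0.5*exper
educ = np.array([10.0, 12.0, 12.0, 16.0, 18.0])
exper = np.array([5.0, 2.0, 10.0, 4.0, 1.0])
wage = 2.0 + 1.5 * educ + 0.5 * exper

# "Simple" regression: constant + one regressor
X_simple = np.column_stack([np.ones_like(educ), educ])
# "Multiple" regression: same code, one more column
X_multi = np.column_stack([np.ones_like(educ), educ, exper])

print(ols_matrix(X_simple, wage))
print(ols_matrix(X_multi, wage))
```

The first column of ones is what produces the intercept $\beta_0$; every additional regressor is just one more column.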