Model Selection

Common procedures employed to achieve this (each is sketched in code after the list):

  1. Shrinkage: shrinking the coefficient estimates for all p predictors towards zero; this has been shown to reduce the variance of the fit
  2. Subset selection: selecting the subset of the p predictors believed to be most strongly related to the response
  3. Dimensionality reduction: projecting the p-dimensional data onto M < p dimensions using linear combinations of the predictors; this reduces complexity while capturing the directions of greatest variance

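A minimal sketch of the three procedures, assuming scikit-learn and numpy are available (neither is mentioned in the notes); the dataset, the choice of `alpha`, and the number of selected features/components are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n, p = 100, 10                       # n observations, p predictors (illustrative)
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)

# 1. Shrinkage: ridge regression shrinks all p coefficients towards zero.
shrinkage = Ridge(alpha=1.0).fit(X, y)

# 2. Subset selection: greedily pick a subset of predictors
#    (forward stepwise here; best-subset would try all 2^p combinations).
subset = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward"
).fit(X, y)

# 3. Dimensionality reduction: project onto M < p directions (PCA),
#    then fit a linear model on the M components.
dim_reduction = make_pipeline(PCA(n_components=3), LinearRegression()).fit(X, y)
```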
Choosing the Optimal Model

Common metrics used to estimate the test error (a sketch computing them follows the list):

  1. (Training-set) mean squared error: $MSE = \frac{RSS}{n}$, where RSS is the residual sum of squares
  2. $C_p = \frac{1}{n}(RSS+2d\hat\sigma^2)$ : lower is better
  3. $\text{adjusted }R^2 = 1 - \frac{RSS/(n-d-1)}{TSS/(n-1)}$ : higher is better

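A minimal sketch computing the three metrics above, assuming we already have the predictions, the number of predictors d, and an estimate sigma_hat of the error standard deviation (the function name and arguments are illustrative, not from the notes).

```python
import numpy as np

def model_selection_metrics(y, y_hat, d, sigma_hat):
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)            # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)         # total sum of squares
    mse = rss / n                             # training MSE
    c_p = (rss + 2 * d * sigma_hat ** 2) / n  # Mallows' C_p (lower is better)
    adj_r2 = 1 - (rss / (n - d - 1)) / (tss / (n - 1))  # higher is better
    return mse, c_p, adj_r2
```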
These metrics were devised for settings with no held-out test data. In modern workflows, validation-set and cross-validation approaches are preferred because held-out data (or the ability to hold some out) is readily available, so outside of a few niche applications these metrics (other than adjusted $R^2$) are seldom used.
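A minimal cross-validation sketch (scikit-learn assumed, reusing the illustrative X and y from the earlier sketch): the test MSE is estimated directly by 5-fold cross-validation instead of by adjusting the training error.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# scikit-learn reports negated MSE for "neg_mean_squared_error", so flip the sign.
cv_mse = -cross_val_score(
    LinearRegression(), X, y,
    scoring="neg_mean_squared_error", cv=5
).mean()
```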


Ridge and Lasso in LR - Regularisation