What is resampling?

repeatedly drawing samples from the training data and refitting the model on each sample, to obtain additional information about the fitted model
- we may be interested in the variability of a fitted linear regression model, e.g. how much the estimated coefficients change from sample to sample
- a large, separate test set is rarely available, so resampling techniques are used to estimate the test error rate from the training data; the cost is computational, since the same model is fit many times
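A minimal sketch of the idea, assuming synthetic data and bootstrap-style sampling with replacement (one common way to "repeatedly draw samples"; the notes do not fix a scheme). The helper `fit_line` and the data are made up for illustration.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

rng = random.Random(0)
# Synthetic data (assumption for illustration): y = 2 + 3x + noise
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 + 3 * x + rng.gauss(0, 2) for x in xs]

slopes = []
for _ in range(200):
    # Draw a sample of the same size with replacement, then refit the model
    idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
    _, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
    slopes.append(b)

# The spread of the refitted slopes estimates the variability of the fit
slope_sd = statistics.stdev(slopes)
print(f"mean slope {statistics.fmean(slopes):.3f}, spread {slope_sd:.3f}")
```

The standard deviation of `slopes` is the quantity of interest here: it shows how much the fitted coefficient moves when the training sample changes.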

Validation set approach

simple train-test split: randomly divide the available data into two sets, a training set and a validation (hold-out) set
- what about data imbalances?
- is the distribution of classes preserved in both splits?

highly variable test error estimates, depending on which observations land in the training set and which in the validation set
data loss: only a subset of the available data is used for training, even though statistical methods tend to perform better with more data, so the validation error tends to overestimate the test error
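The variability can be seen by repeating the split with different random seeds. The sketch below assumes a 50/50 split, synthetic data, and a simple least-squares line; `fit_line` and `validation_mse` are illustrative helpers, not part of any library.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def validation_mse(xs, ys, seed):
    """Split 50/50 at random, fit on one half, return MSE on the other half."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    train, val = idx[:half], idx[half:]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return statistics.fmean([(ys[i] - (a + b * xs[i])) ** 2 for i in val])

rng = random.Random(0)
# Synthetic data (assumption for illustration): y = 2 + 3x + noise
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 + 3 * x + rng.gauss(0, 2) for x in xs]

# Same data, ten different random splits: the estimate changes each time
mses = [validation_mse(xs, ys, seed) for seed in range(10)]
print(f"validation MSE ranges from {min(mses):.2f} to {max(mses):.2f}")
```

Each call trains on only half of the observations, which also illustrates the data-loss drawback.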

Leave-one-out cross-validation (LOOCV)

similar to the validation set approach, but a single observation is held out for validation and the model is trained on the remaining n-1; this is repeated n times, once for each observation
test error is estimated as the average of the n validation errors
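A minimal sketch of the leave-one-out procedure, assuming a simple least-squares line as the model and squared error as the loss; `fit_line` and `loocv_mse` are illustrative names.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loocv_mse(xs, ys):
    """Hold out each point once, fit on the other n-1, average the errors."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return statistics.fmean(errors)

rng = random.Random(0)
# Synthetic data (assumption for illustration): y = 2 + 3x + noise
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 + 3 * x + rng.gauss(0, 2) for x in xs]

print(f"LOOCV estimate of test MSE: {loocv_mse(xs, ys):.2f}")
```

Unlike a single random split, this has no randomness in how the data is divided, so repeated runs on the same data give the same estimate. The price is n model fits.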

k-fold cross-validation

instead of n splits for a dataset of n points, randomly divide the data into k (< n) folds of roughly equal size and iterate k times, each time using one fold (about n/k points) for validation and the remaining k-1 folds (about n - n/k points) for training
- computationally cheaper than LOOCV: k model fits instead of n
- often gives more accurate test error estimates than LOOCV, because of the bias-variance trade-off
  - LOOCV has lower bias but higher variance than k-fold cross-validation: its n models are trained on almost identical data, so their errors are highly correlated, and the average of highly correlated quantities has higher variance
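A minimal sketch of k-fold cross-validation under the same assumptions as before (synthetic data, least-squares line, squared error). `fit_line` and `kfold_mse` are illustrative helpers; k = 5 is just a commonly used choice.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def kfold_mse(xs, ys, k, seed=0):
    """Average validation MSE over k folds of roughly equal size."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all points
    fold_mses = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_mses.append(
            statistics.fmean([(ys[i] - (a + b * xs[i])) ** 2 for i in fold])
        )
    return statistics.fmean(fold_mses)

rng = random.Random(0)
# Synthetic data (assumption for illustration): y = 2 + 3x + noise
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 + 3 * x + rng.gauss(0, 2) for x in xs]

print(f"5-fold CV estimate of test MSE: {kfold_mse(xs, ys, k=5):.2f}")
```

With k = 5 the model is fit 5 times instead of 100, and each fit still uses 80% of the data. Setting k equal to n gives one point per fold, which is LOOCV.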