Transformations to the Dataset:
Use test set for validation: Instead of setting aside a fraction of the training data, we use the test set as the validation set.
Channel-wise data normalization:
We will normalize the image tensors by subtracting the mean and dividing by the standard deviation across each channel.
As a result, the mean of the data across each channel is 0 and the standard deviation is 1.
Normalizing the data prevents the values from any one channel from disproportionately affecting the losses and gradients during training simply because that channel has a higher or wider range of values than the others.
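To make the arithmetic concrete, here is a minimal sketch (using a hypothetical random batch in place of real images) of computing per-channel statistics and normalizing:

import torch

images = torch.rand(8, 3, 32, 32)   # hypothetical batch of 8 RGB images, 32x32 each

# Per-channel mean and standard deviation, computed over the batch, height and width dimensions.
mean = images.mean(dim=(0, 2, 3))   # shape (3,)
std = images.std(dim=(0, 2, 3))     # shape (3,)

# Subtract the channel mean and divide by the channel standard deviation.
normalized = (images - mean.view(1, 3, 1, 1)) / std.view(1, 3, 1, 1)

print(normalized.mean(dim=(0, 2, 3)))  # approximately 0 for each channel
print(normalized.std(dim=(0, 2, 3)))   # approximately 1 for each channel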

Randomized data augmentations:
These randomized augmentations are applied to the training dataset only, not to the validation dataset. The validation dataset gets only channel-wise normalization, since the model expects normalized inputs.
tt.Compose([transformation 1, ..., transformation n]) - Stacks transformations together and applies them in sequence.
import torchvision.transforms as tt

stats = ((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_tfms = tt.Compose([tt.RandomCrop(32, padding=4, padding_mode='reflect'),
                         tt.RandomHorizontalFlip(),
                         # tt.RandomRotation(10),
                         # tt.RandomResizedCrop(256, scale=(0.5, 0.9), ratio=(1, 1)),
                         # tt.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
                         tt.ToTensor(),
                         tt.Normalize(*stats, inplace=True)])
valid_tfms = tt.Compose([tt.ToTensor(), tt.Normalize(*stats)])
tt.RandomCrop(size, padding, padding_mode) - Adds padding around the image (reflecting it at the borders when padding_mode='reflect') and takes a random crop of the given size.
tt.RandomHorizontalFlip() - Randomly flips the image horizontally (with probability 0.5 by default).
tt.Normalize(*stats) - Subtracts the channel mean from each channel's values and divides by the channel standard deviation.
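As a quick usage sketch, the composed transforms are passed to the dataset when it is created; CIFAR-10 is assumed here, since the stats above are the commonly quoted CIFAR-10 channel means and standard deviations:

from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10

# Uses train_tfms and valid_tfms from the snippet above; CIFAR-10 and the batch sizes are assumptions.
train_ds = CIFAR10(root='./data', train=True, download=True, transform=train_tfms)
valid_ds = CIFAR10(root='./data', train=False, download=True, transform=valid_tfms)

train_dl = DataLoader(train_ds, batch_size=128, shuffle=True, num_workers=2, pin_memory=True)
valid_dl = DataLoader(valid_ds, batch_size=256, num_workers=2, pin_memory=True)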
Residual Block:
Adds the original input back to the output feature map obtained by passing the input through one or more convolutional layers.

Without a residual block, the conv layers are responsible for transforming the inputs into the outputs, e.g. images into class probabilities.
With a residual block, the conv layers are no longer responsible for transforming the inputs into the outputs; instead, they only have to learn the difference (the residual) between the inputs and the outputs.
We cannot change the number of output channels inside a residual block, because the input and output shapes would then differ, making it impossible to add the two together.
After each convolutional layer, we'll add a batch normalization layer, which normalizes the outputs of the previous layer.
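As a minimal sketch (an illustrative example, not necessarily the exact block used in these notes), a residual block with two conv + batch norm layers and a skip connection could look like this, assuming the number of channels and the spatial size stay the same so the addition is valid:

import torch
import torch.nn as nn

class SimpleResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Convolutions keep the channel count and spatial size unchanged (kernel 3, padding 1, stride 1).
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)   # normalizes the outputs of conv1
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)   # normalizes the outputs of conv2
        self.relu2 = nn.ReLU()

    def forward(self, x):
        out = self.relu1(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu2(out + x)  # add the original input back (skip connection)

block = SimpleResidualBlock(64)
x = torch.rand(4, 64, 32, 32)       # dummy batch: 4 images, 64 channels, 32x32
print(block(x).shape)               # torch.Size([4, 64, 32, 32]) - shape is unchanged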