Transformations to the Dataset:
Use test set for validation: Instead of setting aside a fraction of the training data, we use the test set as the validation set.
Channel-wise data normalization:
We will normalize the image tensors by subtracting the mean and dividing by the standard deviation across each channel.
As a result, the mean of the data across each channel is 0 and the standard deviation is 1.
Normalizing the data prevents the values from any one channel from disproportionately affecting the losses and gradients during training simply because that channel has a higher or wider range of values than the others.
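To make the arithmetic concrete, here is a minimal sketch (using a hypothetical random batch in place of real images) of computing per-channel statistics and normalizing:

import torch

images = torch.rand(8, 3, 32, 32)   # hypothetical batch of 8 RGB images, 32x32 each

# Per-channel mean and standard deviation, computed over the batch, height and width dimensions.
mean = images.mean(dim=(0, 2, 3))   # shape (3,)
std = images.std(dim=(0, 2, 3))     # shape (3,)

# Subtract the channel mean and divide by the channel standard deviation.
normalized = (images - mean.view(1, 3, 1, 1)) / std.view(1, 3, 1, 1)

print(normalized.mean(dim=(0, 2, 3)))  # approximately 0 for each channel
print(normalized.std(dim=(0, 2, 3)))   # approximately 1 for each channel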

Randomized data augmentations:
These randomized augmentations are applied to the training dataset only, not to the validation dataset. The validation dataset gets only channel-wise normalization, since the model expects normalized inputs.
tt.Compose([transformation 1, ..., transformation n]) - Stacks transformations together and applies them in sequence.
import torchvision.transforms as tt

stats = ((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_tfms = tt.Compose([tt.RandomCrop(32, padding=4, padding_mode='reflect'),
                         tt.RandomHorizontalFlip(),
                         # tt.RandomRotation(10),
                         # tt.RandomResizedCrop(256, scale=(0.5, 0.9), ratio=(1, 1)),
                         # tt.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
                         tt.ToTensor(),
                         tt.Normalize(*stats, inplace=True)])
valid_tfms = tt.Compose([tt.ToTensor(), tt.Normalize(*stats)])
tt.RandomCrop(size, padding, padding_mode) - Adds padding around the image (reflecting it at the borders when padding_mode='reflect') and takes a random crop of the given size.
tt.RandomHorizontalFlip() - Randomly flips the image horizontally (with probability 0.5 by default).
tt.Normalize(*stats) - Subtracts the channel mean from each channel's values and divides by the channel standard deviation.
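As a quick usage sketch, the composed transforms are passed to the dataset when it is created; CIFAR-10 is assumed here, since the stats above are the commonly quoted CIFAR-10 channel means and standard deviations:

from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10

# Uses train_tfms and valid_tfms from the snippet above; CIFAR-10 and the batch sizes are assumptions.
train_ds = CIFAR10(root='./data', train=True, download=True, transform=train_tfms)
valid_ds = CIFAR10(root='./data', train=False, download=True, transform=valid_tfms)

train_dl = DataLoader(train_ds, batch_size=128, shuffle=True, num_workers=2, pin_memory=True)
valid_dl = DataLoader(valid_ds, batch_size=256, num_workers=2, pin_memory=True)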
Residual Block:
Adds the original input back to the output feature map obtained by passing the input through one or more convolutional layers.

Without a residual block, the conv layers are responsible for transforming the inputs into the outputs, e.g. images into class probabilities.
With a residual block, the conv layers are no longer responsible for transforming the inputs into the outputs; instead, they only have to learn the difference (the residual) between the inputs and the outputs.
We cannot change the number of output channels inside a residual block, because the input and output shapes would then differ, making it impossible to add the two together.
After each convolutional layer, we'll add a batch normalization layer, which normalizes the outputs of the previous layer.
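As a minimal sketch (an illustrative example, not necessarily the exact block used in these notes), a residual block with two conv + batch norm layers and a skip connection could look like this, assuming the number of channels and the spatial size stay the same so the addition is valid:

import torch
import torch.nn as nn

class SimpleResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Convolutions keep the channel count and spatial size unchanged (kernel 3, padding 1, stride 1).
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)   # normalizes the outputs of conv1
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)   # normalizes the outputs of conv2
        self.relu2 = nn.ReLU()

    def forward(self, x):
        out = self.relu1(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu2(out + x)  # add the original input back (skip connection)

block = SimpleResidualBlock(64)
x = torch.rand(4, 64, 32, 32)       # dummy batch: 4 images, 64 channels, 32x32
print(block(x).shape)               # torch.Size([4, 64, 32, 32]) - shape is unchanged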