Data augmentation is now a critical component of the machine learning pipeline. It helps models achieve better performance by expanding the original data samples with a set of pre-defined transformation functions. The intuition is that by randomly applying augmentations to the original inputs, the model sees a more diverse set of samples during training, and therefore generalizes better (e.g., achieves a higher test accuracy) when deployed.
PyTorch has a nice summary of possible augmentations:
Illustration of transforms - Torchvision main documentation
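The "randomly applying augmentations" idea above can be sketched in a few lines. This is a minimal illustration in NumPy, not the torchvision implementation: each augmentation is assumed to be a plain function from an image array to an image array, and each one is applied independently with probability `p`.

```python
import numpy as np

def random_apply(image, augmentations, p=0.5, rng=None):
    """Apply each augmentation independently with probability p.

    A minimal sketch of random augmentation at training time;
    `augmentations` is assumed to be a list of functions mapping
    an (H, W, C) array to an array of the same shape.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image
    for aug in augmentations:
        if rng.random() < p:
            out = aug(out)
    return out

# Example: simple geometric flips as the augmentation pool.
flips = [np.fliplr, np.flipud]
```

In a real pipeline this wrapper would sit inside the dataset's `__getitem__`, so every epoch re-samples which augmentations each image receives.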
One particularly interesting augmentation is the later-proposed Cutout:
Improved Regularization of Convolutional Neural Networks with Cutout
This method randomly masks out square regions of the inputs (and, optionally, intermediate activations) and serves nicely as a regularization/augmentation method; it is in fact very similar to DropBlock:
DropBlock: A regularization method for convolutional networks
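The masking step is simple to sketch. Below is a minimal NumPy version of the Cutout idea, assuming (H, W, C) image arrays: a fixed-size square mask is centred at a random location and zeroed out, with the mask allowed to clip at the image border.

```python
import numpy as np

def cutout(image, mask_size, rng=None):
    """Zero out a random square region of an (H, W, C) image.

    A minimal sketch of Cutout: the mask centre is sampled uniformly
    over the image, and the square is clipped at the borders.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    cy = int(rng.integers(0, h))
    cx = int(rng.integers(0, w))
    half = mask_size // 2
    y1, y2 = max(0, cy - half), min(h, cy + half)
    x1, x2 = max(0, cx - half), min(w, cx + half)
    out = image.copy()
    out[y1:y2, x1:x2] = 0  # masked pixels are set to zero
    return out
```

Because the centre can fall near the edge, the effective masked area varies from sample to sample, which the original paper reports as helpful rather than harmful.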
CutMix later attracted a lot of attention, since it proposed to mix pairs of inputs and their labels:
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
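The mixing can be sketched for a single pair of images. This NumPy version follows the paper's recipe, with lambda drawn from Beta(alpha, alpha) and one-hot labels mixed in proportion to the pasted area; the function signature is an illustrative assumption, not the paper's reference code.

```python
import numpy as np

def cutmix(img_a, label_a, img_b, label_b, alpha=1.0, rng=None):
    """Minimal sketch of CutMix for one pair of (H, W, C) images.

    A rectangle covering roughly (1 - lambda) of the area is cut from
    img_b and pasted into img_a; the one-hot labels are mixed by the
    actual pasted area after border clipping.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)
    # Side lengths chosen so the rectangle area is ~(1 - lam) of the image.
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy = int(rng.integers(0, h))
    cx = int(rng.integers(0, w))
    y1, y2 = max(0, cy - cut_h // 2), min(h, cy + cut_h // 2)
    x1, x2 = max(0, cx - cut_w // 2), min(w, cx + cut_w // 2)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    # Re-derive lambda from the pasted pixels actually kept.
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    mixed_label = lam_adj * label_a + (1.0 - lam_adj) * label_b
    return mixed, mixed_label
```

Unlike Cutout, the masked region carries signal from another image, so no pixels are wasted and the label target reflects both classes.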
People have since looked at how to combine a series of augmentations into an augmentation pipeline. Both AutoAugment and TrivialAugment belong to this category:
TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation
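TrivialAugment's sampling rule is strikingly simple and can be sketched directly: pick one augmentation uniformly at random, pick a strength uniformly from a fixed magnitude range, and apply it. The op signature `(image, magnitude)` and the two example ops below are assumptions for illustration, not the paper's op set.

```python
import numpy as np

def trivial_augment(image, ops, num_magnitudes=30, rng=None):
    """Minimal sketch of the TrivialAugment sampling rule.

    One op is chosen uniformly at random and applied with a magnitude
    drawn uniformly from {0, ..., num_magnitudes}; no search or tuning
    is involved, which is the point of the method.
    """
    rng = np.random.default_rng() if rng is None else rng
    op = ops[int(rng.integers(0, len(ops)))]
    magnitude = int(rng.integers(0, num_magnitudes + 1))
    return op(image, magnitude)

# Two toy ops on float images in [0, 1]; magnitude scales their strength.
example_ops = [
    lambda img, m: np.clip(img * (1.0 + m / 30.0), 0.0, 1.0),  # contrast-like
    lambda img, m: np.clip(img + m / 60.0, 0.0, 1.0),          # brightness-like
]
```

torchvision also ships a ready-made implementation (`torchvision.transforms.TrivialAugmentWide`) that can be dropped into a `Compose` pipeline.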
The candidate should be experienced in Object-Oriented Programming in Python. Ideally, the candidate should also have experience with, or at least be willing to learn, various Python machine learning frameworks (such as PyTorch and PyTorch Lightning).