With the current state of deep learning, the name of the game is amassing large quantities of data to train a high-capacity model like BERT or ResNet. As machine learning practitioners, however, we do not always have a huge dataset to leverage; sometimes we are stuck with a dataset on the order of only a few hundred or a few thousand data points. Without a large amount of data, these high-capacity state-of-the-art models are prone to overfitting and may fail to learn the primitives of a particular domain, for example detecting edges and contours in…