The success of data science lies in the fact that we use huge amounts of data. So what happens when we have less data?
The model is not well trained, and the whole point of the exercise, training a model to predict, might not be achieved.
What do we do? — Get more data!!!
But how? Create more data!!
Ok !! Point taken.
Now I have managed to get more and even more data. But the model still does not seem to work well once put into the real world, even though it performed very well when I tested it.
Welcome to the phenomenon of overfitting.
When trained on the same set of data over and over again, the model starts getting biased towards the features present in the training and validation sets. Oops!!!
So now? Let your model see/taste newer data.
What do we do? — Get more data!!!
But how? Create more data!!
That’s easier said than done. Imagine we are working on brain MRI data. Would we be able to create our own data? Dare not!!
However, there are proven and recommended methods by which we can create our own data. This is called Data Augmentation.
Some basic Guidelines for Selecting Augmentation Technique
While we look into how the methods work, some points to remember:
- Make sure all the images are the same size, and that the size matches what the model requires. Resize if necessary.
- Be mindful of whether location within the image matters. If location is of relevance, some of these techniques would not be the best approach.
- Normalizing image pixel values helps models converge better and more quickly. Pre-process the images if need be.
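The resize-and-normalize step above can be sketched with plain NumPy. This is a minimal illustration, not a library routine: `preprocess` is a hypothetical helper, and the nearest-neighbour resize is the simplest possible choice (real pipelines would typically use Pillow or OpenCV interpolation).

```python
import numpy as np

def preprocess(img, size):
    """Nearest-neighbour resize to `size` (H, W), then scale pixels to [0, 1]."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]   # which source row each output row samples
    cols = np.arange(size[1]) * w // size[1]   # which source column each output column samples
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0  # normalise uint8 range to [0, 1]

# Images of different shapes all come out with the same shape and value range.
batch = [preprocess(im, (224, 224)) for im in
         [np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8),
          np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)]]
```

After this step every image in `batch` has shape (224, 224, 3) with values in [0, 1], which is what most CNN input layers expect.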
Image Data Augmentation techniques
Below are some of the popular transformations we can use.
- Geometric transformations
- Color space transformations
- Noise injection
- Kernel filters
- Mixing images
- Random erasing
Kernel filtering is a very popular technique used to sharpen or blur images. It works by sliding an n × n matrix across an image: a Gaussian blur filter produces a blurrier image, while a high-contrast vertical or horizontal edge filter gives a sharper image along edges. Sharpening brings out more detail about objects of interest, while blurred copies make good test data for motion-blurred images. Sharpening and blurring are the classical ways of applying kernel filters to images (ref: PatchShuffle Regularization). Mechanically, this is quite similar to a CNN convolution with padding.
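The sliding-kernel idea can be demonstrated in a few lines of NumPy. This is a naive sketch (a double loop, zero padding, stride 1) rather than an optimized implementation; the `conv2d` name and the specific 3 × 3 kernels are illustrative.

```python
import numpy as np

def conv2d(img, kernel):
    """Slide a k x k kernel over a greyscale image (zero padding, stride 1)."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(img.astype(np.float32), pad)
    out = np.zeros_like(img, dtype=np.float32)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return np.clip(out, 0, 255)

# A 3x3 Gaussian blur kernel averages each pixel with its neighbours ...
blur = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=np.float32) / 16
# ... while a sharpening kernel boosts the centre pixel against its neighbours.
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (32, 32)).astype(np.float32)
blurred, sharpened = conv2d(img, blur), conv2d(img, sharpen)
```

Note how close this is to a convolutional layer: the only difference is that here the kernel weights are fixed by hand, whereas a CNN learns them.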
Mixing images together by averaging their pixel values is a very counterintuitive approach to Data Augmentation: the images it produces do not look like a useful transformation to a human observer. However, numerous experiments have been done to prove the usefulness of such an approach, and they show reduced error rates.
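Pixel-averaging itself is a one-liner; a minimal sketch, assuming two same-sized images and a blend weight `alpha` (the function name is illustrative):

```python
import numpy as np

def mix_images(img_a, img_b, alpha=0.5):
    """Blend two same-sized images by weighted pixel averaging."""
    return (alpha * img_a + (1 - alpha) * img_b).astype(img_a.dtype)

# Averaging an all-black image with an all-grey one gives a mid-grey blend.
a = np.zeros((4, 4), dtype=np.float32)
b = np.full((4, 4), 254.0, dtype=np.float32)
mixed = mix_images(a, b)  # every pixel is 0.5 * 0 + 0.5 * 254 = 127.0
```

Schemes like mixup extend this by also blending the labels with the same weight, but the image-side operation is exactly this average.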
Random erasing is another interesting technique.
It forces the model to learn more descriptive features of an image, thereby avoiding overfitting to one particular visual feature, and it makes sure the model learns from the entire image.
Random erasing works by randomly selecting an n × m patch of an image and masking it with 0s, 255s, mean pixel values, or random values.
It directly prevents overfitting by altering the input space: removing certain patches forces the model to look for other features. Interestingly, random erasing can be combined with other data augmentation techniques to create an even richer data space.
On the flip side, however, random erasing can sometimes make things harder. Imagine randomly erasing the top portion of a handwritten 8: it becomes difficult to tell a 6 from an 8. Such cases might need some manual intervention.
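The patch-masking described above can be sketched as follows. This is an illustrative NumPy version, not the reference implementation from the Random Erasing paper: the fixed patch size, the `fill` modes, and the function name are all assumptions for the sketch.

```python
import numpy as np

def random_erase(img, patch=(8, 8), fill="zero", rng=None):
    """Mask a randomly placed patch with 0s, 255s, the mean, or random values."""
    rng = rng if rng is not None else np.random.default_rng()
    out = img.copy()
    h, w = img.shape[:2]
    ph, pw = patch
    y = rng.integers(0, h - ph + 1)  # random top-left corner of the patch
    x = rng.integers(0, w - pw + 1)
    if fill == "zero":
        out[y:y + ph, x:x + pw] = 0
    elif fill == "max":
        out[y:y + ph, x:x + pw] = 255
    elif fill == "mean":
        out[y:y + ph, x:x + pw] = img.mean()
    else:  # random pixel values
        out[y:y + ph, x:x + pw] = rng.integers(0, 256, (ph, pw) + img.shape[2:])
    return out

rng = np.random.default_rng(1)
img = rng.integers(0, 256, (32, 32, 3)).astype(np.uint8)
before = img.copy()
erased = random_erase(img, fill="zero", rng=rng)
```

Because only an 8 × 8 patch is touched, the rest of the image, and all its features, survive unchanged, which is what forces the model to spread its attention.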
Is it all really this rosy?!
Nah! It cannot be, life is never that simple. Applying all these geometric transformations, color space transformations, kernel filters, mixing images, and random erasing, along with combinations of them, can result in an enormously inflated dataset, and that may again lead to overfitting. The word of advice: think properly before implementation.
Having said all that, consider this …
We have now augmented the dataset for understandable reasons, but what about the computational cost and its impact on the speed of the model? Eventually, all the hard work you did could turn out to be a bottleneck for real-time prediction. That said, there is plenty of research suggesting that test-time augmentation has in fact improved prediction/classification.
Also be mindful of how to use the newly augmented dataset. There is no prescribed strategy, but experimentally it has been seen that it is best to initially train your model on the original data, and only then run further training on the original plus augmented data. A recommended approach is to plot your training accuracy over time across different initial training subsets; this can help reveal patterns in the data. Data Augmentation creates a massive dataset, and training on such a set has been seen to make models more accurate and robust.
Now consider this, and I am sure it is going to be interesting. Have you ever considered the impact all these augmentations have on the resolution of your images? An HD (1920 × 1080 × 3) or 4K (3840 × 2160 × 3) image requires much more processing and memory to train deep CNNs. The models we normally deal with down-sample images from their original resolution. Down-sampling, however, causes loss of information within the image, making image recognition more difficult (well, we can still make out Roger Federer).
Resolution is also a very important topic with GANs; producing high-resolution outputs from GANs is not an easy task due to training instability and mode collapse.
Final dataset size
An important aspect to consider in the process of Data Augmentation is the final dataset size. When all images are horizontally flipped and added to the dataset, the dataset grows from N to 2N, with the additional memory and compute costs that come with it. So the question becomes: do we transform the data on the fly during training, or transform it beforehand and store it? While the first approach saves memory, it can slow down training considerably; storing the augmented dataset is also not easy and depends on how big it has inflated to. These two options are classified as online and offline data augmentation. Remember curriculum learning? We can apply the concept here.
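The online/offline trade-off can be made concrete with a small sketch, assuming NumPy arrays of images and labels (both function names are illustrative): the online generator augments each batch as it is requested and stores nothing extra, while the offline version materializes the full 2N flipped dataset up front.

```python
import numpy as np

def online_batches(images, labels, batch_size, augment, rng=None):
    """Online: yield augmented batches on the fly -- no extra storage."""
    rng = rng if rng is not None else np.random.default_rng()
    idx = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        take = idx[start:start + batch_size]
        yield np.stack([augment(images[i]) for i in take]), labels[take]

def offline_flip(images, labels):
    """Offline: store the whole augmented set (2N memory for one flip)."""
    flipped = images[:, :, ::-1]  # horizontal flip along the width axis
    return np.concatenate([images, flipped]), np.concatenate([labels, labels])

imgs = np.arange(10 * 4 * 4, dtype=np.float32).reshape(10, 4, 4)
labels = np.arange(10)
big_x, big_y = offline_flip(imgs, labels)           # N -> 2N up front
batches = list(online_batches(imgs, labels, 4,
                              lambda im: im[:, ::-1],
                              np.random.default_rng(0)))
```

With online augmentation the per-batch `augment` call is paid on every epoch (slower training, constant memory); offline pays the memory cost once and keeps epochs fast.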
Class imbalance is a problem when the dataset is dominated by examples from one class. It can manifest in binary classification with a clear majority/minority split, or in multi-class classification with one or more majority classes. Imbalanced datasets bias the model and lead to poor performance metrics.
As a solution we can use simple random oversampling with small geometric transformations. Other image augmentation techniques, such as color augmentations, mixing images, kernel filters, and random erasing, can be used to oversample data in a similar fashion. This is useful for ease of implementation and for quick experimentation with different class ratios. One problem with oversampling via basic image transformations is that it may cause overfitting on the minority class being oversampled: any biases present in the minority class become even more prevalent after oversampling.
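Random oversampling with a small geometric transformation can be sketched like this; assuming NumPy arrays and using a horizontal flip as the transformation (the helper name and the balance-to-parity policy are choices made for the sketch, not a standard recipe):

```python
import numpy as np

def oversample_minority(images, labels, minority, rng=None):
    """Add horizontally flipped copies of random minority-class images
    until the two classes are balanced."""
    rng = rng if rng is not None else np.random.default_rng()
    minority_idx = np.flatnonzero(labels == minority)
    deficit = np.sum(labels != minority) - len(minority_idx)
    picks = rng.choice(minority_idx, size=deficit, replace=True)
    new_imgs = images[picks][:, :, ::-1]  # flipped copies, not raw duplicates
    return (np.concatenate([images, new_imgs]),
            np.concatenate([labels, np.full(deficit, minority)]))

rng = np.random.default_rng(0)
imgs = rng.random((10, 8, 8))
labels = np.array([0] * 7 + [1] * 3)   # 7 majority vs 3 minority examples
bal_x, bal_y = oversample_minority(imgs, labels, minority=1, rng=rng)
```

The flip at least avoids exact pixel-level duplicates, but as noted above, the statistical biases of the minority class are still copied along with the images.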
More sophisticated oversampling methods, e.g. adversarial training, Neural Style Transfer, GANs, and meta-learning schemes, can also be used. Neural Style Transfer is an interesting way to create new images: they can be created either by extrapolating with a foreign style or by interpolating styles amongst instances within the dataset. GANs can also be used to oversample.
Data Augmentation is a very useful technique for constructing robust datasets and avoiding overfitting in Deep Learning models trained on limited data. Many augmentations have been proposed; they can be classified as data warping or oversampling techniques.
Data Augmentation may also help overcome biases in a small dataset, but only up to a point. For example, in a dog breed classification task with only hounds and no instances of retrievers, none of the augmentation methods discussed would be able to create retrievers. Data Augmentation prevents overfitting by modifying limited datasets so that they behave like big datasets. The future of Data Augmentation is bright: algorithms combining data warping and oversampling methods have enormous potential in the days to come.