Data Augmentation on Images

Connor Shorten
Published in TDS Archive
Aug 31, 2018 · 3 min read
Do you think this monkey could do as good a job of recognizing its own image in a head-on portrait frame, such as in a mirror, as from the right-turned angle we are viewing it at here?

One of the best ways to improve the performance of a Deep Learning model is to add more data to the training set. Aside from gathering more instances from the wild that are representative of the classification task, we want to develop a set of methods that enhance the data we already have. There are many ways to augment existing datasets and produce more robust models. In the image domain, augmentation lets us use the full power of the convolutional neural network, which is able to learn translational invariance. The sheer variability of real images is what makes image recognition such a difficult task in the first place: you want the dataset to be representative of the many different positions, angles, lightings, and miscellaneous distortions that are of interest to the vision task.

For example, if every image in your training set is perfectly centered within the frame, a traditional feed-forward model will be confused when the subject is slightly shifted to the right relative to the background. If every picture of a cat is taken in portrait mode, the model will not recognize the cat when its face is turned to the right.

To improve the model’s ability to generalize and correctly label images with some sort of distortion, we apply a variety of transformations to our dataset. This can be done in many ways; in this article, we will focus primarily on horizontal and vertical flips, translations, color distortions, and adding random noise.
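As a concrete starting point, the sketch below shows how these transformations could be applied on the fly with Keras’ ImageDataGenerator. The choice of library and all of the parameter values here are illustrative assumptions on my part, not settings prescribed in this article:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation pipeline covering the transformations discussed below;
# the ranges are illustrative assumptions, not tuned values
datagen = ImageDataGenerator(
    horizontal_flip=True,      # random left/right mirroring
    vertical_flip=True,        # random top/bottom mirroring
    width_shift_range=0.1,     # translate up to 10% of the width
    height_shift_range=0.1,    # translate up to 10% of the height
    channel_shift_range=30.0,  # random color distortion
)

# x_train, y_train: hypothetical training arrays; flow() then yields an
# endless stream of randomly augmented batches during training, e.g.:
# model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=10)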

The relevance of horizontal and vertical invariance is very intuitive.

The mirror images above each carry the same characteristics of a cat that we would want an image classifier to learn. Adding these left/right flipped images will increase the robustness of an image classification model, especially for tasks where the perspective of the cat is unknown (imagine a model placed inside a video surveillance camera to count stray cats in an area).
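A flip is a one-line array operation. Here is a minimal sketch with NumPy, assuming image is an array of shape (height, width, channels):

import numpy as np

# Horizontal (left/right) flip: reverse the column order (axis 1)
flipped_lr = np.fliplr(image)

# Vertical (top/bottom) flip: reverse the row order (axis 0)
flipped_ud = np.flipud(image)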

Additionally, we might want to add translational shifts to images in our dataset.

Centered image vs. Left-Aligned Image

In the example above, we see a perfectly centered image compared to a left-aligned one. Applying these transformations to our images forces the model to focus on the curves and characteristics of a ‘3’, rather than on implicit features tied to its position in the frame.
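One simple way to produce such a shift is an affine warp in OpenCV. A minimal sketch, assuming image is a grayscale NumPy array; the 10-pixel offset is an arbitrary illustration:

import cv2
import numpy as np

rows, cols = image.shape

# Translation matrix: move every pixel 10 pixels to the left;
# vacated pixels on the right edge are filled with zeros (black)
M = np.float32([[1, 0, -10],
                [0, 1, 0]])
shifted = cv2.warpAffine(image, M, (cols, rows))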

In addition to these transformations, which massively change the values in the two-dimensional pixel space, we can add random noise to the images. This can help combat color-space distortions such as lighting changes.

import numpy as np

# Add random noise to the image (grayscale, shape (IMG_SIZE, IMG_SIZE))
noise = np.random.randint(0, 50, (IMG_SIZE, IMG_SIZE), dtype=np.uint8)

# Matrix addition: add the noise, clipping to [0, 255] so uint8 values don't wrap
image = np.clip(image.astype(np.int16) + noise, 0, 255).astype(np.uint8)

These simple image transformations will result in a more robust model that learns better characteristics for distinguishing between images. These techniques are great for improving the generalization of models, as well as for learning from small amounts of data. Overfitting is one of the primary problems with machine learning models in general, and these transformations are great for combating it; however, they cannot help if the underlying dataset is not representative of the sample space to begin with.

I hope these transformations can improve your image recognition models. Thanks for reading!

CShorten

Connor Shorten is a Computer Science student at Florida Atlantic University. Research interests in software economics, deep learning, and software engineering.
