Image Augmentation: a brief recap

Salvi Elisa
5 min read · Nov 15, 2022


Deep learning algorithms perform well when the amount of data is large enough. In practice, however, it is not always possible to gather a large amount of data to feed the training algorithm. The lack of data may lead to overfitting during training, and the manual collection of data is costly and not always feasible. The word “augmentation” literally means “the action or process of making or becoming greater in size or amount”.

Image augmentation is a technique that changes the initial data to create more input data to feed the model. In other words, image augmentation artificially expands the available dataset by generating new images from existing ones. This is obtained by using techniques such as random rotations, shifts, scaling, cropping, flips, and so on. In image classification, augmentation is crucial to build a strong classifier and also to reduce the imbalance of the data available across the various classes. Moreover, image augmentation is usually required to boost the performance of Artificial Neural Networks. Image augmentation can be divided into two main categories: spatial augmentation and pixel augmentation.

Image from website: https://www.quantib.com/blog/image-augmentation-how-to-overcome-small-radiology-datasets

The most used techniques for image spatial augmentation are:

  • Rotation: the most common technique for image augmentation, because the image carries the same information at every angle. The new coordinates of each pixel in the input image can be computed independently. The coordinates of a point (x_1, y_1) rotated by an angle θ around (x_0, y_0) become (x_2, y_2):

x_2 = x_0 + (x_1 − x_0) cos θ − (y_1 − y_0) sin θ
y_2 = y_0 + (x_1 − x_0) sin θ + (y_1 − y_0) cos θ

These equations can be rewritten in a compact matrix form:

(x_2 − x_0, y_2 − y_0)ᵀ = R (x_1 − x_0, y_1 − y_0)ᵀ,  with R = [cos θ  −sin θ; sin θ  cos θ]
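As a minimal sketch, the rotation of a single coordinate can be implemented directly from these equations (pure Python; the function name is my own):

```python
import math

def rotate_point(x1, y1, theta, x0=0.0, y0=0.0):
    """Rotate (x1, y1) by an angle theta (in radians) around the centre (x0, y0)."""
    x2 = x0 + (x1 - x0) * math.cos(theta) - (y1 - y0) * math.sin(theta)
    y2 = y0 + (x1 - x0) * math.sin(theta) + (y1 - y0) * math.cos(theta)
    return x2, y2

# Rotating (1, 0) by 90 degrees around the origin maps it (up to rounding) to (0, 1)
x2, y2 = rotate_point(1.0, 0.0, math.pi / 2)
```

In an image-processing library the same formula is applied to every pixel, typically combined with interpolation, since rotated coordinates rarely land exactly on the pixel grid.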

  • Translation: shifting the image left, right, up, or down is a geometric transformation that relocates every object in the image by remapping its position. Since the positions of the objects change, this technique can result in a more generalized model, and it is a very useful transformation to avoid positional bias in the data. A pixel located at (x_1, y_1) is shifted to a new position (x_2, y_2) by a random translation (β_x, β_y):

x_2 = x_1 + β_x
y_2 = y_1 + β_y
  • Flipping: in both the left-right and up-down directions. This cannot be used for images containing some form of alphanumeric text, because it would create unrealistic cases. The reflection of a pixel in an image is obtained by multiplying a reflection matrix R by the initial coordinates:

(x_2, y_2)ᵀ = R (x_1, y_1)ᵀ

The reflection matrices R across the x-axis, across the y-axis, through the origin, and across the line y = x are, respectively:

R_x = [1  0; 0  −1],  R_y = [−1  0; 0  1],  R_o = [−1  0; 0  −1],  R_{y=x} = [0  1; 1  0]
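Translation and flipping on a whole image can be sketched on a tiny grayscale image represented as a list of rows (pure Python; the helper names and the zero-padding choice are my own):

```python
def translate(img, bx, by, fill=0):
    """Shift a 2-D image (list of rows) right by bx and down by by, padding with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            nx, ny = x + bx, y + by
            if 0 <= nx < w and 0 <= ny < h:
                out[ny][nx] = img[y][x]
    return out

def flip_lr(img):
    """Reflect across the vertical axis (left-right flip)."""
    return [row[::-1] for row in img]

def flip_ud(img):
    """Reflect across the horizontal axis (up-down flip)."""
    return img[::-1]

img = [[1, 2],
       [3, 4]]
shifted = translate(img, bx=1, by=0)   # [[0, 1], [0, 3]]
mirrored = flip_lr(img)                # [[2, 1], [4, 3]]
```

Pixels shifted outside the frame are discarded, and the vacated positions are filled with a constant; real libraries also offer reflective or wrap-around padding.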

  • Cropping: randomly cropping a central patch of each image provides an effect very similar to translations, but cropping reduces the size of the input. Cropping images can be used as a practical processing step for image data with mixed height and width dimensions.
  • Noise injection: the creation of new images by adding noise, usually drawn from a Gaussian distribution, to the initial image in order to build a noise-robust classifier.

These geometric transformations are easy to implement and can solve the problem of positional biases in the training data. Positional biases occur, for example, in a facial recognition dataset where every face is perfectly centered in the frame. The disadvantages of geometric transformations are the additional memory required, the computational cost of the transformations, and the additional training time.

Another technique used for image augmentation is “pixel augmentation”, which involves random changes of brightness, contrast, saturation, and hue in order to avoid lighting biases. An example of pixel transformation is restricting pixel values to a certain minimum or maximum value. The disadvantages of color space transformations are the same as those cited above for geometric transformations. Additionally, color transformations may discard important color information: after a change of pixel values, objects in the image may become impossible to recognize.
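A simple pixel augmentation combines a contrast scale and a brightness shift, restricting the result to the valid pixel range, as described above (a sketch; the parameter names alpha/beta are my own convention):

```python
def adjust_brightness_contrast(img, alpha=1.0, beta=0):
    """Scale contrast by alpha, shift brightness by beta,
    and clip each pixel to the valid range [0, 255]."""
    return [[min(255, max(0, int(alpha * p + beta))) for p in row] for row in img]

img = [[0, 100],
       [200, 250]]
brighter = adjust_brightness_contrast(img, alpha=1.2, beta=30)
# 250 * 1.2 + 30 = 330, which is clipped to 255
```

The clipping step is exactly the “restricting pixel values” transformation: without it, bright regions would overflow the 8-bit range and wrap around or saturate incorrectly.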

To sum up, data augmentation manipulates images and creates different versions of similar content in order to expose the model to a wider array of training examples. Note that image augmentation is used only during training, while other preprocessing techniques, like filtering, are applied to both the training and the testing set. Data augmentation is an important research topic and is still under development. It concerns important tasks such as cancer type classification, which is characterized by a serious lack of data. Jason Wang and Luis Perez, two researchers from Stanford University, have implemented a method called “neural augmentation”, showing how a neural network can learn the augmentation that best improves the classifier by minimizing its loss. “Augmenting data via a neural net is achieved by concatenating two images of the same class to create an input of 6 channels deep (2 if gray scale)”, explain the two researchers in their paper. Thanks to this method, a dataset of size N can create N^2 pairs, which results in an increase of an order of magnitude. Moreover, this type of data augmentation can be combined with other data augmentation techniques.
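The pairing idea can be sketched with tiny channels-first “images” (a sketch only, assuming a channels-first layout; the helper names are my own, not from the paper):

```python
from itertools import product

def make_pairs(images):
    """All ordered pairs of same-class images: a dataset of size N yields N^2 pairs."""
    return list(product(images, repeat=2))

def concat_channels(img_a, img_b):
    """Stack two 3-channel images into a single 6-channel input,
    as in the concatenation described by Perez and Wang."""
    return img_a + img_b  # 3 channel planes + 3 channel planes -> 6 channels

# Two tiny 3-channel "images" (each channel is a 1x1 plane)
a = [[[1]], [[2]], [[3]]]
b = [[[4]], [[5]], [[6]]]
pairs = make_pairs([a, b])        # 2 images -> 4 ordered pairs
stacked = concat_channels(a, b)   # a single 6-channel input
```

The N^2 growth comes directly from taking all ordered pairs within a class, which is why even small datasets benefit noticeably from this scheme.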

Bibliography

  1. Luis Perez and Jason Wang. The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621, 2017.
  2. Connor Shorten and Taghi M Khoshgoftaar. A survey on image data augmentation for deep learning. Journal of big data, 6(1):1–48, 2019.
