Data Augmentation tips

3 min read · Jan 7, 2023

Data augmentation is a technique used to artificially increase the size of a dataset by generating new data samples from the existing ones. This is done by applying various transformations to the data, such as rotation, scaling, cropping, and flipping.

Data augmentation is often used in machine learning to improve the generalization performance of a model. By generating additional training data, a model can be trained to be more robust and better able to handle variations in the data it sees at test time. Data augmentation is particularly useful when the amount of available training data is small, as it can help prevent overfitting.

Data augmentation is especially common in computer vision, where it generates additional training data for image classification, object detection, and other tasks. For example, rotating an image of a dog by a few degrees and adding the result to the training set helps the model learn to recognize the same dog at different orientations.
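To make the idea concrete, here is a minimal sketch of a few of the transformations mentioned above (flipping, rotation, cropping), written in plain Python on images stored as nested lists of pixel values. The function names (`flip_horizontal`, `rotate_90`, `crop`, `random_augment`, `augment_dataset`) are made up for this illustration, and rotation is simplified to quarter turns rather than "a few degrees". In practice you would most likely reach for an image library instead.

```python
import random


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def crop(image, top, left, height, width):
    """Return the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def random_augment(image, rng):
    """Apply a random flip, a random number of quarter turns, and a small crop."""
    out = [list(row) for row in image]  # work on a copy
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        out = rotate_90(out)
    height, width = len(out), len(out[0])
    if height > 2 and width > 2:
        # Random crop that drops one row and one column.
        top, left = rng.randrange(2), rng.randrange(2)
        out = crop(out, top, left, height - 1, width - 1)
    return out


def augment_dataset(images, labels, copies_per_image, seed=0):
    """Return the original samples plus augmented copies that keep their label."""
    rng = random.Random(seed)
    aug_images, aug_labels = list(images), list(labels)
    for image, label in zip(images, labels):
        for _ in range(copies_per_image):
            aug_images.append(random_augment(image, rng))
            aug_labels.append(label)
    return aug_images, aug_labels
```

Each augmented copy keeps the label of the image it came from, which is what lets the model see "the same dog" in several orientations. Note that the cropped copies are smaller than the input, so a real pipeline would usually resize them back to a fixed shape.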

Examples of techniques:

For all examples, we show augmentations of the image below. Note that all images are taken from https://iq.opengenus.org/data-augmentation/

Another idea: Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) can be used for data augmentation by generating synthetic data that is similar to the real data. This synthetic data can be used to augment the training set, allowing a machine learning model to be trained on a larger and more diverse dataset.

To use GANs for data augmentation, a GAN model is trained on the real data. The generator network is then used to generate synthetic data, which is added to the training set along with the real data. The model can then be retrained on the augmented training set.
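The last two steps of that workflow (sample from the generator, then merge with the real data) can be sketched as follows. This assumes the GAN has already been trained elsewhere. `toy_generator` is only a placeholder standing in for a trained generator network, and `sample_latent` and `augment_with_generator` are hypothetical helper names, not part of any library.

```python
import random


def sample_latent(dim, rng):
    """Draw a latent noise vector from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def augment_with_generator(real_samples, generator, num_synthetic, latent_dim, seed=0):
    """Mix real samples with samples produced by a (pre-trained) generator.

    Each sample is tagged with its origin so the real/synthetic ratio
    can be checked or adjusted later.
    """
    rng = random.Random(seed)
    synthetic = [
        generator(sample_latent(latent_dim, rng)) for _ in range(num_synthetic)
    ]
    combined = [(x, "real") for x in real_samples]
    combined += [(x, "synthetic") for x in synthetic]
    rng.shuffle(combined)
    return combined


def toy_generator(z):
    """Placeholder for a trained generator: maps a latent vector to a sample."""
    return [0.5 * value + 1.0 for value in z]
```

This sketch ignores labels. For a labelled task you would need a way to know the class of each synthetic sample, for example one generator per class or a conditional generator.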

One of the main benefits of using GANs for data augmentation is that the generator learns from the real data, so its synthetic samples resemble the real data far more closely than randomly generated samples would. This can help the model learn more effectively and improve its generalization performance.

GANs have been used for data augmentation in a variety of applications, including image classification, object detection, and natural language processing. They can be particularly useful in situations where the amount of real data is limited and there is a need to augment the training set to improve model performance.

Example of GANs in action. Note that the input and output images may look the same to a human, but their pixel values are completely different, so they look different to a computer.

Another idea: Neural style transfer

Neural style transfer is a machine learning technique that uses a convolutional neural network (CNN) to transfer the artistic style of one image (the style image) onto the content of another image (the content image). The result is a new image that keeps the content of the content image and takes on the style of the style image.
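The post does not say how "style" is measured. In the widely used formulation by Gatys et al., style is summarized by the Gram matrix of CNN feature maps, that is, the correlations between channels. The sketch below assumes that formulation and uses plain Python lists in place of real CNN activations; `gram_matrix` and `style_loss` are illustrative names.

```python
def gram_matrix(features):
    """Channel-by-channel correlations of a feature map.

    `features` is a list of C channels, each a flat list of N activations
    (one per spatial position).
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]


def style_loss(generated_features, style_features):
    """Mean squared difference between the two Gram matrices."""
    g = gram_matrix(generated_features)
    s = gram_matrix(style_features)
    c = len(g)
    return sum((g[i][j] - s[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)
```

Because the Gram matrix sums over spatial positions, it does not change when the same positions are shuffled in every channel. That is roughly why it captures texture and color statistics (style) rather than layout (content).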

By generating a large number of new images using this technique, with the content taken from real images and the style from a reference image or set of reference images, it is possible to augment a training dataset for a machine learning model. The synthetic images produced in this way can be added to the training set, allowing the model to be trained on a larger and more diverse dataset.
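As a rough sketch of that augmentation loop, the snippet below pairs every labelled content image with every reference style image and keeps the content image's label. `stylize` stands for whatever style-transfer model you use. The `toy_stylize` function here is only a placeholder that blends pixel values, not real style transfer, and both names are hypothetical.

```python
import itertools


def stylized_augmentations(content_items, style_images, stylize):
    """Yield (stylized_image, label) for every content/style combination.

    `content_items` is a list of (image, label) pairs. The label comes from
    the content image, because stylization changes appearance, not content.
    """
    for (content, label), style in itertools.product(content_items, style_images):
        yield stylize(content, style), label


def toy_stylize(content, style):
    """Placeholder for a style-transfer model: a simple per-pixel blend."""
    return [
        [0.5 * c + 0.5 * s for c, s in zip(content_row, style_row)]
        for content_row, style_row in zip(content, style)
    ]
```

With N content images and M style images this produces N × M new samples, so a handful of reference styles can multiply the size of a small dataset.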

Written by Aaron Brennan