What Is Data Augmentation?

Shaan Ray
Published in
3 min readNov 27, 2021


The quantity and diversity of data are important factors in the effectiveness of most machine learning models. The amount and diversity of data supplied during training heavily influence the prediction accuracy of these models.

Hidden neurons are common in deep learning models that have been trained to perform well on complex tasks. The number of trainable parameters grows in unison with the number of hidden neurons. The amount of data needed is proportional to the number of learnable parameters in the model.

Applying a range of transformations to the available data to synthesize new data is one technique to cope with the challenge of limited data. ‘Data Augmentation’ refers to the process of synthesizing new data from existing data.

Data augmentation can be utilized to address both requirements; the amount of data and the diversity of the training data needed to create an accurate machine learning model.

What it is

Data augmentation is a set of techniques used to increase the amount of data in a machine learning model by adding slightly modified copies of already existing data or newly created synthetic data from existing data. It helps smooth out the machine learning model and reduce the overfitting of data.


Images are modified slightly and then added to the data sets used in machine learning models. Some techniques used to augment images for machine learning algorithm datasets are:

  • Geometric transformations
  • Elastic transformations
  • Flipping
  • Color modification
  • Cropping
  • Rotation
  • Translation (moving the image in the x or y direction)
  • Noise injection
  • Zoom and scaling
  • Random erasing
The original image of a Quoka on the left, with various augmented versions of the image on the right. Source: http://ai.stanford.edu/

Benefits of Data Augmentation

A machine learning model performs better and is more accurate when the dataset is rich and comprehensive. By creating fresh and varied instances to train datasets, data augmentation can help improve the performance and results of machine learning models.

Data collection and labeling can be time-consuming and costly for machine learning models. Companies can lower these operational costs by transforming datasets using data augmentation techniques.

Cleaning data is one of the phases required in creating a data model with a high accuracy level. However, if data cleaning reduces representability, the model will not make accurate predictions for real-world inputs. Machine learning models can be made more robust via data augmentation approaches, which create several variances that the model might encounter in the actual world.

Use Case: Medical Imaging

A major use case for data augmentation at the moment is medical imaging. The datasets for medical images aren’t very big, and because of regulations and privacy issues, sharing data isn’t easy. Furthermore, in the event of rare diseases, the data sets are even more limited. Medical imaging firms are using data augmentation to add diversity to their data sets.


Businesses can use data augmentation to lessen their reliance on training data preparation and develop more accurate machine learning models faster. Data augmentation can also help machine learning models with lots of data already by increasing the amount of relevant data in the dataset.

Shaan Ray

Helping clients identify and invest in Emerging Technologies early on so that they can innovate and grow exponentially. Follow Lansaar Research for the latest in emerging technologies and new business models.