Image Augmentation Techniques

Jyotsana
Apr 25, 2023 · 9 min read


Table of Contents

  1. Data Augmentation
    - Introduction
    - Challenges in CV Problems
  2. Classical techniques for image augmentation
  3. Advanced techniques
    - Cutout
    - Mixup
    - Cutmix
    - Augmix
  4. Summary
  5. References

Data Augmentation

Introduction

Deep learning has been doing some amazing things in the field of computer vision. But there’s a catch: it needs loads of images to work its magic! And, well, collecting a ton of images is not exactly a walk in the park. But don’t worry, there are some cool image augmentation techniques that can help us out. Understanding these techniques is super important if you want to come up with new and creative ways to improve your computer vision tasks. So, let’s dive in!

Different model architectures and the augmentation techniques used with them.

Challenges in Computer Vision

  1. Image variations: Images can vary in many ways, such as lighting, pose, scale and occlusion. These variations can make it difficult for computer vision models to generalise to new data. For example, in fig 1. a model that has only seen images of cats in one pose or lighting condition may struggle to recognise cats in different poses or lighting conditions.
  2. Class imbalance and few images: In many object detection and classification tasks, the number of images in each class is not balanced, and some classes may have only a few examples. This can make it difficult for models to learn to recognise all classes equally well. In medical imaging, abnormal cases often occur with a low probability, which is further exacerbated by privacy restrictions on sharing patient data.
  3. Domain shift: A model trained on one dataset or in one environment may not perform well when applied to new, unseen data or environments. This is because the distribution of the data may be different in the new environment, leading to a domain shift. For instance, in the domain of autonomous driving, where it is easier to capture images during the day, one may want to train their model using a daytime dataset but later evaluate it in nighttime conditions.
  4. Data remembering (Overfitting): A larger set of learnable parameters in a deep learning model demands more data for training; when the data is insufficient, the model may overfit by memorizing specific data points, leading to poor performance on new data. To avoid data remembering, it is important to use techniques such as regularization and data augmentation to expose the model to a wider variety of data during training.
Fig 1. Common variations of images in Computer vision from CS231n

Classical Techniques for Image Augmentation

Classical image augmentation techniques are still relevant and widely used in computer vision because they provide a simple yet effective way to increase the size and diversity of the training dataset. Moreover, these techniques are easy to implement and computationally efficient, making them suitable for large-scale datasets and real-time applications. Let us examine where each technique should be applied.

  1. Flipping and rotating: Makes the model robust to changes in object orientation.
  2. Cropping and resizing: Handles changes in object scale or position within the image.
  3. Color jittering: Handles changes in lighting conditions and color variations in the dataset.
  4. Adding noise: Helps handle variations in image quality and simulates noisy environments.
  5. Image warping: Increases a model’s ability to handle changes in object pose and simulates changes in camera viewpoint.
  6. Random erasing: Makes the model more robust to occlusions or clutter in the image, and simulates missing data.
Use Cases for Image Augmentation Techniques
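As an illustrative sketch (not from the original post), a few of these classical transforms can be composed with plain NumPy; in practice, libraries such as Albumentations or torchvision provide production-ready versions:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(img):
    """Horizontally flip the image with probability 0.5 (orientation changes)."""
    return img[:, ::-1] if rng.random() < 0.5 else img

def random_crop(img, size):
    """Crop a random (size x size) patch, simulating scale/position shifts."""
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    return img[y:y + size, x:x + size]

def add_gaussian_noise(img, std=0.05):
    """Add pixel noise to simulate low-quality captures (img in [0, 1])."""
    return np.clip(img + rng.normal(0.0, std, img.shape), 0.0, 1.0)

img = rng.random((32, 32, 3))  # toy image with values in [0, 1]
aug = add_gaussian_noise(random_crop(random_flip(img), 24))
print(aug.shape)  # (24, 24, 3)
```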

Advanced Techniques

Image Source: AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

Cutout

Cutout is an augmentation technique that randomly covers a region of an input image with a square.

Cutout Illustration

Explanation

Co-adaptation in neural networks refers to a situation where some neurons become highly dependent on others; if the neurons they rely on receive “bad” inputs, the dependent neurons are affected as well, which can significantly change model performance and is a concern for overfitting. Dropout is one technique that addresses this issue.
The authors of the Cutout paper applied a similar method to images for CNNs: units are dropped at the input layer of the CNN rather than in the intermediate feature layers, and contiguous sections of the input (patches of the image) are dropped rather than individual pixels or neurons. This is because nearby pixels in an image contain mostly similar information.
For example, when identifying bird species in an image, the cutout augmentation can push the model to detect smaller features such as beak shape, feather patterns, and eye color, which may be important for accurate classification.

Advantages

  1. Helps in training models to recognize partial or occluded objects.
  2. Allows the model to consider more of the image context, such as minor features, rather than relying heavily on major features before making a decision.

Limitations

  1. It can completely remove important features from an image.
  2. It may not work well for images with complex backgrounds, and it may not be effective for small-sized patches or small datasets.
  3. Removing a square region from an image and filling it with black, grey, or Gaussian-noise pixels significantly reduces the proportion of informative pixels used in training, which can be problematic for CNNs that require large amounts of data to learn effectively.

Hyperparameter

Size and number of patches to be cut out from the image.
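A minimal NumPy sketch of Cutout (not the authors’ code) with exactly these two hyperparameters; following the paper, patch centres may fall near the border, so patches are clipped to the image:

```python
import numpy as np

def cutout(img, patch_size, n_patches=1, rng=None):
    """Zero out n_patches square regions of side patch_size."""
    rng = rng or np.random.default_rng()
    out = img.copy()
    h, w = img.shape[:2]
    for _ in range(n_patches):
        # sample a patch centre anywhere, then clip the patch to the image
        cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))
        y1, y2 = max(0, cy - patch_size // 2), min(h, cy + patch_size // 2)
        x1, x2 = max(0, cx - patch_size // 2), min(w, cx + patch_size // 2)
        out[y1:y2, x1:x2] = 0.0
    return out

img = np.ones((8, 8))
out = cutout(img, patch_size=4, rng=np.random.default_rng(0))
print(out.shape)  # (8, 8)
```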

Mixup Augmentation

Mixup generates a weighted combination of random image pairs from the training data. Given two images and their ground-truth labels, (xi, yi) and (xj, yj), a synthetic training example (x̃, ỹ) is generated as:

x̃ = λ·xi + (1 − λ)·xj
ỹ = λ·yi + (1 − λ)·yj

Note: λ lies in the [0, 1] range and is sampled from a Beta(α, α) distribution.

Mixup Illustration

Explanation

In a nutshell, mixup constructs virtual training examples. It extends the training distribution by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets.
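A short NumPy sketch of this interpolation (an illustrative implementation, not the paper’s code); both the images and their one-hot labels are mixed with the same λ:

```python
import numpy as np

def mixup(img_i, lab_i, img_j, lab_j, alpha=0.2, rng=None):
    """Linearly interpolate an image pair and its one-hot labels
    with lam ~ Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * img_i + (1.0 - lam) * img_j
    y = lam * lab_i + (1.0 - lam) * lab_j
    return x, y

xi, xj = np.zeros((4, 4, 3)), np.ones((4, 4, 3))
yi, yj = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x, y = mixup(xi, yi, xj, yj, rng=np.random.default_rng(0))
print(x.shape, y.sum())  # the mixed label still sums to 1
```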

Advantages

  1. Neural networks are prone to memorizing corrupt labels. Mixup relaxes this by combining different features with one another (and their labels in the same way) so that the network does not become overconfident about the relationship between features and labels.
  2. It makes decision boundaries transit linearly from class to class, providing a smoother estimate of uncertainty.
  3. Robustness to adversarial examples and stabilized GAN (Generative Adversarial Networks) training.
  4. Mixup is a domain-agnostic data augmentation technique; it can be extended to a variety of data modalities such as computer vision, natural language processing, and speech.

Limitations

  1. Only inter-class mixup helps: for intra-class mixup, interpolating only between inputs with the same label did not lead to the performance gains of mixup.
  2. The mixed examples are not realistic representations of any class (unnatural images may confuse the model).
  3. Mixup does not work well when you are using Supervised Contrastive Learning since it expects the true labels during its pre-training phase.
  4. Label smoothing and mixup usually do not work well together because label smoothing already modifies the hard labels by some factor.

Cutmix

With Cutmix, a square region of an input image is replaced with a patch of similar dimensions from another image, which leads to a more natural-looking output.
The ground truth labels of the resulting images are mixed proportionally to the number of pixels from each image, creating a linearly interpolated label that reflects the contribution of both original images.
By avoiding uninformative pixels during training, Cutmix makes the training process more efficient while still taking advantage of regional dropout in the input space.
The virtual examples thus generated are given by:

x̃ = M ⊙ xA + (1 − M) ⊙ xB
ỹ = λ·yA + (1 − λ)·yB

where M is a binary mask (often square) indicating the cutout and fill-in regions from the two randomly drawn images, and ⊙ denotes element-wise multiplication. Just like mixup, λ is drawn from a Beta(α, α) distribution and λ ∈ [0, 1].

After the images are randomly selected, bounding box coordinates B = (rx, ry, rw, rh) are sampled to indicate the cutout and fill-in regions in both images. The bounding box sampling is given by:

rx ~ Unif(0, W),  rw = W·√(1 − λ)
ry ~ Unif(0, H),  rh = H·√(1 − λ)

where rx and ry are randomly drawn from uniform distributions with the upper bounds shown, so that the cropped area ratio rw·rh / (W·H) equals 1 − λ.

Cutmix Illustration
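Putting the mask and label mixing together, a rough NumPy sketch of CutMix (an illustration under the sampling scheme above, not the official implementation) might look like:

```python
import numpy as np

def cutmix(img_a, lab_a, img_b, lab_b, alpha=1.0, rng=None):
    """Paste a random patch of img_b onto img_a; mix labels by pixel ratio."""
    rng = rng or np.random.default_rng()
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)
    # bounding box B = (rx, ry, rw, rh): area ratio targets 1 - lam
    rw, rh = int(w * np.sqrt(1 - lam)), int(h * np.sqrt(1 - lam))
    rx, ry = int(rng.integers(0, w)), int(rng.integers(0, h))
    x1, x2 = max(0, rx - rw // 2), min(w, rx + rw // 2)
    y1, y2 = max(0, ry - rh // 2), min(h, ry + rh // 2)
    out = img_a.copy()
    out[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    # recompute lam from the exact pixel count after clipping at the border
    lam = 1.0 - ((x2 - x1) * (y2 - y1)) / (h * w)
    return out, lam * lab_a + (1.0 - lam) * lab_b

xa, xb = np.zeros((16, 16, 3)), np.ones((16, 16, 3))
ya, yb = np.array([1.0, 0.0]), np.array([0.0, 1.0])
out, y = cutmix(xa, ya, xb, yb, rng=np.random.default_rng(0))
```

Recomputing λ from the clipped box keeps the label weights proportional to the actual number of pixels contributed by each image.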

Augmix

Mixing augmentations allows us to generate diverse transformations, which are important for inducing robustness, as a common failure mode of deep models in the arena of corruption robustness is the memorization of fixed augmentations. Previous methods have attempted to increase diversity by directly composing augmentation primitives in a chain, but this can cause the image to quickly degrade and drift off the data manifold.
Such image degradation can be mitigated and the augmentation diversity can be maintained by mixing together the results of several augmentation chains in convex combinations.

An unrealistic image produced by applying successive compositions

To generate augmented training images, AugMix uses three separate chains of one to three randomly chosen augmentation operations such as translation, shear, or contrast, each with a randomly assigned intensity. These chains are then combined with the original image using different weights, resulting in a single augmented image.

This approach creates multiple sources of randomness, including the selection of operations, their intensity, the length of the chains, and the mixing weights, all of which contribute to a diverse set of augmented images.

Image Source: AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

This augmentation is then combined with a consistency loss to encourage the model to make consistent predictions for all versions of the same image.

Jensen Shannon Consistency Loss

In AugMix, the Jensen-Shannon consistency (JSC) loss is used to encourage the augmented images to be closer to the original image in terms of their probability distribution.
The JSC loss measures the similarity between the probability distributions of the original and augmented images, which is calculated as the average of the Jensen-Shannon divergence (JSD) between the distribution of each augmented image and the original image. The JSD is a symmetric and smoothed version of the Kullback-Leibler (KL) divergence, which is commonly used to measure the difference between two probability distributions.
By minimizing the JSC loss, AugMix encourages the augmented images to have similar probability distributions to the original image, which helps to ensure that the augmented images are semantically meaningful and informative for the model.
Mathematically, the Jensen-Shannon divergence (JS) is added to the original cross-entropy loss L, weighted by the λ hyperparameter:

Loss = L(p_orig, y) + λ · JS(p_orig; p_augmix1; p_augmix2)

The Jensen-Shannon divergence is computed by first obtaining the average prediction probability:

M = (p_orig + p_augmix1 + p_augmix2) / 3

Then, the average KL divergence between each image version and the average prediction probability is computed:

JS(p_orig; p_augmix1; p_augmix2) = (1/3) · (KL[p_orig ‖ M] + KL[p_augmix1 ‖ M] + KL[p_augmix2 ‖ M])

where augmix1 and augmix2 are two augmented versions of the same original image.
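The JS computation itself is a few lines of NumPy; this is an illustrative sketch of the divergence term only (the cross-entropy part and the model producing the three softmax outputs are assumed):

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(p || q) for probability vectors along the last axis."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def js_consistency(p_orig, p_aug1, p_aug2):
    """Jensen-Shannon divergence among the three predicted distributions:
    the average KL of each distribution to their mean M."""
    m = (p_orig + p_aug1 + p_aug2) / 3.0
    return (kl(p_orig, m) + kl(p_aug1, m) + kl(p_aug2, m)) / 3.0

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.7])
print(js_consistency(p, p, p))      # 0.0 for identical predictions
print(js_consistency(p, q, q) > 0)  # positive when predictions disagree
```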

Hyperparameters

Number of augmented versions (augmentation chains) per image
λ: weight of the JS consistency term in the total loss

Summary

In conclusion, data augmentation is a powerful technique that can significantly improve the performance of image classification and object detection models. By increasing the variability of the training data and reducing overfitting, augmentations such as random cropping, flipping, and color jittering can help to make the model more robust to variations in the input images. Additionally, more advanced augmentations such as CutMix, MixUp, and AugMix can generate more diverse and realistic training data, and encourage the model to learn more robust features. By carefully selecting and applying the right augmentations for each architecture and task, it is possible to achieve state-of-the-art performance on a wide range of computer vision tasks.

References

  1. Improved Regularization of Convolutional Neural Networks with Cutout
  2. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
  3. mixup: BEYOND EMPIRICAL RISK MINIMIZATION
  4. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
  5. A Comprehensive Survey of Image Augmentation Techniques for Deep Learning
  6. https://keras.io/examples/vision/cutmix/
  7. https://albumentations.ai

Hope you enjoyed this article. You can connect with me on LinkedIn.


Jyotsana

Senior Data Scientist | Computer Vision, Recommendation System, NLP problems | Ecommerce