Published in Analytics Vidhya

Data Augmentation to solve imbalanced training data for Image Classification

This article walks you through how Data Augmentation can be used to address imbalanced training data in image classification. Imbalanced training data can bias the classifier towards the over-represented classes. In scenarios where it’s not feasible to collect more training data for the under-represented classes, Data Augmentation can be used to increase the size of the training data for those classes.

In this article, I go over a few techniques that can be used to augment training data for imbalanced classes.

First, let’s read the actual image.
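As a minimal sketch of this step, an image can be read into a NumPy array with Pillow. The file name `example.jpg` and both helper names are placeholders of mine, not something from the original walkthrough; if the file is missing, the snippet falls back to a synthetic gradient so that it still runs.

```python
from pathlib import Path

import numpy as np


def load_image(path):
    """Read an image file into an (H, W, 3) uint8 array."""
    # Pillow is imported lazily so the fallback below works without it.
    from PIL import Image

    with Image.open(path) as img:
        return np.asarray(img.convert("RGB"))


def placeholder_image(height=64, width=96):
    """Synthetic RGB gradient, used when no image file is at hand."""
    rows = np.linspace(0, 255, height).astype(np.uint8)[:, None]
    cols = np.linspace(0, 255, width).astype(np.uint8)[None, :]
    red = np.broadcast_to(rows, (height, width))
    green = np.broadcast_to(cols, (height, width))
    blue = np.full((height, width), 128, dtype=np.uint8)
    return np.stack([red, green, blue], axis=-1)


image_path = Path("example.jpg")  # hypothetical file name
image = load_image(image_path) if image_path.exists() else placeholder_image()
print(image.shape, image.dtype)
```

Working with an `(H, W, 3)` array keeps the later transformations simple, since each one becomes an indexing or arithmetic operation on the array.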

We can perform several transformations on the original image to augment our training data, such as:

1. Flipping the image horizontally

2. Rotating the image by a random angle

3. Adding random noise to the image

4. Cropping the image
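The four transformations above can be sketched with NumPy alone. This is an illustrative version under my own assumptions, not necessarily the code originally used: the function names, the Gaussian noise model, the nearest-neighbour rotation and the random stand-in image are all choices made for the example. Libraries such as scikit-image or SciPy offer more complete versions of the same operations.

```python
import numpy as np


def flip_horizontal(image):
    """Mirror the image left to right."""
    return image[:, ::-1]


def rotate(image, degrees):
    """Rotate about the image centre (nearest neighbour, zero-filled corners)."""
    h, w = image.shape[:2]
    theta = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    # Inverse mapping: for every output pixel, find the source pixel.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out


def add_gaussian_noise(image, rng, sigma=10.0):
    """Add zero-mean Gaussian noise and clip back to the uint8 range."""
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def random_crop(image, rng, crop_h, crop_w):
    """Cut a crop_h x crop_w window at a random position."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w]


rng = np.random.default_rng(seed=0)
# Stand-in for a real photo: a random 64x96 RGB image.
image = rng.integers(0, 256, size=(64, 96, 3), dtype=np.uint8)

augmented = [
    flip_horizontal(image),
    rotate(image, rng.uniform(-30, 30)),  # random angle in degrees
    add_gaussian_noise(image, rng),
    random_crop(image, rng, crop_h=48, crop_w=72),
]
for variant in augmented:
    print(variant.shape, variant.dtype)
```

Note that the cropped image is smaller than the others, so it would need to be resized back to the model's input size before being batched with them.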

Keras also provides a simple and effective way to perform Data Augmentation via the keras.preprocessing.image.ImageDataGenerator class. This class allows you to:

  • configure random transformations and normalisation operations to be done on your image data during training
  • instantiate generators of augmented image batches (and their labels) via .flow(data, labels) or .flow_from_directory(directory). These generators can then be used with the Keras model methods that accept data generators as inputs: fit_generator, evaluate_generator and predict_generator. (In recent TensorFlow/Keras releases these three methods are deprecated, and fit, evaluate and predict accept generators directly.)
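A minimal sketch of both points, assuming TensorFlow is installed (the random transformations also need SciPy) and importing the class from `tensorflow.keras`. The array shapes, parameter values and the random stand-in data are illustrative assumptions, not recommendations.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in data: 8 random 32x32 RGB "images" of a hypothetical minority class.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(8, 32, 32, 3)).astype("float32")
labels = np.ones(8, dtype="int64")

# Configure the random transformations and the normalisation step.
datagen = ImageDataGenerator(
    rotation_range=20,        # rotate by up to +/- 20 degrees
    width_shift_range=0.1,    # shift horizontally by up to 10% of the width
    height_shift_range=0.1,   # shift vertically by up to 10% of the height
    horizontal_flip=True,     # randomly mirror left to right
    fill_mode="nearest",      # fill pixels exposed by a transformation
    rescale=1.0 / 255,        # scale pixel values after the transformations
)

# Instantiate a generator of augmented batches and their labels.
batches = datagen.flow(images, labels, batch_size=4, seed=0)
batch_images, batch_labels = next(batches)
print(batch_images.shape, batch_labels.shape)
```

Each call to `next(batches)` draws fresh random transformations, so iterating over the same eight source images repeatedly yields different augmented batches.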

Using Data Augmentation, we can quickly increase the amount of data for our under-represented classes. Because the transformations are applied randomly, the model is unlikely to see exactly the same image twice, which helps reduce overfitting and helps the model generalise better.
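To decide how much to augment, one simple option is to bring every class up to the size of the largest one. The sketch below computes how many augmented images each class would need under that assumption. The helper name and the class counts are made up for illustration, and matching the majority class exactly is only one possible target.

```python
from collections import Counter


def augmentation_targets(labels):
    """Return, per class, how many extra images are needed to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical label list for an imbalanced three-class dataset.
labels = ["cat"] * 500 + ["dog"] * 120 + ["bird"] * 40
extra = augmentation_targets(labels)
for cls, n in extra.items():
    print(f"{cls}: generate {n} augmented images")
```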
