Image Data Augmentation Using Keras


The focus of this post is Image Data Augmentation. When we work on image classification projects, the input a user provides can vary in many ways: angle, zoom, and camera stability while the picture is taken. So we should train our model to accept and make sense of almost all such inputs.

One way to do this is to train the model on every possibility. But we can't go around photographing the same training subject from every possible angle, especially when the training set is as big as 10,000 pictures!

This problem can be solved by a technique called Image Data Augmentation, which takes an image, transforms it, and saves it in all the variant forms we specify. We will be using Keras for this, a deep learning library that runs on top of Theano or TensorFlow.

Let’s start by installing the packages required for our model.

Installing Required Packages

We have installed scipy, numpy, h5py and pyyaml because they are dependencies required for Keras, and since Keras runs on a TensorFlow backend, we need to install that as well. You can read more about TensorFlow installation here. We will be using Keras to perform the image augmentation.
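The install step might look like the following; package versions are not pinned here, so pick whatever versions fit your setup:

```shell
# Dependencies for Keras, then the TensorFlow backend, then Keras itself
pip install numpy scipy h5py pyyaml
pip install tensorflow
pip install keras
```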

We start our program by importing keras image preprocessing.

Image Preprocessing Keras

Here, ImageDataGenerator is used to specify the parameters, such as rotation, zoom and width shift, that we will use to generate images; more on these later. img_to_array converts a given image to a numpy array that the ImageDataGenerator can consume, and load_img loads the image we want to modify into our program.

Data Augmentation

We have used ImageDataGenerator() here to specify the parameters for generating our images, which can be explained as follows:

rotation_range : amount of rotation

width_shift_range , height_shift_range : amount of shift in width, height

shear_range : shear angle in counter-clockwise direction as radians

zoom_range : range for random zoom

horizontal_flip : Boolean (True or False). Randomly flip inputs horizontally

fill_mode : One of {“constant”, “nearest”, “reflect” or “wrap”}. Points outside the boundaries of the input are filled according to the given mode
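Putting the parameters above together, the generator can be constructed like this; the specific values are illustrative choices, not the only sensible ones:

```python
from keras.preprocessing.image import ImageDataGenerator

# Each argument corresponds to one of the parameters described above;
# the values here are illustrative.
datagen = ImageDataGenerator(
    rotation_range=40,       # rotate randomly by up to 40 degrees
    width_shift_range=0.2,   # shift horizontally by up to 20% of width
    height_shift_range=0.2,  # shift vertically by up to 20% of height
    shear_range=0.2,         # shear transformation intensity
    zoom_range=0.2,          # zoom in/out by up to 20%
    horizontal_flip=True,    # randomly flip images left-right
    fill_mode='nearest')     # fill new pixels with the nearest existing ones
```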

After specifying the parameters and storing them in the datagen variable, we move on to importing our image.

Importing Image

load_img is used to load the required image; you can use any image you like, but I would recommend one with a face, like that of a cat, a dog or a human!

Next, we use img_to_array to convert the image into numerical form, in this case a numpy array, which can easily be fed into our flow() function (don't worry, it is explained later!). We store the converted numpy array in a variable x.

Then, we have to reshape the numpy array, adding an extra dimension of size 1 at the front. This turns the rank-3 array (height, width, channels) into the rank-4 array (samples, height, width, channels) that Keras expects, where the leading axis indexes the images in a batch. The channels axis has value 1 for grayscale data and 3 for RGB data.

For instance, I will take this image as my input (Yes, a dog!)

Image Input

Now that we have our input in form, let’s start producing some output.

We run a loop 20 times, and on each iteration we use the datagen.flow() function. We pass x, the numpy array for the input image; save_to_dir, the directory in which to save the output; save_prefix, the prefix for the image filenames; and save_format, the image format.

Since we stop after 20 iterations, 20 images of the dog, with the changes specified in datagen, will be produced and stored in a folder called preview.
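The generation loop might look like the sketch below. For self-containment it uses a random stand-in array instead of a real photo, and the parameter values mirror the illustrative ones discussed earlier; note that flow() yields batches indefinitely, so we must break out of the loop ourselves:

```python
import os
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation parameters, as discussed above.
datagen = ImageDataGenerator(rotation_range=40,
                             width_shift_range=0.2, height_shift_range=0.2,
                             shear_range=0.2, zoom_range=0.2,
                             horizontal_flip=True, fill_mode='nearest')

# Stand-in for the real image array of shape (1, height, width, channels);
# in practice x comes from load_img/img_to_array as shown earlier.
x = np.random.randint(0, 256, size=(1, 64, 64, 3)).astype('float32')

os.makedirs('preview', exist_ok=True)
i = 0
for batch in datagen.flow(x, batch_size=1,
                          save_to_dir='preview',
                          save_prefix='dog',
                          save_format='jpeg'):
    i += 1
    if i >= 20:
        break   # flow() loops forever, so stop after 20 augmented images
```

Each pass through the loop draws one randomly transformed version of the input and writes it to the preview folder.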

The output images look something like these-

Output Images

Notice that each image differs slightly from the others due to zoom, rotation, width or height shift, etc. This will help the model you build recognise a much wider range of inputs, making it more robust.

That’s all for this post, subscribe for more Machine Learning, Neural Networks and Deep Learning updates.