Image Data Augmentation Using Keras
Hello!
The focus of this post is Image Data Augmentation. When we work on image classification projects, the pictures a user supplies can vary in many ways: angle, zoom, and how steadily the photo was taken. So we should train our model to accept, and make sense of, almost all such inputs.
This could be done by training the model on every possibility. But we can't go around photographing the same training picture from every possible angle, and imagine doing that when the training set is as big as 10,000 pictures!
This can easily be solved by a technique called Image Data Augmentation, which takes an image, transforms it, and saves it in all the variant forms we specify. We will be using Keras for this, a deep learning library that runs on top of TensorFlow or Theano.
Let’s start by installing the packages required for our model.
We have installed `scipy`, `numpy`, `h5py` and `pyyaml` because they are dependencies required for `keras`, and since `keras` works on a `tensorflow` backend, we need to install that as well. You can read more about tensorflow installation here. We will be using `keras` for performing Image Augmentation.
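Assuming a pip-based setup, the installation can be done with something like:

```
pip install scipy numpy h5py pyyaml tensorflow keras
```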
We start our program by importing Keras's image preprocessing utilities.
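In standalone Keras the imports look like this (in recent TensorFlow releases the same names live under `tensorflow.keras.preprocessing.image`):

```python
from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
```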
Here, `ImageDataGenerator` is used to specify the parameters, like rotation, zoom and width shift, that we will use to generate images (more on these later). `img_to_array` converts the given image to a NumPy array for the `ImageDataGenerator` to work with, and `load_img` loads the image we want to modify into our program.
We have used `ImageDataGenerator()` here to specify the parameters for generating our images, which can be explained as follows (a sketch of the call appears after the list):
`rotation_range`: amount of rotation (in degrees)
`width_shift_range`, `height_shift_range`: amount of shift in width and height (as a fraction of total width or height)
`shear_range`: shear angle in counter-clockwise direction as radians
`zoom_range`: range for random zoom
`horizontal_flip`: Boolean (True or False); randomly flips inputs horizontally
`fill_mode`: one of {"constant", "nearest", "reflect", "wrap"}; points outside the boundaries of the input are filled according to the given mode
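As promised, here is a sketch of the `ImageDataGenerator()` call; the exact values are illustrative, not prescriptive:

```python
datagen = ImageDataGenerator(
    rotation_range=40,        # rotate up to 40 degrees
    width_shift_range=0.2,    # shift horizontally by up to 20% of the width
    height_shift_range=0.2,   # shift vertically by up to 20% of the height
    shear_range=0.2,          # shear angle
    zoom_range=0.2,           # zoom in or out by up to 20%
    horizontal_flip=True,     # randomly flip images horizontally
    fill_mode='nearest')      # fill new pixels with the nearest existing ones
```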
After specifying the parameters and storing them in the `datagen` variable, we move on to importing our image.
`load_img` is used to load the required image. You can use any image you like, but I would recommend one with a face, like that of a cat, a dog or a human!
Next, we use `img_to_array` to convert the image to something numerical, in this case a NumPy array, which can easily be fed into our `flow()` function (don't worry, it is explained later!). We store the converted NumPy array in a variable `x`.
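A sketch of these two steps; the filename `dog.jpg` is a placeholder for the path to your own image:

```python
img = load_img('dog.jpg')  # loads the image as a PIL image ('dog.jpg' is a placeholder)
x = img_to_array(img)      # converts it to a NumPy array of shape (height, width, channels)
```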
Then, we have to reshape the NumPy array, adding another dimension of size 1 at the front. We do so in order to make it an array of rank 4 instead of rank 3, because `flow()` expects a batch of images in the form (batch, height, width, channels). The channels axis has value 1 for grayscale data and value 3 for RGB data.
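In code, this is a one-line reshape:

```python
# Prepend a batch dimension: (height, width, channels) -> (1, height, width, channels)
x = x.reshape((1,) + x.shape)
```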
For instance, I will take this image as my input (yes, a dog!).
Now that our input is ready, let's start producing some output.
We run a loop 20 times, and in each iteration we call the `datagen.flow()` function. We pass in `x` (the NumPy array for the input image), `save_to_dir` (the directory in which to save the output), `save_prefix` (the prefix for the names of the generated images) and `save_format` (the image format).
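A sketch of that loop; the `dog` prefix and `jpeg` format are illustrative, and `preview` matches the output folder mentioned below. Note that `save_to_dir` expects the directory to already exist:

```python
import os

os.makedirs('preview', exist_ok=True)  # flow() saves into an existing directory

i = 0
for batch in datagen.flow(x, batch_size=1,
                          save_to_dir='preview',
                          save_prefix='dog',
                          save_format='jpeg'):
    i += 1
    if i >= 20:
        break  # otherwise the generator would loop forever
```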
As we have specified 20 iterations, 20 images of the dog, with the changes specified in `datagen`, will be produced and stored in the folder called `preview`.
The output images look something like these:
Notice that each image is a bit different from the others due to zoom, rotation, width or height shift, and so on. This will help the model you will be building to recognise a much wider range of inputs, making it more robust.
That's all for this post. Subscribe for more Machine Learning, Neural Networks and Deep Learning updates.
Goodbye!