From 50 to 5000, an Image Augmentation Story

Tarushi Pathak
Published in DataX Journal
Jul 23, 2020

Hi! I am sure you have all hit a moment when you needed more images for your dataset but had only limited resources, and someone recommended Keras's ImageDataGenerator. While using it, you may have noticed that, when reading images from a folder, it expects them to be sorted into two or more separate subfolders, one per class. If you are doing image classification, this is a lifesaver. But if you just need more data for something else, like feeding images to a GAN, this is not what you need.

I was working on a research project on GANs when I ran into this problem. I went through multiple articles and finally found the one I needed. Below is the code I used to grow my dataset from 50 to 5000 images.

1. Getting the Images

```python
import os

def get_images_path(PATH):
    # os.listdir returns only the file names inside PATH, not full paths
    image_path = os.listdir(PATH)
    return image_path

path = r'/home/tarushi/Desktop/images'
img_path = get_images_path(path)
```

First, create a function to get the list of all the images you currently have saved on your local drive. os.listdir returns only the file names in the folder, not their full paths, so you have to prepend the folder path to each name.

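```python
img_path_final = []
for i in img_path:
    # Prepend the folder so each entry is a full path to an image
    img_path_final.append(path + '/' + i)
img_path_final[:2]  # show the first two full paths
```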

This shows the full path to each image. For example, if the folder contains cat001.jpg and dog002.jpg, the first two entries will look like this (os.listdir returns names in arbitrary order):

['/home/tarushi/Desktop/images/cat001.jpg', '/home/tarushi/Desktop/images/dog002.jpg']

Now each image can be loaded directly from its path.
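A slightly more defensive version of the two steps above, as a sketch: it joins paths with os.path.join, keeps only files with common image extensions (so a stray .DS_Store or notes file does not crash the image loader later), and sorts the names so the order is reproducible. The helper name list_image_paths and the extension set are my own assumptions, not part of the original code.

```python
import os

# Assumed set of extensions to keep; adjust to whatever your dataset uses.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def list_image_paths(folder):
    """Return sorted full paths of image files directly inside `folder` (hypothetical helper)."""
    paths = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        ext = os.path.splitext(name)[1].lower()
        # Skip subfolders and files whose extension is not an image type
        if os.path.isfile(full_path) and ext in IMAGE_EXTENSIONS:
            paths.append(full_path)
    return paths
```

Usage would be something like `img_path_final = list_image_paths(r'/home/tarushi/Desktop/images')`, replacing both get_images_path and the loop that prepends the folder.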

2. Creating New Images and Saving Them

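```python
# Assumes TensorFlow 2.x; standalone Keras setups used `from keras.preprocessing.image import ...`
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,        # random rotations of up to 40 degrees
    width_shift_range=0.2,    # horizontal shifts of up to 20% of the width
    height_shift_range=0.2,   # vertical shifts of up to 20% of the height
    shear_range=0.2,
    zoom_range=0.2,           # zoom factor between 0.8 and 1.2
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest')      # fill exposed pixels by repeating the nearest edge pixel
```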

Next, we define the ImageDataGenerator and set the parameters for the random transformations we want it to apply: rotations, horizontal and vertical shifts, shearing, zooming, and flips.
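To make these parameters concrete, here is a minimal sketch that draws one random set of transform values the way the Keras documentation describes them (rotation angle from [-40, 40] degrees, shifts as a fraction of the image size, zoom factor from [1 - 0.2, 1 + 0.2], each flip with probability 0.5). It does not call Keras and is a simplified illustration, not Keras's actual implementation; the function name sample_augmentation is hypothetical.

```python
import random


def sample_augmentation(width, height, rng=random):
    """Draw one set of random transform parameters mirroring the documented
    meaning of the ImageDataGenerator arguments above (simplified sketch)."""
    rotation_range = 40       # degrees: angle drawn from [-40, 40]
    width_shift_range = 0.2   # float < 1: fraction of the image width
    height_shift_range = 0.2  # float < 1: fraction of the image height
    shear_range = 0.2         # shear angle in degrees: drawn from [-0.2, 0.2]
    zoom_range = 0.2          # float: zoom factors drawn from [0.8, 1.2]
    return {
        "rotation_deg": rng.uniform(-rotation_range, rotation_range),
        "shift_x_px": rng.uniform(-width_shift_range, width_shift_range) * width,
        "shift_y_px": rng.uniform(-height_shift_range, height_shift_range) * height,
        "shear_deg": rng.uniform(-shear_range, shear_range),
        "zoom_x": rng.uniform(1 - zoom_range, 1 + zoom_range),
        "zoom_y": rng.uniform(1 - zoom_range, 1 + zoom_range),
        "flip_horizontal": rng.random() < 0.5,
        "flip_vertical": rng.random() < 0.5,
    }


params = sample_augmentation(256, 256, random.Random(0))
print(params)
```

For a 256-pixel-wide image, width_shift_range=0.2 allows horizontal shifts of up to 0.2 × 256 = 51.2 pixels in either direction. If I read the Keras documentation correctly, shear_range is a shear angle in degrees, so a value of 0.2 produces only a very slight shear.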

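```python
def new_img(img, new_dir):
    # datagen.flow yields augmented batches forever, so we stop after 100 batches.
    # With batch_size=1 and save_to_dir set, each batch saves one new image to new_dir.
    z = 0
    for batch in datagen.flow(img, batch_size=1, save_to_dir=new_dir,
                              save_prefix='microstructure', save_format='jpeg'):
        z += 1
        if z == 100:
            break
```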

This function generates n augmented images from a single image; here n = 100. Because datagen.flow keeps producing batches indefinitely, the counter z is what stops the loop. We also have to pass in the directory where the new images should be saved. With 50 source images and 100 augmented images each, this gives 50 × 100 = 5000 images.
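The counter-and-break pattern can also be written with itertools.islice, which takes the first n items from any iterator, including an infinite one. This is a sketch with a hypothetical helper name, take_batches; in the article's setting you would pass datagen.flow(img, batch_size=1, save_to_dir=new_dir, ...) as the iterator, since the images are saved as a side effect of each batch being drawn. Here itertools.count() stands in for the infinite generator so the example runs without Keras.

```python
import itertools


def take_batches(batch_iterator, n):
    """Draw at most n items from a (possibly infinite) iterator; return how many were drawn."""
    drawn = 0
    for _ in itertools.islice(batch_iterator, n):
        drawn += 1  # with datagen.flow(..., save_to_dir=...), each draw would save an image
    return drawn


# Stand-in for datagen.flow(...): an infinite counter.
print(take_batches(itertools.count(), 100))  # prints 100

# 100 augmented images per source image, 50 source images:
print(50 * 100)  # prints 5000
```

islice stops after exactly n items without pulling an extra one, so it draws the same 100 batches as the original loop.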

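```python
import os
from tensorflow.keras.preprocessing.image import load_img, img_to_array

def create_new_images(img_path, new_dir):
    for i in img_path:
        img = load_img(i)               # this is a PIL image
        x = img_to_array(img)           # this converts the image to an array (height, width, channels)
        x = x.reshape((1,) + x.shape)   # add a batch dimension: (1, height, width, channels)
        new_img(x, new_dir)
    return len(os.listdir(new_dir))     # number of files now in new_dir

new_folder_path = r'/home/tarushi/Desktop/new_images'
os.makedirs(new_folder_path, exist_ok=True)  # the save step does not create the folder for you
create_new_images(img_path_final, new_folder_path)
```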

This function takes the list of image paths and the folder where you want to save the new images, calls new_img on each image to create its augmented copies, and saves them in that folder. It returns the number of files in the folder afterwards, which equals the number of images created only if the folder started out empty.
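Because the returned value counts every file in the folder, it can be worth checking the result against the expected total (number of source images × 100). Below is a minimal sketch, assuming the generated file names start with the save_prefix ('microstructure') and end with the save_format extension ('.jpeg'); both helper names are hypothetical.

```python
import os


def count_generated(folder, prefix="microstructure", ext=".jpeg"):
    """Count regular files in `folder` whose names start with `prefix` and end with `ext`."""
    return sum(
        1
        for name in os.listdir(folder)
        if name.startswith(prefix)
        and name.endswith(ext)
        and os.path.isfile(os.path.join(folder, name))
    )


def expected_total(n_source_images, per_image=100):
    """Number of augmented images the loop in new_img should produce."""
    return n_source_images * per_image
```

For the article's dataset, expected_total(50) is 5000, and a check such as `count_generated(new_folder_path) == expected_total(len(img_path_final))` flags leftover files from an earlier run or any missing images.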

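So that's how you can make many images when you are given very few. A word of warning, though: the new images will have lining and white spaces near their edges, because rotating, shifting, shearing, and zooming expose regions outside the original image, and fill_mode has to fill those pixels in somehow.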
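To see where the lining comes from: with fill_mode='nearest', Keras fills pixels that fall outside the original image by repeating the closest edge pixel (its documentation illustrates this as aaaaaaaa|abcd|dddddddd), and those repeated values show up as streaks. Here is a minimal pure-Python sketch of a horizontal shift with nearest filling on a tiny grayscale grid; shift_right_nearest is a hypothetical helper for illustration, not a Keras function.

```python
def shift_right_nearest(image, k):
    """Shift each row of a 2-D list right by k >= 0 pixels, filling the exposed
    left columns with each row's first (nearest) pixel, like fill_mode='nearest'."""
    if k <= 0:
        return [row[:] for row in image]
    shifted = []
    for row in image:
        k_eff = min(k, len(row))
        # Columns that moved in from outside the image repeat the edge pixel.
        shifted.append([row[0]] * k_eff + row[: len(row) - k_eff])
    return shifted


image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
]
for row in shift_right_nearest(image, 2):
    print(row)
# [10, 10, 10, 20]
# [50, 50, 50, 60]
```

The first three columns now hold the same value in every row, which is exactly the kind of constant-valued stripe that appears as lining in the augmented images.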

You can get the code here.

Leave some claps if you learned something new!
