Creating a custom data generator for training Deep Learning Models - Part 3

Anuj Shah (Exploring Neurons)
8 min read · Nov 25, 2019


Chapter-1: What is a generator function in Python and the difference between yield and return

Chapter-2: Writing a generator function to read your data that can be fed for training an image classifier in Keras.

Chapter-3: Writing generator function for different kinds of inputs — multiple input or sequence of input.

In the previous post, we started writing a data generator for training deep learning models. Although we used this data generator to train a model in Keras, you can use the same approach with other libraries as well. For this tutorial, we are considering 3 different scenarios of training, as shown below:

Scenario-1: when a single input is to be fed to the network
Scenario-2: when a sequence of inputs (e.g., video) is to be fed to the network
Scenario-3: when multiple inputs need to be fed to an ensemble of models

In the previous post, we discussed how to write a data generator for the first scenario and if you haven’t been through the previous post, I would suggest that you go through that first — Chapter-2: Writing a generator function to read your data that can be fed for training an image classifier in Keras.

In this post, we will write a data generator for the second and third scenarios. Let's consider the second scenario first: we need to train a model that takes a sequence of inputs. To elucidate this, I am going to consider the problem of activity recognition from videos using a 3D CNN, which takes volume data (besides the spatial 2D dimensions of the image, the third dimension is the number of frames).

Preparing the dataset

I will be using 3 categories (Archery, Basketball, and Biking) from the UCF-101 dataset. Let's prepare this dataset so that we can build a clean data generator for it:

  1. I will divide the dataset into train and test splits, keeping approximately 80% of the videos in each category in the train split and the remaining 20% in the test split.
  2. The next step is to read each video and save its frames as images. This lets us load the data frame by frame, and video datasets are usually stored in this format. We will create a new directory called 'activity_data' where all the videos will be saved as frames. This will be our working directory from here on.

Before step 2, the structure was:

activity_recognition
    UCF-101
        train
            Archery
                v_Archery_g01_c01.avi
                v_Archery_g01_c02.avi
                ............
            Basketball
                v_Basketball_g01_c01.avi
                .........
            Biking
                v_Biking_g01_c01.avi
                ............
        test
            ........
            ........

After step 2, the structure becomes:

activity_recognition
    UCF-101
        ......
    activity_data
        train
            Archery
                v_Archery_g01_c01
                    img_000000.png
                    img_000001.png
                    .........
                v_Archery_g01_c02
                    img_000000.png
                    img_000001.png
                    .........
                ............
            Basketball
                v_Basketball_g01_c01
                    img_000000.png
                    img_000001.png
                    .........
                ..........
            Biking
                v_Biking_g01_c01
                    img_000000.png
                    img_000001.png
                    .........
                ..........
        test
            ........
            ........

The code to do this is pretty straightforward:

import os

root_dir = 'UCF-101'       # source directory
dest_dir = 'activity_data' # destination directory

# create the destination dir if it does not exist
if not os.path.exists(dest_dir):
    os.mkdir(dest_dir)

# list the top-level directories - train, test
data_dir_list = os.listdir(root_dir)

for data_dir in data_dir_list:  # loop over the train and test dirs
    data_path = os.path.join(root_dir, data_dir)
    # mirror the train/test directory in the destination
    dest_data_path = os.path.join(dest_dir, data_dir)
    if not os.path.exists(dest_data_path):
        os.mkdir(dest_data_path)

    # for each activity (Archery, Basketball, and Biking),
    # read all the videos in the activity folder
    activity_list = os.listdir(data_path)

    # for each video in the activity folder, read the video
    # and save every frame
    for activity in activity_list:
        activity_path = os.path.join(data_path, activity)
        dest_activity_path = os.path.join(dest_data_path, activity)
        if not os.path.exists(dest_activity_path):
            os.mkdir(dest_activity_path)
        write_frames(activity_path, dest_activity_path)

The full code at —
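The helper write_frames used above is not defined in the snippet. Here is a minimal sketch of what it could look like, using OpenCV to read each video in an activity folder and dump its frames; the img_000000.png naming pattern follows the directory listing shown earlier, while the function body itself is an assumption:

import os
import cv2

def write_frames(activity_path, dest_activity_path):
    # loop over every video file in this activity folder
    for video_name in os.listdir(activity_path):
        video_path = os.path.join(activity_path, video_name)
        # one sub-folder per video, named after the video file (minus .avi)
        video_dir = os.path.join(dest_activity_path,
                                 os.path.splitext(video_name)[0])
        if not os.path.exists(video_dir):
            os.mkdir(video_dir)
        cap = cv2.VideoCapture(video_path)
        frame_idx = 0
        while True:
            ret, frame = cap.read()
            if not ret:  # no more frames
                break
            cv2.imwrite(os.path.join(video_dir,
                        'img_{:06d}.png'.format(frame_idx)), frame)
            frame_idx += 1
        cap.release()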

  3. The next step is to write the paths and labels of each frame to a CSV file, as shown in the figure below:

(figure: the frame paths and their labels saved in a CSV file in the above format)

Assign a numeric label to each class, as in the dictionary labels_name. Then create train and test directories under data_files to store the respective CSV files.
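A minimal sketch of such a mapping (the exact integer assignment is an assumption; any consistent class-to-integer mapping works):

labels_name = {'Archery': 0, 'Basketball': 1, 'Biking': 2}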

For writing the CSV files for each video in the train directory, you need to loop over all the activity folders (Archery, Basketball, and Biking) in the train directory; for each activity, loop over every video; and for every video, create a data frame in which you store the path and label of each frame of the video. A sketch of the code is shown below.
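The original snippet is embedded in the post; the following is a minimal reconstruction, assuming pandas, the labels_name mapping above, and one CSV per video with FileName and Label columns (the file layout and column names are assumptions):

import os
import pandas as pd

split = 'train'  # repeat with 'test' for the test split
src_dir = os.path.join('activity_data', split)
dst_dir = os.path.join('data_files', split)
if not os.path.exists(dst_dir):
    os.makedirs(dst_dir)

for activity in os.listdir(src_dir):          # Archery, Basketball, Biking
    activity_path = os.path.join(src_dir, activity)
    for video in os.listdir(activity_path):   # one folder of frames per video
        video_path = os.path.join(activity_path, video)
        frames = sorted(os.listdir(video_path))
        # one row per frame: its path and the label of the activity
        df = pd.DataFrame({
            'FileName': [os.path.join(video_path, f) for f in frames],
            'Label': [labels_name[activity]] * len(frames),
        })
        df.to_csv(os.path.join(dst_dir, video + '.csv'), index=False)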

Now we are done with preparing our dataset and can proceed to creating the data generator. Our working directory now has 3 folders: UCF-101, activity_data, and data_files. We will use data_files to read the frame paths and activity_data to read the actual data.

activity_recognition
    UCF-101
    activity_data
        train
        test
    data_files
        train
        test

Writing the Data Generator

For the spatial dimensions, we can specify a spatial size (say 224 x 224). We also need to incorporate the temporal dimension, i.e. the number of frames we wish to feed at once. One more term that we need to understand is the temporal stride. Let's understand this with an example: say we have a video with 50 frames.

Say you choose a temporal length of 16 and a temporal stride of 1.

How many data samples can be prepared from this video? The first sample will be from frame 1 to frame 16, the next from frame 2 to frame 17, and the next from frame 3 to frame 18. The formula is:

total_images = 50 ; temporal_length = 16 ; temporal_stride = 1
num_samples = int((total_images-temporal_length)/temporal_stride)+1
num_samples = int((50-16)/1) + 1 = int(34)+1 = 35
if temporal_stride = 4
num_samples = int((50-16)/4) + 1 = int(34/4)+1 = int(8.5)+1 = 9

If you choose a temporal stride of 4, then the number of data samples we can get is 9. The first sample will be from frame 1 to frame 16, the next from frame 5 to frame 20, the next from frame 9 to frame 24, and so on.

For now, we will not consider temporal padding, i.e. if the number of frames in a video is less than temporal_length, we will drop that video from our dataset. For instance, if temporal_length is 16 and the number of frames in the video is less than 16 (say even 15), we will drop this video. A small sketch of this counting-and-dropping logic is shown below.
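(The function name count_samples is an assumption; it just packages the formula above together with the drop rule.)

def count_samples(total_images, temporal_length, temporal_stride):
    # drop videos shorter than the temporal length (no temporal padding)
    if total_images < temporal_length:
        return 0
    return (total_images - temporal_length) // temporal_stride + 1

print(count_samples(50, 16, 1))  # 35
print(count_samples(50, 16, 4))  # 9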

As discussed in Part 2 of this tutorial series, the template which we will be following for creating a custom data generator is taken from this amazing blog: www.jessicayung.com/using-generators-in-python-to-train-machine-learning-models/

First, we need to create the samples, where each sample consists of a sequence of filenames along with the label of that sequence. For a temporal length of 16, our samples list should be in the format:

samples — [[[frame1_filename,frame2_filename,…frame16_filename],label1], [[frame1_filename,frame2_filename,…frame16_filename],label2],……….].

We will have two major functions:

  • a load_samples function, which loads the file names in the proper sequence from the CSV files and returns the samples list
  • a data_generator function, which reads the sequences stored in the samples list and yields arrays of data for training.

In the load_samples function, we call a generator function, file_generator(), that actually reads the file names and arranges them in the correct format. Let's have a look at file_generator(), sketched below.
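The original file_generator() is embedded in the post; the following is a minimal reconstruction under the assumptions above (one CSV per video with FileName and Label columns):

import os
import pandas as pd

def file_generator(data_path, data_files, temporal_length, temporal_stride):
    # data_files: list of per-video CSV files holding frame paths and labels
    for f in data_files:
        tmp_df = pd.read_csv(os.path.join(data_path, f))
        label = tmp_df['Label'][0]          # every frame shares the video's label
        img_list = list(tmp_df['FileName'])
        total_images = len(img_list)
        if total_images < temporal_length:  # drop short videos (no padding)
            continue
        num_samples = (total_images - temporal_length) // temporal_stride + 1
        for sample in range(num_samples):
            start = sample * temporal_stride
            img_seq = img_list[start:start + temporal_length]
            yield [img_seq, label]

def load_samples(data_path, temporal_length=16, temporal_stride=1):
    data_files = os.listdir(data_path)
    gen = file_generator(data_path, data_files, temporal_length, temporal_stride)
    return [sample for sample in gen]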

Now you can call load_samples to load the train or test data. Below is a snippet, equivalent to the Jupyter-notebook cell in the original post, for loading the samples of the train data with a temporal_length of 16 and a temporal_stride of 4.
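(This call assumes the data_files/train layout and the load_samples sketch above.)

train_data = load_samples('data_files/train',
                          temporal_length=16, temporal_stride=4)
print('number of train samples:', len(train_data))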

Let's see some of the samples from the train data:

As we can see, each sample has a sequence of 16 frames and a label corresponding to that sequence. How would the samples look with a temporal_length of 8?

Now that we have loaded our samples, we can feed them into the data generator.

import cv2
import numpy as np
from keras.utils import np_utils
# shuffle_data: the original post uses its own helper;
# sklearn's shuffle is an equivalent stand-in
from sklearn.utils import shuffle as shuffle_data

def data_generator(data, batch_size=10, shuffle=True):
    """
    Yields the next training batch.
    data is a list of samples:
    [[[frame1_filename, frame2_filename, ..., frame16_filename], label1],
     [[frame1_filename, frame2_filename, ..., frame16_filename], label2], ...]
    """
    num_samples = len(data)
    if shuffle:
        data = shuffle_data(data)
    while True:
        for offset in range(0, num_samples, batch_size):
            print('starting index:', offset)
            # Get the samples you'll use in this batch
            batch_samples = data[offset:offset + batch_size]
            # Initialise X_train and y_train arrays for this batch
            X_train = []
            y_train = []
            # For each sample
            for batch_sample in batch_samples:
                # Load the image sequence (X)
                x = batch_sample[0]
                # Read the label (y)
                y = batch_sample[1]
                temp_data_list = []
                for img_name in x:
                    try:
                        img = cv2.imread(img_name)
                        # apply any kind of preprocessing here
                        img = cv2.resize(img, (224, 224))
                        temp_data_list.append(img)
                    except Exception as e:
                        print(e)
                        print('error reading file:', img_name)
                # Add the sample to the arrays
                X_train.append(temp_data_list)
                y_train.append(y)

            # Make sure they're numpy arrays (as opposed to lists)
            X_train = np.array(X_train)
            # X_train = np.rollaxis(X_train, 1, 4)
            y_train = np.array(y_train)
            # create one-hot encoding for training in Keras
            y_train = np_utils.to_categorical(y_train, 3)

            # yield the next training batch
            yield X_train, y_train
To check the generator visually, we can plot the 16 frames of one sample. Here x_0 is the first sample of a fetched batch and activity its class (the fetch itself is not part of the original snippet, so it is shown as commented assumptions):

import matplotlib.pyplot as plt

# assumed setup (not in the original snippet):
# generator = data_generator(train_data, batch_size=10, shuffle=True)
# x, y = next(generator)
# x_0, activity = x[0], np.argmax(y[0])

num_of_images = 16
fig = plt.figure(figsize=(8, 8))
plt.title("one sample with {} frames ; activity:{}".format(num_of_images, activity))
subplot_num = int(np.ceil(np.sqrt(num_of_images)))
for i in range(int(num_of_images)):
    ax = fig.add_subplot(subplot_num, subplot_num, i + 1)
    # ax.imshow(output_image[0,:,:,i], interpolation='nearest')  # to see the first filter
    ax.imshow(x_0[i, :, :, ::-1])  # BGR (OpenCV) to RGB for display
    plt.xticks([])
    plt.yticks([])
plt.tight_layout()
plt.show()

Training the Keras model with a prepared data generator

Now you can use the data generator to train a 3D CNN model. I will define a 3D CNN model in Keras and train it with the data generator, using a temporal length of 16 and an image size of 64 (so the resize inside the generator above would be (64, 64) instead of (224, 224)).

from keras.models import Sequential
from keras.layers import (Activation, Conv3D, Dense, Dropout,
                          Flatten, MaxPooling3D)
from keras.losses import categorical_crossentropy
from keras.optimizers import Adam

# Config is the project's configuration object; num_classes is 3 here
def get_model(num_classes=Config.num_classes):
    # Define model
    model = Sequential()
    model.add(Conv3D(32, kernel_size=(3, 3, 3),
                     input_shape=(16, 64, 64, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(Conv3D(32, kernel_size=(3, 3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling3D(pool_size=(3, 3, 3), padding='same'))
    model.add(Dropout(0.25))
    model.add(Conv3D(64, kernel_size=(3, 3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(Conv3D(64, kernel_size=(3, 3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling3D(pool_size=(3, 3, 3), padding='same'))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))
    model.compile(loss=categorical_crossentropy,
                  optimizer=Adam(), metrics=['accuracy'])
    model.summary()
    # plot_model(model, show_shapes=True,
    #            to_file='model.png')
    return model

model = get_model()
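The actual training call is embedded in the original post; the following is a sketch using the generator above (the batch size and epoch count are placeholders):

batch_size = 10
train_generator = data_generator(train_data, batch_size=batch_size, shuffle=True)
steps_per_epoch = len(train_data) // batch_size
model.fit_generator(train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=10, verbose=1)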

The model starts training and I can see the improvement already!!

The link to the data generator for sequence inputs — https://github.com/anujshah1003/custom_data_generator/tree/master/activity_recognition

In this post, we wrote a data generator for the second scenario, in which we trained a model that takes a sequence of inputs.

This post is already getting long, so we will cover the third scenario in the next post.

Till then, Keep Learning!! Keep Exploring Neurons!!!!

If you find my articles helpful and wish to support them — Buy me a Coffee
