Augmented Videos from the Weizmann Video dataset

Video Preprocessor and Augmentation for Deep Learning tasks

Biplab Barman
4 min read · Mar 24, 2021


With the growing demand for video classification and recognition models across many video-processing tasks, it is important to understand how to process videos using Python libraries. In this article, we are going to see how we can process our raw video data and tune it for our specific needs.

1. Preprocessing video data

For preprocessing video data we are going to use Python libraries such as OpenCV, NumPy, os, and glob.

Please visit each library's documentation for a better understanding of how it is used.

As you might already know, a video is nothing but stacked-up images: multiple frames combined in sequence form a video.

I will be using the Weizmann video dataset for our processing purposes.

The main folder of the dataset should have the following structure. You can also rename the subfolders to make it easier to load data from these subdirectories.

As you can see in the structure above, I have renamed all subdirectories of the Weizmann dataset according to their class labels so they are easier to fetch later.

From the above code snippet, you can see that I am using os and glob to list and fetch all files, respectively. I also start X and labels as empty lists, append the respective data to them, and finally return both lists as NumPy arrays. After that, X holds every video as a frame array; below I have shared the load_video code snippet for a better understanding of the flow.

As we can see, I am using OpenCV to read all the frames from a particular video, adding them to a frames array, resizing them, and returning them as a NumPy array with pixel values normalized between 0 and 1.

To understand the flow, I am also attaching the crop_center_square(frame) code below:
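A minimal version of such a helper, which takes the largest centered square out of a frame so that the later resize does not distort the aspect ratio:

```python
import numpy as np

def crop_center_square(frame):
    # Crop the largest centered square from a frame of shape (H, W, C).
    h, w = frame.shape[0], frame.shape[1]
    min_dim = min(h, w)
    start_y = (h - min_dim) // 2
    start_x = (w - min_dim) // 2
    return frame[start_y:start_y + min_dim, start_x:start_x + min_dim]
```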

Now, if you call the loaddata("path to the video directory", number of classes) function, you will get the videos processed as a NumPy array along with the label of every video.

Xin, Yin = loaddata("path to the video directory", number of classes)

If you check the shape of Xin[index], where index is any valid index into the video array, you will get an output like (x, 224, 224, 3), where x is the number of frames in that video, 224 is the width and height of each frame, and 3 is the channel size: red, green, and blue.

Now, to visualize any video from the processed NumPy array, we can use the to_gif function mentioned below.

We can see that we are converting the normalized pixel values back to the 0 to 255 range so that the frames display properly.
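A sketch of such a to_gif function using imageio; the output path and default frame timing are my assumptions:

```python
import imageio
import numpy as np

def to_gif(video, path="animation.gif"):
    # Scale normalized [0, 1] pixel values back to the 0-255 uint8
    # range so the frames render correctly, then write them as a GIF.
    frames = np.clip(video * 255, 0, 255).astype(np.uint8)
    imageio.mimsave(path, list(frames))
    return path
```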

If we call to_gif(sample_video), we will get output like:

gif of the sample video

2. Video data augmentation

To overcome the problem of limited quantity and limited diversity of data, we generate (manufacture) our own data from the existing data, which we then use to train deep learning models on a more diverse set of examples for any particular category.

I mainly used vidaug for video augmentation tasks.

The vidaug library supports augmenting a video with a given probability, so any particular video may or may not be transformed. Below, I am attaching code snippets showing how to perform the augmentation and how to make our training data larger than before, which improves results and reduces unhealthy overfitting in deep learning tasks.

Data augmentation comprises techniques such as:

  • RandomCrop
  • RandomRotate
  • HorizontalFlip
  • VerticalFlip
  • GaussianBlur
  • etc.

Following is the code on how to use vidaug for video augmentation purposes:

The function which creates a NumPy array of Augmented video data is given below:
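A sketch of such a function; the name make_augmented and the callable augmenter parameter are illustrative (any vidaug pipeline can be passed in as the augmenter):

```python
import numpy as np

def make_augmented(X, y, augmenter):
    # Run every training video through the augmentation pipeline and
    # return the results, with copied labels, as NumPy arrays.
    # `augmenter` is any callable mapping a frame array to an
    # augmented frame array, e.g. a vidaug Sequential pipeline.
    X_aug, y_aug = [], []
    for video, label in zip(X, y):
        X_aug.append(np.asarray(augmenter(video)))
        y_aug.append(label)
    out = np.empty(len(X_aug), dtype=object)  # frame counts can differ
    out[:] = X_aug
    return out, np.array(y_aug)
```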

After augmenting the required data, we should be able to use it, so I am attaching the code showing how to concatenate it with the existing training data to create a new, larger training dataset.
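Concatenating along the sample axis is enough here. A sketch with small stand-in arrays (the variable names and shapes are illustrative, not from the article):

```python
import numpy as np

# Stand-in arrays; in practice these come from the loading and
# augmentation steps described above.
X_train = np.random.rand(4, 5, 32, 32, 3)
y_train = np.random.randint(0, 2, size=4)
X_train_aug = np.random.rand(4, 5, 32, 32, 3)
y_train_aug = y_train.copy()  # augmented clips keep their original labels

# Stack the augmented samples after the originals to build a larger,
# more diverse training set.
X_train = np.concatenate([X_train, X_train_aug], axis=0)
y_train = np.concatenate([y_train, y_train_aug], axis=0)
```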

Below is the sample of Augmented data:

Augmented data

As we can see, the above video has been augmented using RandomRotate and VerticalFlip.

Thanks, that's all, folks! I will be coming up with more articles on video classification and recognition.

For any further queries, please reach out to me.
