Week 3 & Week 4 (GSoC 2019)

Abhinav Patel
Jun 23, 2019 · 3 min read

The first four weeks of GSoC have passed, and they have been a great learning experience along with steady progress on the project. In this blog post, I will describe the progress made in the last two weeks.

During this period, the objective was to implement the Image Acquisition Module and test it on ChaLearn dataset videos, and it is satisfying to report that this was achieved. The aim of this module is to use the coordinates from the previous module to locate a boundary around the person performing the hand gesture in the video. For this, a Python script was written that takes a video and its JSON files containing the coordinates and returns a bounding box around the person. Its code can be found here.

In this way, we keep only the person’s region instead of the whole image from each frame of the video, which discards the unnecessary parts. These cropped frames are then passed to the next module, the Classification Module.

To achieve this, the extreme left and right x-coordinates are found first. For the y-coordinate, only the topmost coordinate is stored; the bottom boundary of the bounding box is set to the image boundary itself. The code snippet below shows this.

Here, x_coord and y_coord contain the x-coordinates and y-coordinates respectively.
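The embedded gist does not render here, so the following is only a minimal sketch of this step. It assumes each frame’s JSON yields flat lists x_coord and y_coord of joint coordinates; the key "keypoints" and the helper load_keypoints are hypothetical names, not the actual ones used in the script.

```python
import json

def load_keypoints(json_path):
    """Hypothetical helper: read one frame's JSON and return flat
    lists of x- and y-coordinates of the detected joints."""
    with open(json_path) as f:
        data = json.load(f)
    # Assumed layout: a list of [x, y] joint pairs under the key "keypoints".
    x_coord = [pt[0] for pt in data["keypoints"]]
    y_coord = [pt[1] for pt in data["keypoints"]]
    return x_coord, y_coord

def frame_extremes(x_coord, y_coord):
    # Leftmost and rightmost x-coordinates in this frame.
    min_x, max_x = min(x_coord), max(x_coord)
    # Only the topmost y-coordinate is needed; the bottom boundary
    # of the bounding box is the image boundary itself.
    min_y = min(y_coord)
    return min_x, max_x, min_y
```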

These extrema are calculated over the body joints as well as both hand joints, and across all the video frames. This gives the coordinates of the leftmost, rightmost and topmost joints appearing in any of the video frames, as shown below.

Here, global_min_x, global_max_x and global_min_y are the extrema.
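Again, the original snippet is not reproduced in this post; a sketch of how the global extrema might be accumulated over all frames, reusing the hypothetical frame_extremes helper from above, could look like this:

```python
def global_extremes(frames):
    """frames: iterable of (x_coord, y_coord) pairs, one per video frame,
    with body and both hand joints merged into the same lists."""
    global_min_x = float("inf")
    global_max_x = float("-inf")
    global_min_y = float("inf")
    for x_coord, y_coord in frames:
        min_x, max_x, min_y = frame_extremes(x_coord, y_coord)
        # Keep the most extreme values seen in any frame so far.
        global_min_x = min(global_min_x, min_x)
        global_max_x = max(global_max_x, max_x)
        global_min_y = min(global_min_y, min_y)
    return global_min_x, global_max_x, global_min_y
```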

After obtaining the extrema, a fixed offset is subtracted from global_min_x and added to global_max_x to get the final horizontal boundaries of the bounding box. The top boundary is found by subtracting the offset from global_min_y; the bottom one is already fixed as the image boundary. This ensures that no part of the person’s body is cut off and no useful information is lost. If a final boundary falls beyond the image boundary, it is clamped to the image boundary for that dimension, as the following code snippet shows.
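The snippet itself is missing from this post, so here is a sketch of the offset-and-clamp step described above; the default offset of 50 pixels is only a placeholder, not the value used in the project.

```python
def bounding_box(global_min_x, global_max_x, global_min_y,
                 img_width, img_height, offset=50):
    # Widen the box by `offset` pixels so nothing useful is cut off,
    # then clamp every boundary to the image itself.
    left = max(global_min_x - offset, 0)
    right = min(global_max_x + offset, img_width)
    top = max(global_min_y - offset, 0)
    bottom = img_height  # the bottom boundary is always the image boundary
    return int(left), int(top), int(right), int(bottom)
```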

Finally, the bounding box is returned using the aforementioned extreme points. Thus, a single bounding box around the person is fixed for every frame of the video, and it works well because the extreme coordinates across all frames are used to define it. Figure 1(a) shows a sample input image and Figure 1(b) shows the bounding box around the person.

Figure 1(a): Input image
Figure 1(b): Output image with bounding box

Here, the left, top and bottom boundaries are not visible as they coincide with the image boundary. In this way, a bounding box around the person can be found for every video, and the cropped video is then passed to the Classification Module; a sketch of this cropping step is given below.
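As a rough illustration of that last step, each frame could be cropped with the computed box using OpenCV before being handed to the next module; this sketch reuses the hypothetical bounding_box output from above and is not the project’s actual code.

```python
import cv2

def crop_video(video_path, box):
    """Yield each frame of the video cropped to the bounding box
    (left, top, right, bottom) computed above."""
    left, top, right, bottom = box
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # NumPy indexing is (rows, cols) = (y, x).
        yield frame[top:bottom, left:right]
    cap.release()
```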

We are now ready to implement the architecture described in the final pipeline for the Classification Module. In the coming weeks, the objective will be to complete its implementation and test it on the videos received from the Image Acquisition Module.

That’s all for this post. Cheers!


Written by Abhinav Patel
Junior Undergraduate in Mathematics and Computing at Indian Institute of Technology (B.H.U.), Varanasi
