The first four weeks of GSoC have passed, and it has been a great learning experience overall, along with steady progress in the project. In this blog post, I will describe the progress I made in the last two weeks.
During this period, the objective was to implement the Image Acquisition Module and test it on ChaLearn dataset videos, and I am happy to say that this has been achieved. The aim of this module is to use the coordinates from the previous module to locate a bounding box around the person performing the hand gesture in the video. To this end, a Python script was written that takes a video and its JSON files containing the joint coordinates and returns a bounding box around the person. Its code can be found here.
In this way, we keep only the person's image from each frame instead of the whole frame, which rejects the unnecessary parts of the image. These cropped frames are then passed to the next module, the Classification Module.
To achieve this, the extreme left and right x-coordinates are found first. For the y-coordinate, only the topmost coordinate is stored; the bottom boundary of the bounding box is set to the image boundary itself.
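This per-frame step can be sketched as follows. It assumes OpenPose-style keypoints, where each frame yields a flat `[x0, y0, c0, x1, y1, c1, ...]` list of (x, y, confidence) triples; the function name is hypothetical, not the actual name used in the script.

```python
def frame_extremes(keypoints):
    """Return (min_x, max_x, min_y) over one frame's flat keypoint list.

    Only the topmost y is tracked, since the bottom boundary of the
    bounding box is fixed at the image boundary itself.
    """
    pts = [(keypoints[i], keypoints[i + 1])
           for i in range(0, len(keypoints), 3)
           if keypoints[i + 2] > 0]          # skip joints the estimator missed
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return min(xs), max(xs), min(ys)
```

Filtering out zero-confidence joints matters here, because undetected joints are typically reported at (0, 0) and would otherwise drag the extremes to the image corner.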
These extrema are computed over both the body-joint and hand-joint coordinates, and across all the video frames. This gives the coordinates of the extreme left, extreme right, and topmost joints that appear in any frame of the video.
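Aggregating across frames can be sketched like this, again assuming flat `[x, y, confidence, ...]` lists per frame (body and hand joints concatenated); the names `global_min_x`, `global_max_x`, and `global_min_y` follow the variables mentioned in the post.

```python
def global_extremes(frames):
    """frames: list of per-frame flat [x, y, conf, ...] keypoint lists.

    Returns the leftmost x, rightmost x, and topmost y seen in any frame.
    """
    global_min_x, global_max_x = float("inf"), float("-inf")
    global_min_y = float("inf")
    for kp in frames:
        for i in range(0, len(kp), 3):
            x, y, c = kp[i], kp[i + 1], kp[i + 2]
            if c == 0:                       # undetected joint, ignore
                continue
            global_min_x = min(global_min_x, x)
            global_max_x = max(global_max_x, x)
            global_min_y = min(global_min_y, y)
    return global_min_x, global_max_x, global_min_y
```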
After getting the extrema, a fixed offset value is subtracted from global_min_x and added to global_max_x to get the final horizontal boundaries of the bounding box. The top boundary is found by subtracting the offset from global_min_y; the bottom one is already fixed as the image boundary. This ensures that no part of the person's body is cut off and that all useful information is retained. If a final extreme falls beyond the image boundary, it is clamped to the image boundary itself for that dimension.
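The padding and clamping step can be sketched as below. The default offset of 20 pixels is only an illustrative choice, and the function name is hypothetical.

```python
def final_box(global_min_x, global_max_x, global_min_y,
              width, height, offset=20):
    """Pad the global extremes by `offset` pixels and clamp to the image.

    The bottom boundary is always the image boundary itself.
    """
    left = max(int(global_min_x - offset), 0)
    right = min(int(global_max_x + offset), width)
    top = max(int(global_min_y - offset), 0)
    bottom = height                          # bottom edge fixed at image boundary
    return left, right, top, bottom
```

For a 640x480 frame with extremes near the edges, the clamping simply returns the image boundary for those sides.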
Finally, the bounding box is returned using the aforementioned extreme points. Thus, a bounding box around the person is fixed for every video frame, and it works well because the extreme coordinates across all frames are used to define it. Figure 1(a) shows a sample input image and Figure 1(b) shows the bounding box around the person.
Here, the left, top, and bottom boundaries are not visible as they coincide with the image boundary. Thus, a bounding box around the person can be found for every video, and this cropped video will now be passed to the Classification Module.
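Because the same box is applied to every frame, cropping reduces to a simple array slice per frame. A minimal sketch, assuming frames are H x W x 3 NumPy arrays (as returned by common video readers):

```python
import numpy as np

def crop_frame(frame, left, right, top, bottom):
    """Crop one H x W x 3 frame array to the person bounding box."""
    return frame[top:bottom, left:right]
```

Applying this to each decoded frame yields the cropped video that is handed to the Classification Module.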
Now, we are ready to implement the architecture described in the final pipeline for the Classification Module. So, in the coming weeks, the objective will be to complete its implementation and test it with the videos received from the Image Acquisition Module.
That’s all for this post. Cheers!