Simple Hand Gesture Recognition using OpenCV and JavaScript

OpenCV — Node.js Tutorial Series

In this tutorial I am going to show you how to recognize simple hand gestures, e.g. detecting and counting fingertips, in frames of a webcam video stream or in still images, using my npm package opencv4nodejs. The package provides JavaScript bindings to use OpenCV 3.x with Node.js. You can find the full source code of this and other examples, as well as the source code of the package itself, in my GitHub repository. So let's get started.

1. Preparing the binary mask

For my example I figured out a Hue range of 0° to 30° and a Saturation range of roughly 5% to 60% using a simple color picker. In OpenCV the Hue channel ranges from 0 to 180 instead of 0° to 360°, so we have to divide by 2, which gives us a Hue range from 0 to 15 to filter with. To remove noise such as single pixels or small gaps, we refine the hand mask a bit by first smoothing it with a blurring operator and then thresholding it to obtain a binary mask again. We should end up with the following mask:

Frame (left), Binary Mask (right)
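Sketched with opencv4nodejs, the masking step could look roughly like this. The hue and saturation bounds are the values from above; leaving the Value channel unconstrained, the 10x10 blur kernel and the threshold of 200 are assumptions you may have to tune for your lighting:

```js
const cv = require('opencv4nodejs');

const makeHandMask = (img) => {
  // filter by skin color: hue 0-15 (OpenCV halves the 0°-360° hue range),
  // saturation roughly 5% to 60%, value left unconstrained (assumption)
  const lower = new cv.Vec3(0, Math.round(0.05 * 255), 0);
  const upper = new cv.Vec3(15, Math.round(0.6 * 255), 255);
  const rangeMask = img.cvtColor(cv.COLOR_BGR2HSV).inRange(lower, upper);

  // remove noise: smooth the mask, then threshold it back to a binary mask
  const blurred = rangeMask.blur(new cv.Size(10, 10));
  return blurred.threshold(200, 255, cv.THRESH_BINARY);
};
```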

2. Computing the contour and its convex hull

Hand Contour

Now that we have detected the contour, we can discuss the actual algorithm for detecting the fingertips and the number of fingers shown. To achieve this we will compute the convex hull as well as the convexity defect regions of the hand contour. Instead of trying to come up with a technical explanation of those terms, I will simply show you what they mean in practice.

If we simply compute the convex hull of the contour above, we end up with the following result. As you can see, the hull is a polygon spanned by the hand contour. The red circles indicate the edge points of the hull.

Convex Hull
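For reference, a minimal sketch of how the contour and its hull points can be obtained, continuing from the mask sketch above. Here frame stands for the current image or webcam frame, and picking the largest contour by area as the hand is an assumption to skip leftover specks in the mask:

```js
const getHandContour = (handMask) => {
  const contours = handMask.findContours(cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE);
  // assume the largest contour is the hand
  return contours.sort((c0, c1) => c1.area - c0.area)[0];
};

const handContour = getHandContour(makeHandMask(frame));

// the hull is described by indices into the contour's points
const contourPoints = handContour.getPoints();
const hullPoints = handContour.convexHullIndices()
  .map(idx => contourPoints[idx]); // the red circles
```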

This is already close to what we need, as the edge points of the hull are mostly located at the fingertips. For the next step, however, we want to make sure there is only a single point per fingertip. We will simply assign each point within a local neighborhood to a cluster and then pick the most central point of each cluster.
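Sketched in code, that clustering step could look as follows (ptDist, getCenterPt and getRoughHull are my own naming; maxDist is a tunable pixel threshold). The function returns contour indices rather than points, since we will need those indices again when computing the convexity defects:

```js
// euclidean distance between two points
const ptDist = (pt1, pt2) => pt1.sub(pt2).norm();

// center of gravity of a set of points
const getCenterPt = pts =>
  pts.reduce((sum, pt) => sum.add(pt), new cv.Point2(0, 0)).div(pts.length);

// reduce the hull to a single point per local neighborhood
const getRoughHull = (contour, maxDist) => {
  const contourPoints = contour.getPoints();
  const hullPointsWithIdx = contour.convexHullIndices().map(idx => ({
    pt: contourPoints[idx],
    contourIdx: idx
  }));

  // points closer to each other than maxDist end up in the same cluster
  const ptsBelongToSameCluster = (pt1, pt2) => ptDist(pt1, pt2) < maxDist;
  const { labels } = cv.partition(
    hullPointsWithIdx.map(p => p.pt),
    ptsBelongToSameCluster
  );

  // group the hull points by their cluster label
  const pointsByLabel = new Map();
  labels.forEach((label, i) => {
    const group = pointsByLabel.get(label) || [];
    group.push(hullPointsWithIdx[i]);
    pointsByLabel.set(label, group);
  });

  // for each cluster keep only the point closest to the cluster center
  return Array.from(pointsByLabel.values()).map((group) => {
    const center = getCenterPt(group.map(p => p.pt));
    return group.sort(
      (p1, p2) => ptDist(p1.pt, center) - ptDist(p2.pt, center)
    )[0].contourIdx;
  });
};
```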

For clustering we use cv.partition (see the sketch above), to which we feed the points of the contour that belong to the convex hull (the red circles) together with a callback (ptsBelongToSameCluster), which compares two points of the input set and decides whether they belong to the same cluster. If the distance between two points is below a certain threshold, maxDist, we assign them to the same cluster. The resulting polygon should look much cleaner:

Clean Hull

3. Detecting the fingertips

Defect Points (left), Common Approach (center), My Approach (right)

When I first started implementing this, I went with the approach that seems to be most common: figuring out whether the angle alpha (center image) of the gap between two fingers is sufficiently small to consider a fingertip shown. If you do that, however, you have to come up with a workaround for the case that only a single finger is raised, as it will not be detected with this approach. For that reason I decided to consider the angle beta (right image) instead, which works just as well. Therefore we will transform the data as follows:

The defect regions are returned as an array of vectors. The entries of each vector correspond to indices of points in the hand contour: entry 0 holds the starting point, entry 1 the ending point and entry 2 the defect point of the defect region. We will assign each hull point its two neighboring defect points and discard the points that do not have two neighbors, as they should not be located at a fingertip anyway.
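A sketch of that transformation (getHullDefectVertices is my naming; the hullIndices argument is the output of getRoughHull from above):

```js
// pair each hull point with its neighboring defect points
const getHullDefectVertices = (handContour, hullIndices) => {
  const defects = handContour.convexityDefects(hullIndices);
  const contourPoints = handContour.getPoints();

  // collect the defect points adjacent to each hull point
  const neighbors = new Map(hullIndices.map(idx => [idx, []]));
  defects.forEach((defect) => {
    const startIdx = defect.at(0);  // entry 0: starting point
    const endIdx = defect.at(1);    // entry 1: ending point
    const defectIdx = defect.at(2); // entry 2: defect point
    neighbors.get(startIdx).push(defectIdx);
    neighbors.get(endIdx).push(defectIdx);
  });

  // keep only hull points with two defect neighbors (candidate fingertips)
  return hullIndices
    .filter(idx => neighbors.get(idx).length === 2)
    .map(idx => ({
      pt: contourPoints[idx],
      d1: contourPoints[neighbors.get(idx)[0]],
      d2: contourPoints[neighbors.get(idx)[1]]
    }));
};
```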

Once we have the hull points with their defect neighbors, we can simply compute the angle beta spanned by the two vectors pt -> d1 and pt -> d2 of each vertex by applying the law of cosines. Based on the sharpness of the angle, we can decide whether a finger is raised or not. In the example I found 60° to be a good decision boundary.
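A sketch of that angle test, plus a short end-to-end usage example (getVertexAngleDeg and filterVerticesByAngle are my naming; the maxDist value of 25 pixels is a guess to be tuned for your frame size):

```js
// law of cosines: angle at pt between the vectors pt -> d1 and pt -> d2
const getVertexAngleDeg = ({ pt, d1, d2 }) => {
  const a = d1.sub(d2).norm(); // side opposite the vertex angle
  const b = pt.sub(d1).norm();
  const c = pt.sub(d2).norm();
  return Math.acos((b * b + c * c - a * a) / (2 * b * c)) * (180 / Math.PI);
};

// a vertex counts as a raised finger if its angle is sharp enough
const filterVerticesByAngle = (vertices, maxAngleDeg) =>
  vertices.filter(v => getVertexAngleDeg(v) < maxAngleDeg);

// putting the steps together for a single frame
const roughHullIndices = getRoughHull(handContour, 25);
const vertices = getHullDefectVertices(handContour, roughHullIndices);
const fingertips = filterVerticesByAngle(vertices, 60);
console.log('number of fingers shown:', fingertips.length);
```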

And there we have our results.


Conclusion

If you liked this article, feel free to clap and comment. I would also highly appreciate your supporting the opencv4nodejs project by leaving a star on GitHub. Furthermore, feel free to contribute or get in touch if you are interested :).

Opencv4nodejs is an npm package that provides Node.js bindings to OpenCV and OpenCV-contrib through an asynchronous API. The package brings the performance benefits of the native OpenCV library to your Node.js application and lets you easily implement multithreaded CV tasks via Promises.
