1. Preparing the binary mask
First we create a binary mask of the hand so that we can compute the hand contour. To keep it simple we will segment the images by skin color using the inRange operation, though of course one can come up with more sophisticated approaches to build a more stable algorithm. To do so we convert our frames, which are in BGR format by default when you read them from a file or a capture device in OpenCV, to the HLS (Hue, Lightness, Saturation) color space. The Hue channel encodes the actual color information, so we only have to figure out the proper Hue range of the skin and then adjust the ranges for Saturation and Lightness.
For my example I figured out a Hue range of 0° to 30° and a Saturation range of roughly 5% to 60% using a simple color picker. For 8-bit images OpenCV stores the Hue channel as 0 to 180 instead of 0° to 360°, so we have to divide by 2 and I am filtering the image with a Hue range from 0 to 15. To remove noise such as single pixels or small gaps we refine the hand mask by first smoothing it with a blurring operator and then thresholding it to obtain a binary mask again. We should end up with the following mask:
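To make the range conversion concrete, here is a minimal sketch in plain JavaScript (no OpenCV required). The helper name `toOpenCvHls` is hypothetical, and the Lightness bounds of 10% to 80% are placeholder assumptions, since the text only gives Hue and Saturation. It relies on the fact that 8-bit HLS images in OpenCV store Hue as degrees divided by 2 and scale Lightness and Saturation to 0 to 255.

```javascript
// Convert color-picker values (hue in degrees, lightness/saturation in percent)
// to the value ranges of an 8-bit HLS image in OpenCV:
// H = degrees / 2, L and S scaled from percent to 0..255.
// Hypothetical helper, not part of any OpenCV binding.
function toOpenCvHls(hueDeg, lightnessPct, saturationPct) {
  return [
    Math.round(hueDeg / 2),
    Math.round((lightnessPct / 100) * 255),
    Math.round((saturationPct / 100) * 255),
  ];
}

// Skin range from the text: hue 0° to 30°, saturation 5% to 60%.
// The lightness bounds (10% to 80%) are an assumption; tune them for your lighting.
const lower = toOpenCvHls(0, 10, 5);
const upper = toOpenCvHls(30, 80, 60);

console.log('lower HLS bound:', lower);
console.log('upper HLS bound:', upper);
```

The two arrays are in H, L, S channel order, which is the order you would pass as lower and upper bounds to an inRange call on the HLS image.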
2. Computing the contour and its convex hull
Next we will tell OpenCV to find all contours in the mask. We keep only the largest contour, in case segmentation did not work out perfectly and the mask still contains noise.
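As a rough illustration of the "keep the largest contour" step, the sketch below picks the contour with the greatest enclosed area. It assumes contours are plain arrays of `{ x, y }` points and computes the area with the shoelace formula; with an OpenCV binding you would normally use the area the library reports for each contour instead. The function names are hypothetical.

```javascript
// Area of a closed polygon given as an array of {x, y} points (shoelace formula).
function polygonArea(points) {
  let sum = 0;
  for (let i = 0; i < points.length; i++) {
    const a = points[i];
    const b = points[(i + 1) % points.length]; // wrap around to close the polygon
    sum += a.x * b.y - b.x * a.y;
  }
  return Math.abs(sum) / 2;
}

// Return the contour with the largest area, or null if there are none.
function largestContour(contours) {
  if (contours.length === 0) return null;
  return contours.reduce((best, c) => (polygonArea(c) > polygonArea(best) ? c : best));
}

// Usage: a small noise blob and a larger "hand" region.
const noise = [{ x: 0, y: 0 }, { x: 2, y: 0 }, { x: 2, y: 2 }, { x: 0, y: 2 }];
const hand = [{ x: 10, y: 10 }, { x: 20, y: 10 }, { x: 20, y: 20 }, { x: 10, y: 20 }];
const picked = largestContour([noise, hand]);
console.log('picked contour area:', polygonArea(picked));
```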
Now that we have detected the contour we can start to discuss the actual algorithm for detecting the fingertips and the number of fingers shown. To achieve this we will compute the convex hull as well as the convexity defect regions of the hand contour. Instead of trying to come up with some technical explanation of those terms I will just show you what that means in practice.
If we simply compute the convex hull of the contour above we will end up with the following result. As you can see the hull is a polygon spanned by the outermost points of the hand contour. The red circles indicate the corner points (vertices) of the hull.
This is already close to what we need as the edge points of the hull are mostly located at the fingertips. For the next step however we want to make sure to have a single point per fingertip only. We will simply assign each point within a local neighborhood to a cluster and then pick the most central point of each cluster:
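The idea of grouping nearby hull points and keeping one representative per group can be sketched without OpenCV as follows. This is an illustration under assumptions, not the original listing: points are plain `{ x, y }` objects, clusters are built with a small union-find so that points are grouped transitively, and "most central" is interpreted as the point closest to the cluster's centroid. The function names are hypothetical.

```javascript
const distance = (a, b) => Math.hypot(a.x - b.x, a.y - b.y);

// Group points so that any two points closer than maxDist end up in the same
// cluster (transitively), using a simple union-find over point indices.
function clusterPoints(points, maxDist) {
  const parent = points.map((_, i) => i);
  const find = (i) => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving
      i = parent[i];
    }
    return i;
  };
  for (let i = 0; i < points.length; i++) {
    for (let j = i + 1; j < points.length; j++) {
      if (distance(points[i], points[j]) < maxDist) parent[find(i)] = find(j);
    }
  }
  const clusters = new Map();
  points.forEach((pt, i) => {
    const root = find(i);
    if (!clusters.has(root)) clusters.set(root, []);
    clusters.get(root).push(pt);
  });
  return [...clusters.values()];
}

// Pick the cluster member closest to the cluster's centroid.
function mostCentralPoint(cluster) {
  const centroid = {
    x: cluster.reduce((s, p) => s + p.x, 0) / cluster.length,
    y: cluster.reduce((s, p) => s + p.y, 0) / cluster.length,
  };
  return cluster.reduce((best, p) =>
    distance(p, centroid) < distance(best, centroid) ? p : best);
}

// Usage: three hull points around one fingertip plus one isolated point.
const hullPoints = [{ x: 0, y: 0 }, { x: 2, y: 0 }, { x: 4, y: 0 }, { x: 100, y: 100 }];
const centers = clusterPoints(hullPoints, 3).map(mostCentralPoint);
console.log(centers);
```

Because grouping is transitive, a chain of closely spaced points forms one cluster even if its two ends are farther apart than the threshold, so the threshold should be chosen well below the typical distance between neighboring fingertips.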
For clustering we use cv.partition, which we feed the points of the contour that belong to the convex hull (the red circles) together with a callback function (ptsBelongToSameCluster) that compares two points of the input set and decides whether they belong to the same cluster. If the distance between two points is below a certain threshold "maxDist", we assign them to the same cluster. The resulting polygon should look much cleaner:
3. Detecting the fingertips
Now we let OpenCV compute the convexity defects of the hand contour relative to our new polygon. This gives us the defect regions of our contour, each of which is described by a starting point p0, an ending point p1 and the defect point p2. As you can see the defect points (green circles) are located in the "valley" position between two points of the convex hull of the contour.
When I first started implementing this, I went with the approach that seems to be most common, which is to check whether the angle alpha (center image) of the gap between two fingers is sufficiently small to consider the fingertip to be shown. If you do that, however, you will have to come up with a workaround for the case that only a single finger is being raised, as it will not be detected with this approach. For that reason I decided to come up with a different solution and consider the angle beta (right image) instead, which works as well. Therefore we will transform the data as follows:
The defect regions are returned as an array of vectors. The entries of each vector are indices of points in the hand contour. Entry 0 holds the starting point, entry 1 the ending point and entry 2 the defect point of the defect region. We assign each hull point its two neighboring defect points and discard those points that do not have two neighbors, as they should not be located at a fingertip anyway.
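The regrouping described above could look roughly like the sketch below. It assumes each defect is a plain array `[startIdx, endIdx, defectIdx]` of contour indices, which stands in for whatever vector type your OpenCV binding returns, and that hull points are given by their contour indices. The function name is hypothetical.

```javascript
// For every hull point, collect the defect points of the defect regions that
// start or end at it, and keep only hull points that have exactly two of them.
function hullPointsWithDefectNeighbors(hullIndices, defects) {
  const neighbors = new Map(hullIndices.map((h) => [h, []]));
  for (const [startIdx, endIdx, defectIdx] of defects) {
    for (const hullIdx of [startIdx, endIdx]) {
      if (neighbors.has(hullIdx)) neighbors.get(hullIdx).push(defectIdx);
    }
  }
  return [...neighbors.entries()]
    .filter(([, d]) => d.length === 2) // discard points without two neighbors
    .map(([pt, [d1, d2]]) => ({ pt, d1, d2 }));
}

// Usage: four hull points and three defect regions between consecutive ones.
const hullIdxs = [0, 10, 20, 30];
const defectRegions = [[0, 10, 5], [10, 20, 15], [20, 30, 25]];
const vertices = hullPointsWithDefectNeighbors(hullIdxs, defectRegions);
console.log(vertices);
```

All values here are contour indices; to get coordinates for the angle computation you would look each index up in the contour's point list.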
Once we have the hull points with their defect neighbors, we can compute the angle beta spanned by the two vectors pt -> d1 and pt -> d2 at each vertex by applying the law of cosines. Based on the sharpness of the angle we can decide whether a finger is raised or not. In the example I found 60° to be a good decision boundary:
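A minimal sketch of the angle test, assuming points are plain `{ x, y }` objects and using the 60° boundary mentioned above as the default. With side lengths a = |pt d1|, b = |pt d2| and c = |d1 d2|, the law of cosines gives cos(beta) = (a² + b² - c²) / (2ab). The function names are hypothetical.

```javascript
const dist = (p, q) => Math.hypot(p.x - q.x, p.y - q.y);

// Angle (in degrees) at vertex pt between the vectors pt -> d1 and pt -> d2,
// computed with the law of cosines.
function angleAtVertex(pt, d1, d2) {
  const a = dist(pt, d1);
  const b = dist(pt, d2);
  const c = dist(d1, d2);
  const cosBeta = (a * a + b * b - c * c) / (2 * a * b);
  // Clamp to [-1, 1] to guard against floating point rounding.
  return (Math.acos(Math.min(1, Math.max(-1, cosBeta))) * 180) / Math.PI;
}

// A finger counts as raised when the angle at its tip is sharp enough.
function isFingerRaised(pt, d1, d2, maxAngleDeg = 60) {
  return angleAtVertex(pt, d1, d2) < maxAngleDeg;
}

// Usage: a narrow spike (fingertip-like) and a right angle (not a fingertip).
const tip = { x: 0, y: 0 };
console.log(isFingerRaised(tip, { x: -1, y: 10 }, { x: 1, y: 10 }));
console.log(isFingerRaised(tip, { x: 1, y: 0 }, { x: 0, y: 1 }));
```

This sketch does not handle the degenerate case where a defect point coincides with the hull point (a or b equal to zero); real input would need a guard for that.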
And there we have our results:
This is how easy it is to detect simple hand gestures. I have seen people drawing with their fingers or creating mini games that let you control the character with gestures using a webcam, but you can use the detection result for whatever you want your application to do. This example is not intended to solve hand gesture recognition perfectly but rather to give a hint on how to approach this task.