Computer Vision for Busy Developers
Thresholds and Templates
This article is part of a series introducing developers to Computer Vision. Check out other articles in this series.
So far in the series, we’ve taken a look at an entire image and manipulated the image in some way. We did this by looping through the pixels and manipulating each one. While this produced interesting results, we did not get any closer to understanding anything about the image. Computer vision involves extracting information or data from images in an attempt to allow the computer to understand the contents of those images. In this article, instead of blindly manipulating each pixel, we are going to try to determine whether a particular pixel meets some criteria and, if so, flag that pixel within the image. These will be our first steps toward better understanding images.
Finding an Object via Color Thresholding
An easy way for us to try out some simple detection is to use simple criteria to determine if a pixel is, or is not, of interest. For instance, let’s assume we have a photo of the full moon (as shown below) and we want to determine where the moon is located within the photo. We will look through each pixel and compare it against a known pixel value that is yellowish in color. If the pixel color is similar enough to our yellowish color, then we will flag it as a pixel we care about. The process of finding areas within an image based on color is often referred to as Color (or Image) Thresholding.
To determine the similarity of the color of two pixels, we can calculate the “distance” between the color values of each pixel. As we discussed earlier, each pixel value is composed of three individual color channel values (Red, Green and Blue). 3D points (vectors) are also made up of three values, one for each axis: x, y and z. Since we can calculate the distance between two 3D points, we can also calculate the distance between two pixels.
Math geeks out there are already thinking that we can calculate the distance between two 3D points using the Pythagorean Theorem as follows:

$$d = \sqrt{(r_2 - r_1)^2 + (g_2 - g_1)^2 + (b_2 - b_1)^2}$$
Where r, g, and b are the individual color channel values within a pixel and the resulting d value is the distance between the two pixels. When the distance is 0, the two pixels are the same. The larger the value of d, the greater the difference between the two pixels. Simply put, this method allows us to determine how similar two pixels are using some simple calculations.
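The distance calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the series: the function name `color_distance` and the sample pixel values are made up for the example, and pixels are assumed to be plain (r, g, b) tuples.

```python
import math

def color_distance(p, q):
    """Euclidean distance between two (r, g, b) pixels."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Identical pixels are 0 apart.
print(color_distance((10, 20, 30), (10, 20, 30)))    # 0.0

# Channel differences of 3, 4 and 0 give sqrt(9 + 16 + 0).
print(color_distance((10, 20, 30), (13, 24, 30)))    # 5.0

# Black vs. white is the largest possible distance, about 441.67.
print(color_distance((0, 0, 0), (255, 255, 255)))
```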
SIDE NOTE: It’s worth noting here that the square root operation in the above calculation is both slow to execute and unnecessary for most comparisons. As a performance optimization, we can calculate the squared distance instead (called the squared magnitude in some systems). To calculate the squared distance, perform all of the same calculations but skip the square root operation. When comparing against a squared distance, remember to also square the value you are comparing against.
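As a rough sketch of that optimization, the comparison below never takes a square root and squares the threshold instead. The names `squared_distance` and `is_similar` are hypothetical, and pixels are again assumed to be (r, g, b) tuples.

```python
def squared_distance(p, q):
    """Sum of squared channel differences (no square root)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def is_similar(p, q, threshold):
    """True when the distance between p and q is below `threshold`.

    Squaring the threshold gives the same answer as comparing the
    real distance, because both sides are non-negative.
    """
    return squared_distance(p, q) < threshold ** 2

# These two pixels are exactly 5 apart.
print(is_similar((10, 20, 30), (13, 24, 30), threshold=6))  # True
print(is_similar((10, 20, 30), (13, 24, 30), threshold=5))  # False
```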
In our moon example, we compared the color of each pixel against a yellowish color [R:46,G:195,B:138]. When the distance was below a certain threshold, we replaced the pixel color with white [R:255,G:255,B:255]; otherwise, we replaced it with black [R:0,G:0,B:0]. In other words, for each pixel we ask the question: Is this pixel’s color similar to our yellow threshold color? If so, replace the pixel with a white pixel; otherwise, replace it with a black pixel. The result is this image:
At this point, we’ve turned our original color image into a simpler image where the pixels we care about are white and the rest are black. This is a Binary Image. By analyzing this binary image, we can infer a lot of information that was not apparent in the original. We are able to infer where the moon is located within the image, its area, its shape and how its size compares to the rest of the image. As an example, we’ve drawn a red bounding box around all of the white pixels in the image.
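To make the two steps concrete, here is a small sketch that thresholds an image into a binary image and then computes a bounding box around the white pixels. It assumes an image is a list of rows of (r, g, b) tuples. The function names, the tiny test image and the `max_distance` of 40 are all assumptions made for illustration, since the article does not state the threshold it used.

```python
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

def threshold_image(image, reference, max_distance):
    """Return a binary image: white where a pixel is close to `reference`."""
    limit = max_distance ** 2  # compare squared distances, no sqrt needed
    return [
        [WHITE if sum((a - b) ** 2 for a, b in zip(pixel, reference)) < limit
         else BLACK
         for pixel in row]
        for row in image
    ]

def bounding_box(binary):
    """Return (left, top, right, bottom) around all white pixels, or None."""
    coords = [(x, y)
              for y, row in enumerate(binary)
              for x, pixel in enumerate(row)
              if pixel == WHITE]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)

# Tiny 4x3 "photo": dark sky with a 2x2 patch close to the reference color.
sky = (5, 5, 20)
moon = (50, 190, 140)
image = [
    [sky, sky,  sky,  sky],
    [sky, moon, moon, sky],
    [sky, moon, moon, sky],
]
binary = threshold_image(image, reference=(46, 195, 138), max_distance=40)
print(bounding_box(binary))  # (1, 1, 2, 2)
```

From the bounding box and the count of white pixels you can derive the position, the approximate size and the fraction of the image the object covers.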
SIDE NOTE: Thresholding on a specific color has long been used in television and movie production in a technique called Chroma key (commonly referred to as “green screen”). In this technique, footage is taken of actors in front of a green (or other solid color) background. During post-processing, the background is digitally removed using thresholding, and the background pixels are replaced with pixels from a different scene.
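A toy version of the chroma key idea might look like the sketch below. It is only an illustration of the thresholding step: real compositing software does considerably more (soft edges, spill removal), and the function name, parameters and colors here are hypothetical. Both images are assumed to be the same size and stored as rows of (r, g, b) tuples.

```python
def chroma_key(foreground, background, key_color, max_distance):
    """Replace foreground pixels near `key_color` with background pixels."""
    limit = max_distance ** 2
    result = []
    for fg_row, bg_row in zip(foreground, background):
        result.append([
            # Pixels close to the key color are treated as "green screen".
            bg if sum((a - b) ** 2 for a, b in zip(fg, key_color)) < limit else fg
            for fg, bg in zip(fg_row, bg_row)
        ])
    return result

green = (0, 255, 0)
actor = (200, 150, 120)
beach = (240, 220, 180)

frame = [[green, actor, green]]
scene = [[beach, beach, beach]]
print(chroma_key(frame, scene, key_color=green, max_distance=60))
# [[(240, 220, 180), (200, 150, 120), (240, 220, 180)]]
```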
The trouble with this approach is that it only really works on ideal images where the object we care about is similar to one color and the rest of the image is very different. Consider this second image of the moon below.
When we attempt to run the same algorithm with the same threshold value on this other image of the moon, we can see that we’re picking up the clouds, even though we really don’t mean to. We can tweak things around by changing the reference pixel color or changing the similarity threshold, but this approach turns out to be pretty brittle when you are looking to cover various types of images in various different conditions.
Like a lot of other techniques and approaches we discuss in this series, Image Thresholding has its benefits and drawbacks. Which techniques you leverage in your projects is highly dependent on the use case. Furthermore, lots of these techniques can work differently when combined with other techniques or technologies. In my research, I’ve seen thresholding combined with, or used as part of, Image Segmentation, Edge Detection, Contour Generation and even Optical Character Recognition. Lastly, thresholding provides a lot of value when used with infrared cameras, where images are often produced with fewer colors and higher contrast. We will cover infrared cameras and other hardware later on in the series.
There are also some variations of Image Thresholding that go beyond the scope of this article. We’ve been using “global thresholding”, where we use the same constant reference color and threshold for all of the pixels. Variable or Adaptive Thresholding instead calculates a local threshold value for each pixel based on its neighboring pixels.
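To give a feel for the difference, here is a minimal sketch of one common form of adaptive thresholding. It makes several assumptions that are not in the article: the image is grayscale (a list of rows of 0 to 255 intensities), the local threshold is the mean of a square neighborhood, and the `radius` and `offset` parameters are illustrative names.

```python
def adaptive_threshold(gray, radius=1, offset=0):
    """Mark a pixel white (255) when it is brighter than its local mean."""
    height, width = len(gray), len(gray[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighborhood, clipped at the image borders.
            neighbors = [
                gray[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(neighbors) / len(neighbors)
            row.append(255 if gray[y][x] > local_mean - offset else 0)
        result.append(row)
    return result

gray = [
    [10, 10,  10],
    [10, 200, 10],
    [10, 10,  10],
]
print(adaptive_threshold(gray))
# [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
```

Because each pixel is compared against its own surroundings, this style of thresholding tends to cope better with uneven lighting than a single global value.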
SIDE NOTE: If you are unsure of what thresholding value to use in your own projects, check out Otsu’s Method which attempts to find a good thresholding value based on the distribution of intensities within an image.
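For the curious, Otsu’s Method can be sketched in plain Python as follows. This version assumes grayscale intensities from 0 to 255 passed as a flat list, and it picks the threshold that maximizes the variance between the two resulting groups of pixels. The function name is made up for this example; treat it as a sketch rather than a reference implementation.

```python
def otsu_threshold(pixels):
    """Return t such that splitting into (<= t) and (> t) maximizes
    the between-class variance of the two groups."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    total_sum = sum(i * count for i, count in enumerate(histogram))

    best_threshold, best_variance = 0, 0.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += histogram[t]          # pixels at or below t
        sum_bg += t * histogram[t]
        weight_fg = total - weight_bg      # pixels above t
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (total_sum - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, t
    return best_threshold

# Two clusters of intensities: dark (20, 30) and bright (200, 210).
pixels = [20] * 50 + [30] * 50 + [200] * 50 + [210] * 50
t = otsu_threshold(pixels)
print(30 <= t < 200)  # True: the threshold separates the two clusters
```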
Finding an Object via Template Matching
While we got some interesting results, we can see that finding objects via color thresholding won’t work as robustly as we would like it to. Comparing each individual pixel against a known color turned out to be a method that is just too rigid to work on a variety of images. Let’s consider for a moment that when we compare an image against a single pixel, we are really comparing the image against a tiny “Template Image” that is 1x1 pixels in size. A Template Image is an image that we use to compare against the target image. If comparing against a 1x1 template image is too rigid, what would happen if we were to compare against a larger template image with more pixel information? Let’s see what happens when we use a larger Template Image containing many more pixels and compare it against the target image.
In this next example, we’re going to use the following template image (taken and cropped from the moon image used previously). You will notice that the color and details of the image are not exactly the same between the template and the target image we are working with.
Our algorithm will compare each pixel of the template image against the corresponding pixel in a block of the target image that is the same size as the template. If our template is 220x220 pixels, then the corresponding block of pixels within our target image must also be 220x220 pixels. To do the comparison, we will use the same distance calculation we used previously (Pythagorean Theorem). Once we have compared all of the pixels within a block, we average the results to determine the average difference between the template image and that block. This gives us one number representing how similar the template image is to the current block of pixels (the lower the number, the closer the match). We can then compare this value to a predetermined threshold, or against the results for all of the other blocks, in order to find the section of the image which is most similar to the template.
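The comparison of a single block can be sketched like this. The function name `block_difference` and the `left`/`top` parameters (the position of the block’s top-left corner in the target) are assumptions for the example. Images are lists of rows of (r, g, b) tuples.

```python
import math

def block_difference(target, template, left, top):
    """Average color distance between `template` and the same-sized
    block of `target` whose top-left corner is at (left, top)."""
    total = 0.0
    count = 0
    for ty, template_row in enumerate(template):
        for tx, template_pixel in enumerate(template_row):
            target_pixel = target[top + ty][left + tx]
            total += math.sqrt(
                sum((a - b) ** 2 for a, b in zip(template_pixel, target_pixel))
            )
            count += 1
    return total / count

A = (0, 0, 0)
B = (3, 4, 0)   # exactly 5 away from A
target = [
    [A, A, A],
    [A, B, B],
    [A, B, B],
]
template = [
    [B, B],
    [B, B],
]
print(block_difference(target, template, left=1, top=1))  # 0.0, a perfect match
print(block_difference(target, template, left=0, top=0))  # 3.75, three of four pixels differ by 5
```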
In this particular example, we start at the bottom left corner of the image and move to the right and then up, visualizing the current comparison block with a yellow rectangle. Once a section of the image meets our threshold, it leaves a green square indicating a possible match. This technique, where we look at a block of pixels within an image, calculate a result and then move on to the next block, is referred to as the Sliding Window technique. We will revisit this technique later on when we discuss Feature Detection.
You can see above that the first match that passes our threshold does contain the moon. We could decide to stop the algorithm at this point. If we continue searching, we might be able to find another section which is a closer match to the template resulting in a better match.
The above illustrates that if we let the algorithm continue searching through the image, we can find a match which more closely resembles the template image.
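Both strategies, stopping at the first match under the threshold and searching the whole image for the closest match, can be sketched with one sliding window generator. All names here are hypothetical. This sketch also scans from the top-left corner row by row, which is an arbitrary choice: the scan order can change which match is found first, but not which match is best.

```python
import math

def average_difference(target, template, left, top):
    """Average color distance between the template and one block."""
    total = sum(
        math.sqrt(sum((a - b) ** 2
                      for a, b in zip(pixel, target[top + y][left + x])))
        for y, row in enumerate(template)
        for x, pixel in enumerate(row)
    )
    return total / (len(template) * len(template[0]))

def slide(target, template):
    """Yield (score, left, top) for every window position."""
    t_height, t_width = len(template), len(template[0])
    for top in range(len(target) - t_height + 1):
        for left in range(len(target[0]) - t_width + 1):
            yield average_difference(target, template, left, top), left, top

def first_match(target, template, threshold):
    """First window whose score is below the threshold, or None."""
    return next((m for m in slide(target, template) if m[0] < threshold), None)

def best_match(target, template):
    """Window with the lowest score anywhere in the image, or None."""
    return min(slide(target, template), default=None)

A = (0, 0, 0)
B = (3, 4, 0)
C = (3, 4, 2)   # close to B, but not identical
target = [
    [A, C, C, A],
    [A, A, B, B],
]
template = [[B, B]]
print(first_match(target, template, threshold=3))  # (2.0, 1, 0)
print(best_match(target, template))                # (0.0, 2, 1)
```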
While inefficient, the approach of using a Template Image works really well for finding images of the moon, cars and lots of other objects as well. Template matching can be a great solution for many use cases. Yet, there are some very important aspects of this approach we should consider. The first consideration is the scale of the object represented in the template image in comparison to the target image. Imagine you are using one of those toddler toys where you are supposed to put the round peg into the round hole. Pretty easy to do, right? Now imagine the peg is double the size of the hole. You are not going to be able to fit a large peg into a smaller hole. Likewise, a template whose object is noticeably larger (or smaller) than the object in the target image is unlikely to match with this approach.
Theoretically, compensating for scale isn’t too difficult. As part of the algorithm, we can resize the template image so that it is as large as the target image, perform the search, and if no suitable match is found, reduce the size and try again until a successful template match is found. This approach isn’t very efficient, but keep this in the back of your mind as we learn about feature descriptors later on.
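A rough sketch of that shrink-and-retry loop is shown below. It makes a number of simplifying assumptions that are not in the article: the template is square, resizing uses nearest-neighbor sampling, and the template shrinks by a fixed factor (`shrink`, which must be between 0 and 1) on each attempt. All function and parameter names are made up for illustration.

```python
import math

def resize_nearest(image, new_width, new_height):
    """Resize using nearest-neighbor sampling."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[y * old_height // new_height][x * old_width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]

def best_match(target, template):
    """Return (score, left, top) for the most similar window."""
    t_h, t_w = len(template), len(template[0])
    best = None
    for top in range(len(target) - t_h + 1):
        for left in range(len(target[0]) - t_w + 1):
            total = sum(
                math.sqrt(sum((a - b) ** 2 for a, b in
                              zip(template[y][x], target[top + y][left + x])))
                for y in range(t_h) for x in range(t_w)
            )
            score = total / (t_h * t_w)
            if best is None or score < best[0]:
                best = (score, left, top)
    return best

def multi_scale_match(target, template, threshold, shrink=0.5, min_size=1):
    """Try the template at decreasing sizes until a window scores below
    the threshold. Returns (size, score, left, top) or None."""
    size = min(len(target), len(target[0]))  # largest square that fits
    while size >= min_size:
        scaled = resize_nearest(template, size, size)
        score, left, top = best_match(target, scaled)
        if score < threshold:
            return size, score, left, top
        size = int(size * shrink)
    return None

A = (0, 0, 0)
B = (3, 4, 0)
target = [
    [A, A, A, A],
    [A, B, B, A],
    [A, B, B, A],
    [A, A, A, A],
]
template = [[B] * 4 for _ in range(4)]  # 4x4 template, object is only 2x2 in target
print(multi_scale_match(target, template, threshold=1))  # (2, 0.0, 1, 1)
```

Every extra scale repeats the full sliding window search, which is why this approach gets expensive quickly.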
The second challenge when using templates is the orientation of the object. The moon in the previous example is a full moon so the orientation is less significant. In this next example, we will be attempting to find the ace of hearts card. We know which way is “up” by looking at the heart at the center of the card.
When the card in our template is facing upwards and the card in our target image is rotated about 45 degrees, template matching will have a much harder time finding the card within the image. Template Matching is best used in situations where the orientation of the object is predictable.
We have a couple of important takeaways from exploring image templates. The first takeaway is that while template matching has its limits, it’s a very simple method and can work really well if it fits within the constraints of your particular use case. Another important takeaway is that scale and orientation have a big impact on our ability to extract information from images. We will explore some of the ways we can compensate for variations in scale and orientation when we discuss feature descriptors.
Working in Grayscale
Up to this point, we’ve been working with color images which, as mentioned, are made up of pixels, each of which is made up of three color channels. As a result, each pixel takes as much as 24 bits. Grayscale images, on the other hand, can be represented in as little as 8 bits per pixel. The impact on memory and performance of processing grayscale images when compared to color images is significant. For many traditional computer vision applications, the extra information provided by the three color channels doesn’t make the detection any more robust.
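A quick back-of-the-envelope calculation shows the difference for uncompressed pixel data. The 1920x1080 resolution is just an example, and the helper name is made up.

```python
def image_bytes(width, height, bits_per_pixel):
    """Size of the raw, uncompressed pixel data in bytes."""
    return width * height * bits_per_pixel // 8

width, height = 1920, 1080
color_size = image_bytes(width, height, 24)  # three 8-bit channels
gray_size = image_bytes(width, height, 8)    # one 8-bit channel

print(color_size)               # 6220800
print(gray_size)                # 2073600
print(color_size // gray_size)  # 3
```

The same factor of three applies to every per-pixel distance calculation, since there is one channel to compare instead of three.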
Beyond the performance implications, there is another huge benefit to using grayscale images. Grayscale images can reduce the impact of changes in color due to minor variations in lighting conditions. Environmental Lighting can have a big impact on how colors are represented as pixels through the sensor of a camera. While converting to grayscale doesn’t solve for all environmental lighting conditions, grayscale does allow us to reduce the impact of variations such as color temperature.
For these reasons, many computer vision algorithms convert color pixels into grayscale pixels in order to speed up processing, reduce memory usage and improve reliability against minor environmental lighting changes. Using grayscale isn’t a silver bullet, though. Like most everything else we’ve discussed so far, it’s largely dependent on the use case. For instance, when building a fruit-picker robot, using color to determine the ripeness of the fruit is important. Later, we will also see that techniques using Deep Learning can leverage all the color channels as part of a Neural Network.
Through the remainder of the series, we’ll be working with grayscale images unless otherwise noted. For more information on how to convert color images to grayscale, please refer to the Image Processing article.
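As a brief reminder, one common way to do the conversion is a weighted sum of the three channels. The weights below are the widely used ITU-R BT.601 luma coefficients. This is an assumption for illustration, and the Image Processing article in this series may use a different method.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 intensities
    using the BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]

image = [[(255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]]
print(to_grayscale(image))  # [[255, 0, 76, 150, 29]]
```

Green contributes the most to the result because the weights approximate how bright each channel appears to the human eye.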
One of the simplest methods of extracting information from digital images is to separate out parts of an image by comparing each pixel value in the image to a predetermined “template” pixel value. This method is known as Color or Image Thresholding. Color Thresholding is the method which makes the “Green Screen” effect used in movies and television possible. A more sophisticated approach compares portions of an image to a Template Image using a sliding window search. While Template Matching works robustly in lots of cases, it’s negatively impacted by variations in both scale and orientation. In order to simplify algorithms and reduce the impact on memory and performance, lots of Computer Vision algorithms convert color images into grayscale.
Sources and More Info
- Thresholding — Lecture Video by Rich Radke
- Linear Filtering: Templates, Edges — Lecture by Aaron Bobick
- Enhanced Skin Tone Detection using Heuristic Thresholding — Academic Paper by Maheswari S
- Direction and Distance — Unity Documentation
- Colour-Based Object Detection and Tracking for Autonomous Quadrotor UAV — Academic Article by Hani Hunud A Kadouf and Yasir Mohd Mustafah
- Advance Color Machine Vision and Application — Lecture by Dr. Romik Chatterjee
- Chroma Key — Wikipedia
- Color Thresholding Method for Image Segmentation of Natural Images — Academic Paper by Nilima Kulkarni
- Object Detection via color-based image segmentation using python — Article by Salma Ghoneim
- Template Matching — Lesson at Queensland University of Technology: Robot Academy