Thresholds and Templates

Vinny DaSilva
Jun 4 · 12 min read

This article is part of a series introducing developers to Computer Vision. Check out other articles in this series.

So far in the series, we’ve taken a look at an entire image and manipulated it in some way. We did this by looping through each pixel and manipulating it. While this produced interesting results, we did not get any closer to understanding anything about the image. Computer vision involves extracting information or data from images in an attempt to allow the computer to understand the contents of those images. For this next article, instead of blindly manipulating each pixel, we are going to try to determine if a particular pixel meets some criteria and, if so, flag the pixel within the image. These will be our first steps in better understanding images.

Finding an Object via Color Thresholding

An easy way to try out some basic detection is to use a simple criterion to determine if a pixel is, or is not, of interest. For instance, let’s assume we have a photo of the full moon (as shown below) and we want to determine where the moon is located within the photo. We will loop through each pixel and compare it against a known pixel value that is yellowish in color. If the pixel color is similar enough to our yellowish color, then we will flag it as a pixel we care about. The process of finding areas within an image based on color is often referred to as Color (or Image) Thresholding.

We’ll be using this image of the moon as an example (Image Source: Samer Daboul)

To determine the similarity of the color of two pixels, we can calculate the “distance” between the color values of each pixel. As we discussed earlier, each pixel value is composed of three individual color channel values (Red, Green and Blue). 3D points (vectors) are also made up of three values, one for each axis: x, y and z. We can calculate the distance between two 3D points, therefore we can also calculate the distance between two pixels.

Math geeks out there are already thinking that we can calculate the distance between two 3D points using the Pythagorean Theorem as follows:
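d = √((r₁ − r₂)² + (g₁ − g₂)² + (b₁ − b₂)²)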

Where r, g and b are the individual color channel values of the two pixels (denoted by the subscripts) and the resulting d value is the distance between them. When the distance is 0, the two pixels are the same. The larger the value of d, the greater the difference between the two pixels. Simply put, this method allows us to determine how similar two pixels are using some simple calculations.

SIDE NOTE: It’s worth noting here that the square root operation in the above calculation is both slow to execute and unnecessary when doing most comparisons. As a performance optimization, we can perform a very similar calculation to determine the square distance (magnitude in some systems). To calculate the square distance, perform all of the same calculations but skip the square root operation. When comparing against the square distance result, you should also square the value you would like to compare against.
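As a rough sketch of what this optimization looks like in code (assuming pixels are plain RGB tuples; the threshold value is just an example):

```python
def squared_distance(c1, c2):
    """Squared color distance between two RGB pixels -- no square root."""
    dr, dg, db = c1[0] - c2[0], c1[1] - c2[1], c1[2] - c2[2]
    return dr * dr + dg * dg + db * db

# If the plain-distance threshold would be 100, compare the squared
# distance against 100 squared instead.
THRESHOLD = 100
is_similar = squared_distance((46, 195, 138), (60, 180, 120)) < THRESHOLD ** 2
```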

In our moon example, we compared the color of each pixel against a yellowish color [R:46,G:195,B:138] and when the distance was below a certain threshold, we replaced the pixel color with white [R:255,G:255,B:255], otherwise, we would replace it with black [R:0,G:0,B:0]. In other words, for each pixel we ask the question: Is this pixel’s color similar to our yellow threshold color? If so, then replace the pixel with a white pixel otherwise replace it with a black pixel. The result is this image:

By Thresholding, we’ve simplified the original image to focus on the areas of interest — in this case, the moon.
(Original Image Source: Samer Daboul)
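If you’d like to experiment with this yourself, here is a minimal sketch of the thresholding loop using the Pillow library (the file names, reference color and threshold value are placeholders to tune for your own image):

```python
from PIL import Image

REFERENCE = (46, 195, 138)   # yellowish reference color from the example above
THRESHOLD = 100              # similarity threshold (tune per image)

img = Image.open("moon.jpg").convert("RGB")
binary = Image.new("RGB", img.size)
pixels, out = img.load(), binary.load()

for y in range(img.height):
    for x in range(img.width):
        r, g, b = pixels[x, y]
        # Squared distance to the reference color (see the side note above)
        d2 = (r - REFERENCE[0]) ** 2 + (g - REFERENCE[1]) ** 2 + (b - REFERENCE[2]) ** 2
        # White if the pixel is close enough to the reference, black otherwise
        out[x, y] = (255, 255, 255) if d2 < THRESHOLD ** 2 else (0, 0, 0)

binary.save("moon_binary.png")
```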

At this point, we’ve turned our original color image into a simpler image where the pixels we care about are white and the rest are black — this is a Binary Image. By analyzing this binary image, we can infer a lot of information that was not apparent in the original. We are able to infer where the moon is located within the image, its area, its shape and how its size compares to the rest of the image. As an example, we’ve drawn a red bounding box around all of the white pixels in the image.

Added a red bounding box to the binary image (Original Image Source: Samer Daboul)
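A simple sketch of how that bounding box can be computed from the binary image (again with Pillow; moon_binary.png stands in for the thresholded result from above):

```python
from PIL import Image, ImageDraw

binary = Image.open("moon_binary.png").convert("RGB")
pixels = binary.load()

# Track the extremes of every white pixel to form a bounding box.
min_x, min_y = binary.width, binary.height
max_x, max_y = -1, -1
for y in range(binary.height):
    for x in range(binary.width):
        if pixels[x, y] == (255, 255, 255):
            min_x, min_y = min(min_x, x), min(min_y, y)
            max_x, max_y = max(max_x, x), max(max_y, y)

if max_x >= 0:  # at least one white pixel was found
    draw = ImageDraw.Draw(binary)
    draw.rectangle([min_x, min_y, max_x, max_y], outline=(255, 0, 0), width=3)
    binary.save("moon_bounding_box.png")
```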

SIDE NOTE: Thresholding on a specific color has been used for a long time in television and movie production through a technique called Chroma key (commonly referred to as “green screen”). In this technique, footage is taken of actors in front of a green (or other solid color) background. During post-processing, the background is digitally removed using thresholding and the background pixels are replaced with pixels from a different scene.
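Just to make the idea concrete, a toy chroma key can reuse the same distance test, swapping in pixels from a second image wherever the foreground pixel is close to the key color (the file names, key color and threshold below are placeholders):

```python
from PIL import Image

KEY_COLOR = (0, 177, 64)   # approximate green-screen color (tune to the footage)
THRESHOLD = 90

foreground = Image.open("actor_green_screen.jpg").convert("RGB")
background = Image.open("new_scene.jpg").convert("RGB").resize(foreground.size)
fg, bg = foreground.load(), background.load()

for y in range(foreground.height):
    for x in range(foreground.width):
        r, g, b = fg[x, y]
        d2 = (r - KEY_COLOR[0]) ** 2 + (g - KEY_COLOR[1]) ** 2 + (b - KEY_COLOR[2]) ** 2
        # Pixels close to the key color are replaced with the new background.
        if d2 < THRESHOLD ** 2:
            fg[x, y] = bg[x, y]

foreground.save("composited.png")
```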

Considerations

The trouble with this approach is that it only really works on ideal images where the object we care about is similar to one color and the rest of the image is very different. Consider this second image of the moon below.

(Image Source: Junior Peres)

When we attempt to run the same algorithm with the same threshold value on this other image of the moon, we can see that we’re picking up the clouds, even though we really don’t mean to. We can tweak things around by changing the reference pixel color or changing the similarity threshold, but this approach turns out to be pretty brittle when you are looking to cover various types of images in various different conditions.

Color Thresholding can sometimes pick up areas we don’t intend to (Original Image Source: Junior Peres)

Like a lot of other techniques and approaches we discuss in this series, Image Thresholding has its benefits and drawbacks. Which techniques you leverage in your projects is highly dependent on the use case. Furthermore, lots of these techniques can work differently when combined with other techniques or technologies. In my research, I’ve seen thresholding combined with or used as part of Image Segmentation, Edge Detection, Contour Generation and even Optical Character Recognition. Lastly, thresholding provides a lot of value when used with infrared cameras, where images are often produced with fewer colors and higher contrast. We will cover infrared cameras and other hardware later on in the series.

There are also some variations of Image Thresholding that go beyond the scope of this article. We’ve been using “global thresholding”, where a single reference color and threshold are used for all of the pixels. Variable or Adaptive Thresholding instead calculates a local threshold value based on neighboring pixels.

SIDE NOTE: If you are unsure of what thresholding value to use in your own projects, check out Otsu’s Method, which attempts to find a good thresholding value based on the distribution of intensities within an image.
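If you happen to be using OpenCV, both ideas are available out of the box. A quick sketch (the file name is a placeholder, and the adaptive parameters are just reasonable starting points):

```python
import cv2

gray = cv2.imread("moon.jpg", cv2.IMREAD_GRAYSCALE)

# Global threshold where Otsu's Method picks the threshold value from the
# image's intensity histogram (the chosen value is returned first).
otsu_value, otsu_binary = cv2.threshold(
    gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive threshold: each pixel is compared against the mean of its
# 11x11 neighborhood (minus a small constant) instead of one global value.
adaptive_binary = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2)
```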

Finding an Object via Template Matching

While we got some interesting results, we can see that finding objects via color thresholding won’t work as robustly as we would like it to. Comparing each individual pixel against a known color turned out to be a method that is just too rigid to work on a variety of images. Let’s consider for a moment that when we compare an image against a single pixel, we are really comparing the image against a tiny “Template Image” that is 1x1 pixel in size. A Template Image is simply an image that we use to compare against the target image. If comparing against a 1x1 template image is too rigid, what would happen if we were to compare against a larger template image with more pixel information? Let’s see what happens when we use a larger Template Image containing many more pixels and compare it against the target image.

In this next example, we’re going to use the following template image (taken and cropped from the moon image used previously). You will notice that the color and details of the image are not exactly the same between the template and the target image we are working with.

Template Image used in the following examples (Original Image Source: Samer Daboul)

Our algorithm will look at each pixel from the template image and compare it against a corresponding block of pixels within the target image of the same size. If our template is 220x220 pixels, then the corresponding block of pixels within our target image must also be 220x220 pixels. To do the comparison, we will use the same distance calculation we used previously (the Pythagorean Theorem). Once we compare all of the pixels within a block, we average all of the results together to determine the average difference between the template image and the corresponding block. This gives us one number to work with representing how similar the template image is to the current block of pixels. We can then compare this similarity value to a predetermined threshold or against all of the other comparisons in order to find the section within the image which is most similar to the template.

Templating involves averaging all of the differences between the Template image and each section of pixels from the target image
Using a Sliding Window, we compare each block of pixels against the template image to find a match (Original Image Source: Junior Peres)

In this particular example, we start at the bottom left corner of the image and move to the right and then up. In the above example, we are visualizing the comparison block with a yellow rectangle. Once a section of the image meets our threshold, it will leave a green square indicating a possible match. This technique, where we look at a block of pixels within an image, calculate a result and then slide the block to a new position, is referred to as the Sliding Window technique. We will revisit this technique later on when we discuss Feature Detection.
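A naive version of this search, written with NumPy and Pillow purely as a sketch (the file names are placeholders and the loop is deliberately unoptimized), might look something like this:

```python
import numpy as np
from PIL import Image

target = np.asarray(Image.open("moon_scene.jpg").convert("RGB"), dtype=np.float64)
template = np.asarray(Image.open("moon_template.png").convert("RGB"), dtype=np.float64)
th, tw = template.shape[:2]

best_score, best_pos = None, None
# Slide the template over every possible block of the same size.
for y in range(target.shape[0] - th + 1):
    for x in range(target.shape[1] - tw + 1):
        block = target[y:y + th, x:x + tw]
        # Per-pixel color distance, averaged over the whole block.
        score = np.sqrt(((block - template) ** 2).sum(axis=2)).mean()
        if best_score is None or score < best_score:
            best_score, best_pos = score, (x, y)

print("Best match at", best_pos, "with an average distance of", best_score)
```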

The first match we found isn’t a great match, we can keep searching to find a better match (Original Image Source: Junior Peres)

You can see above that the first match that passes our threshold does contain the moon. We could decide to stop the algorithm at this point. If we continue searching, we might be able to find another section which is a closer match to the template resulting in a better match.

The second match is a lot closer to the template image (Original Image Source: Junior Peres)

The above illustrates that if we let the algorithm continue searching through the image, we can find a match which more closely resembles the template image.

Using Template Matching, we were able to find a block of pixels in the target image that closely matches the template image. (Original Image Source: Junior Peres)

Considerations

While inefficient, the approach of using a template image works really well for finding images of the moon, cars and lots of other objects as well. Template matching can be a great solution for many use cases. Yet, there are some very important aspects of this approach we should consider. The first consideration is the scale of the object represented in the template image in comparison to the target image. Imagine you are using one of those toddler toys where you are supposed to put the round peg into the round hole. Pretty easy to do, right? Now imagine if the peg is double the size of the hole. You are not going to be able to fit a large peg into a smaller hole. Likewise, a larger template image is less likely to match with this approach.

Template Matching can fail to get a good match when there is a large variation in scale (Template Image Source: Samer Daboul, Target Image Source: Min An)

Theoretically, compensating for scale isn’t too difficult. As part of the algorithm, we can resize the template image so that it is as large as the target image, perform the search, and if no suitable match is found, reduce the size and try again until a successful template match is found. This approach isn’t very efficient, but keep this in the back of your mind as we learn about feature descriptors later on.
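One way to sketch this out is with OpenCV’s built-in cv2.matchTemplate instead of the hand-rolled loop above, trying the template at several scales and keeping the strongest response (the file names and scale steps are placeholders):

```python
import cv2

target = cv2.imread("street_scene.jpg", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("moon_template.png", cv2.IMREAD_GRAYSCALE)

best = None
# Try the template at a range of scales and keep the strongest response.
for scale in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3):
    resized = cv2.resize(template, None, fx=scale, fy=scale)
    if resized.shape[0] > target.shape[0] or resized.shape[1] > target.shape[1]:
        continue  # the template is still larger than the target at this scale
    result = cv2.matchTemplate(target, resized, cv2.TM_SQDIFF_NORMED)
    min_val, _, min_loc, _ = cv2.minMaxLoc(result)
    if best is None or min_val < best[0]:
        best = (min_val, min_loc, scale)

print("Best match (score, position, scale):", best)
```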

The second challenge when using templates is the orientation of the object. The moon in the previous example is a full moon so the orientation is less significant. In this next example, we will be attempting to find the ace of hearts card. We know which way is “up” by looking at the heart at the center of the card.

Template Matching can fail to get a good match when there is a large variation in orientation ( Target Image Source: Amirali Mirhashemian)

Because the card in our template is facing upwards and the card in our target image is rotated about 45 degrees, template matching will have a much harder time finding the card within the image. Template Matching is best used in situations where the orientation of the object is predictable.

We have a couple of important takeaways from exploring image templates. The first takeaway is that while template matching has its limits, it’s a very simple method and can work really well if it fits within the constraints of your particular use-case. Another important takeaway is that scale and orientation have a big impact on our ability to extrapolate information from images. We will explore some of the ways we can compensate for changes in scale and orientation when we discuss feature descriptors.

Working in Grayscale

Up to this point, we’ve been working with color images which, as mentioned, are made up of pixels, with each pixel made up of three color channels. As a result, each pixel takes as much as 24 bits. Grayscale images, on the other hand, can be represented in as little as 8 bits per pixel. The impact on memory and performance of processing grayscale images when compared to color images is significant. For many traditional computer vision applications, the extra information provided by the three color channels doesn’t make the detection any more robust.

Beyond the performance implications, there is another huge benefit to using grayscale images. Grayscale images can reduce the impact of changes in color due to minor variations in lighting conditions. Environmental Lighting can have a big impact on how colors are represented as pixels through the sensor of a camera. While converting to grayscale doesn’t solve for all environmental lighting conditions, grayscale does allow us to reduce the impact of variations such as color temperature.

For these reasons, many computer vision algorithms will convert color pixels into grayscale pixels in order to speed up processing, reduce memory usage and improve reliability against minor environmental lighting changes. Using grayscale isn’t a silver bullet, though. Like most everything else we’ve discussed so far, it’s largely dependent on the use-case. For instance, when building a fruit-picking robot, using color to determine the ripeness of the fruit is important. Later, we will also see that techniques using Deep Learning can leverage all of the color channels as part of a Neural Network.
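As a quick sketch, the conversion itself is a one-liner in most imaging libraries, for example with Pillow (the file names are placeholders):

```python
from PIL import Image

# Pillow's "L" mode converts each RGB pixel to a single 8-bit luminance
# value using the weighting L = 0.299 R + 0.587 G + 0.114 B.
gray = Image.open("moon.jpg").convert("L")
gray.save("moon_gray.png")
```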

Through the remainder of the series, we’ll be working with grayscale images unless otherwise noted. For more information on how to convert color images to grayscale, please refer to the Image Processing article.

TLDR

One of the simplest methods of extrapolating information from digital images is to separate out parts of an image by comparing each pixel value in the digital image to a predetermined “template” pixel value. This method is known as Color or Image Thresholding. Color Thresholding is the method which makes the “Green Screen” effect used in movies and television possible. A more sophisticated approach compares portions of an image to the likeness of a Template Image using a sliding window search. While Template Matching works robustly in lots of cases, it’s negatively impacted by variations in both scale and orientation. In order to simplify algorithms and reduce the impact on memory and performance, lots of Computer Vision algorithms convert color images into grayscale.
