Computer Vision for Busy Developers
This article is part of a series introducing developers to Computer Vision. Check out other articles in this series.
What’s a Feature anyway?
In the last article, we discussed at length how to extract edges from images. We used simple convolutions to find edges using the Sobel Operator. Edges are a great step toward understanding images, and there are many situations where edges are all we need for a specific use-case. Edge detection does have some limitations that we need to be cognizant of, though. Consider the following vertical edge:
Would you be able to quickly or accurately determine where in the following image this edge is located?
We can tell that the edge belongs along the side of the Clock Tower, but we are going to have a hard time determining exactly where along the clock tower the edge belongs.
This phenomenon (a variation of the “Aperture Problem” discussed later) is directly related to the fact that edges don’t carry enough unique information for us to accurately determine where they belong within an overall image.
This is a critical idea! There are lots of use-cases where we need to be able to extract information from the image which more uniquely identifies the contents of the image (image stitching, tracking objects in video, etc). This brings us to the next topic of Feature Detection.
Feature detection is a process where we attempt to find Features or Key Points, which are “one or more measurements of some quantifiable property of an object, computed so that it quantifies some significant characteristic of the object” (Kenneth R Castleman, Digital Image Processing, 1996). Often, these features are represented by Corners or Blobs. Corners are what we imagine them to be: the intersecting point of two edges. We will go deeper into Corner detectors shortly. Blobs are more amorphous regions of pixels that share something in common, such as intensity. Finding these interesting points within images and exploring the region around them and their relationship to each other is the foundation of many computer vision algorithms, and it’s what allows us to properly recognize objects, surfaces, faces, streets, etc.
Once we run a Feature Detection algorithm on an image, the results are generally represented as a list of x and y pairs indicating where the features are located within the image. These results can be visualized as a set of colored pixels overlaid on top of the original image.
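To make that representation concrete, here is a minimal sketch in plain Python. It assumes the image is a small grayscale grid stored as a list of rows. The `overlay` helper and the sample coordinates are made up for illustration; in practice the list of points would come from a detector.

```python
def overlay(gray, points, color=(255, 0, 0)):
    """Return an RGB copy of a grayscale image with each (x, y) point colored."""
    rgb = [[(v, v, v) for v in row] for row in gray]
    for x, y in points:
        rgb[y][x] = color  # the row index is y, the column index is x
    return rgb


# Hypothetical detector output: a list of (x, y) pairs.
features = [(1, 0), (2, 2)]
gray = [[10, 20, 30],
        [40, 50, 60],
        [70, 80, 90]]
marked = overlay(gray, features)
```

Note that the pairs are (x, y) while the grid is indexed as `rows[y][x]`, which is an easy place to introduce an off-by-axis bug.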
The Harris Corner Detector
The Harris Corner Detector is one of the simpler feature detectors that I found in my research. It builds upon edge detection techniques. In fact, many of the Harris Detector implementations I’ve seen use Sobel internally.
What is a corner? We can describe it in a way that makes sense for humans, but what does it mean for a computer? Let’s take a look at the following patches of pixels in our example image.
Let’s focus on the pixels in window A first. Intuitively, we know this is not a corner because there is nothing in it. We call areas like this Flat: they are pretty much devoid of any useful information. There are no corners, features or even edges. We can confirm this by using a very simple test. Let’s shift the window slightly in any direction and see what happens.
To test if we have a corner, we shift the window in any direction and compare it against the original window. Window A fails the test because all of its shifted windows contain similar visual data. In other words, if we were to shift the window in any direction, there would be minimal change in intensity.
Let’s take a look at window B and run our little corner test on it. We will shift window B slightly in any direction and compare.
This is where things start to get really interesting! Intuitively, we already know that we are looking at a horizontal edge. When we move our window along the X axis (left to right), we see very little change in image intensity. The window looks nearly identical as we follow the edge (just like it did when we were looking at the vertical edge earlier). When we move the window along the Y axis (up and down), there is significant change!
Looking at Window C, along the Vertical Edge we checked out earlier, we can conclude that we will get inverted results. There is little change when we shift Window C along the Y axis (up and down), but there is significant change when we shift the window along the X axis (left to right). This can be summarized in a simple idea: when a window is shifted in any direction and there is significant change of intensity along only one axis, then the window is likely on an edge.
Let’s take a look at Window D and run our corner test on it.
Aha! We are seeing significant change when we shift Window D in any direction! We have found a corner! Take a closer look at what happens when we shift the window in any direction and compare.
This gives us enough information to define a corner in a way that a computer can understand it. A corner is defined by shifting a particular window in any direction and comparing. When there is significant change in intensity in both the X (left to right) and Y (up and down) directions, we have a corner. This is exactly what the Harris Corner Detector extracts from images.
To recap: when we compare any particular window against a corresponding window that is shifted in any direction, there are three possible outcomes. If there is no significant change in intensity in any direction, the window is on a flat area. When there is significant change in only one direction, the window is on an edge. When there is significant change in both directions, the window is on a corner.
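The shift-and-compare test is small enough to sketch in plain Python. This is only an illustration of the intuition, not the Harris detector itself: it compares a 3x3 window against its eight one-pixel shifts using a sum of squared differences, which is closer to the older Moravec detector that Harris later refined. The function names, the synthetic image and the threshold of 1000 are all assumptions chosen for the example.

```python
# The eight one-pixel shifts: left, right, up, down and the four diagonals.
SHIFTS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def window(img, top, left, size=3):
    """Cut a size x size window out of a grid stored as a list of rows."""
    return [row[left:left + size] for row in img[top:top + size]]


def change(a, b):
    """Sum of squared intensity differences between two windows."""
    return sum((p - q) ** 2 for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))


def classify(img, top, left, threshold=1000):
    """Label the window at (top, left) as 'flat', 'edge' or 'corner'.

    The window must sit at least one pixel away from the image border.
    """
    base = window(img, top, left)
    changes = [change(base, window(img, top + dy, left + dx)) for dx, dy in SHIFTS]
    if max(changes) <= threshold:
        return "flat"    # no shift changes the window
    if min(changes) <= threshold:
        return "edge"    # some shift slides along the edge without changing it
    return "corner"      # every shift changes the window


# Synthetic 10x10 image: black background with a white block in the lower right.
image = [[255 if x >= 5 and y >= 5 else 0 for x in range(10)] for y in range(10)]

window_a = classify(image, 1, 1)  # in the black background
window_b = classify(image, 4, 6)  # on the top (horizontal) edge of the block
window_c = classify(image, 6, 4)  # on the left (vertical) edge of the block
window_d = classify(image, 4, 4)  # on the top-left corner of the block
```

On this synthetic image the four windows come out as flat, edge, edge and corner, mirroring windows A through D above. Checking all eight shifts (rather than only X and Y) matters for diagonal edges, which change along both axes but stay the same when shifted along the edge.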
When we take this idea and implement it in code, we are able to extract the corners from the image. I like to think of the results as a list of X and Y coordinates that correspond to where the corners are located within the image. We can also visualize the corners as seen below.
This is a high-level view of the Harris Corner detector. Understanding the intuition behind how computers define and extract corners is an important step towards more advanced ideas in Computer Vision. The math behind the Harris Corner detector is pretty involved, but it has already been implemented in the major CV libraries. Developers looking to implement their own Harris Corner detector will want to go deeper into the source material (I personally found a lesson from Queensland University of Technology and an implementation tutorial by Marcel Sheeny particularly useful).
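For readers who want a glimpse of that math before diving into the source material, here is the formulation as it is commonly presented in textbooks. The notation is an assumption on my part rather than something derived in this article: I is the image intensity, (u, v) is the shift applied to the window, w(x, y) is the window weighting, I_x and I_y are the horizontal and vertical gradients (for example from Sobel), and k is an empirically chosen constant, typically around 0.04 to 0.06.

```latex
E(u, v) = \sum_{x, y} w(x, y)\,\bigl[I(x + u, y + v) - I(x, y)\bigr]^2
\approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix},
\qquad
M = \sum_{x, y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}

R = \det(M) - k\,\operatorname{trace}(M)^2
  = \lambda_1 \lambda_2 - k\,(\lambda_1 + \lambda_2)^2
```

Here λ1 and λ2 are the eigenvalues of M. When both are small the window is flat and R is close to zero, when only one is large the window is on an edge and R is negative, and when both are large the window is on a corner and R is large and positive. These are the same three outcomes as in the recap above.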
Steps to use Harris Corner Detector
Like a lot of other fields, advanced Computer Vision concepts are built on top of simpler ones. It’s difficult to explore the Harris Corner Detector without first exploring edges. For example, a typical Corner Detector could follow these steps:
- Resize image to a smaller resolution to reduce processing time
- Convert the image to grayscale to focus on the intensity and not on individual color channels
- Run a Blur Kernel on the image to reduce image noise
- Run the Horizontal Sobel Kernel
- Run the Vertical Sobel Kernel
- Run the Harris Corner Detector
- Threshold the results to focus on the most significant corners
As you can see, the process of finding features depends on other topics we covered earlier, such as the Sobel operator, blurs, grayscale and thresholds. You will find that as we explore more advanced topics, we will often look back at ideas we explored earlier.
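The steps listed above can be sketched end to end in plain Python. Treat this as a rough illustration under several assumptions rather than a reference implementation: the resize step is skipped, a 3x3 box blur stands in for whatever blur kernel you prefer, the constant `k=0.05` and the `fraction=0.5` threshold are arbitrary picks, and every function name is made up for this example. A real project would normally call a CV library instead.

```python
BLUR = [[1 / 9] * 3 for _ in range(3)]          # 3x3 box blur
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to left-right change
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # responds to up-down change
BOX = [[1] * 3 for _ in range(3)]               # sums values over a 3x3 window


def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to rows of intensities."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]


def filter3(img, kernel):
    """Slide a 3x3 kernel over img, repeating border pixels at the edges."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[sum(kernel[j][i] * px(y + j - 1, x + i - 1)
                 for j in range(3) for i in range(3))
             for x in range(w)] for y in range(h)]


def harris_response(gray, k=0.05):
    """Return the Harris score R = det(M) - k * trace(M)^2 for every pixel."""
    smooth = filter3(gray, BLUR)
    ix = filter3(smooth, SOBEL_X)
    iy = filter3(smooth, SOBEL_Y)
    h, w = len(gray), len(gray[0])

    def product(a, b):
        return [[a[y][x] * b[y][x] for x in range(w)] for y in range(h)]

    # Sum the gradient products over each pixel's 3x3 neighbourhood.
    sxx = filter3(product(ix, ix), BOX)
    syy = filter3(product(iy, iy), BOX)
    sxy = filter3(product(ix, iy), BOX)
    return [[sxx[y][x] * syy[y][x] - sxy[y][x] ** 2
             - k * (sxx[y][x] + syy[y][x]) ** 2
             for x in range(w)] for y in range(h)]


def find_corners(rgb, fraction=0.5):
    """Return (x, y) pairs whose score exceeds fraction * the best score."""
    scores = harris_response(to_gray(rgb))
    cutoff = fraction * max(max(row) for row in scores)
    return [(x, y) for y, row in enumerate(scores)
            for x, score in enumerate(row) if score > cutoff]


# Synthetic test image: a white 8x8 square on a black 20x20 background.
image = [[(255, 255, 255) if 6 <= x <= 13 and 6 <= y <= 13 else (0, 0, 0)
          for x in range(20)] for y in range(20)]
corners = find_corners(image)
```

On this synthetic square, the returned points cluster within a pixel or two of the four corners, while pixels along the straight sides get a negative score and the empty background scores zero. Each corner produces a small cluster rather than a single point, which is why many implementations follow the threshold with a step that keeps only the local maximum.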
Scale and the Aperture Problem
While we are discussing features, it’s relevant to bring up the “Aperture Problem” and the importance of scale. I found it very helpful to understand how these two ideas relate to each other. The aperture problem is concerned with motion. We saw this play out when we tested for corners earlier in this article. We shifted a window in any direction and determined how much change there was. (I highly recommend you check out this site with various activities describing the aperture problem through interactions.) Generally, the Aperture Problem describes a situation where the size of a window is too small to obtain useful information about changes in the image. The way I like to think about it is that there is an appropriate window size for every feature and image.
This leads us to the broader idea of scale. Specifically, the scale of our detector compared to the image as a whole. When there is a mismatch between the scale of the detector and the size of the features within the image, we can end up with too few data points, or with too many noisy ones (which leads to unnecessary processing time). The bigger problem, though, is that the features will change as the scale changes.
Here’s another way to think about scale. If we kept everything else the same, but we changed the size of the image, would we get the same features as a result?
The Harris Corner detector is scale (size) dependent; in other words, it is not scale invariant. The corners detected will be different depending on the scale of the image. This is another critical idea which we will continue exploring. That is, the results of the Harris detector are heavily influenced by the scale of the image in relation to the scale of the window. The big drawback of scale-dependent algorithms is that unless you have some control over the scale of the images, it can be very difficult to work across multiple images (or frames of a video) or to detect objects that do not match an expected size, because we cannot rely on our features being consistent as the scale changes.
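A small experiment can make this scale dependence tangible. The sketch below is an assumption-heavy illustration rather than the Harris detector: it reuses the shift-and-compare test described earlier in this article, applies it to a synthetic shape whose corner has been cut off diagonally, and then applies it again after shrinking the image by block averaging. The helper names, the image and the threshold are all hypothetical.

```python
SHIFTS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def patch(img, cx, cy):
    """Return the 3x3 window centred on column cx, row cy."""
    return [img[y][cx - 1:cx + 2] for y in range(cy - 1, cy + 2)]


def change(a, b):
    """Sum of squared intensity differences between two windows."""
    return sum((p - q) ** 2 for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))


def classify(img, cx, cy, threshold=1000):
    """Label the 3x3 window centred on (cx, cy) as 'flat', 'edge' or 'corner'."""
    base = patch(img, cx, cy)
    changes = [change(base, patch(img, cx + dx, cy + dy)) for dx, dy in SHIFTS]
    if max(changes) <= threshold:
        return "flat"
    if min(changes) <= threshold:
        return "edge"
    return "corner"


def downsample(img, factor):
    """Shrink an image by averaging each factor x factor block of pixels."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[y * factor + j][x * factor + i]
                 for j in range(factor) for i in range(factor)) / factor ** 2
             for x in range(w)] for y in range(h)]


# 64x64 image: a white block in the lower right whose top-left corner
# has been cut off along the diagonal x + y = 56.
full = [[255 if x >= 24 and y >= 24 and x + y >= 56 else 0
         for x in range(64)] for y in range(64)]
small = downsample(full, 8)           # the same scene at 1/8 the size

at_full_size = classify(full, 28, 28)  # window sits on the diagonal cut
at_small_size = classify(small, 3, 3)  # the same location after shrinking
```

At full size the 3x3 window only sees a straight diagonal boundary, so the test reports an edge. After shrinking, the whole cut fits inside a single pixel and the same location is reported as a corner. Nothing about the scene changed, only the scale of the image relative to the window.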
As we continue, we will discuss scale invariant techniques in computer vision. At a high level, scale-invariant algorithms work by looking at changes not only along the x and y axes, but also along scale variations of the image.
Feature Points are areas of an image which more uniquely identify the contents of the image. Feature Detectors are algorithms which detect feature points within an image. Corners and Blobs are two popular types of Feature Points often used in computer vision. The Harris Corner detector is a simple corner detection algorithm which looks for areas within the image that, when shifted slightly in any direction, show great variation in intensity. Related to this idea is the Aperture Problem, which describes a situation where there is not enough information available to detect motion. Lastly, we discussed the drawbacks of Scale Dependent algorithms such as the Harris Corner detector.
Sources and more Info
- Digital Image Processing — Book by Kenneth Castleman
- Introduction to Corner Features (Harris) — Lesson by QUT Robot Academy
- Implementing Harris Corner Detector in VisionCpp — Article by Marcel Sheeny
- Harris Corner Detection and Shi-Tomasi Corner Detection — Article by Nisha Gandhi
- The Aperture Problem
- Corners, Blobs and Descriptors — Lecture by Rob Fergus
- The aperture problem and motion integration — Article by Ruye Wang