Finding Waldo — Template Matching with OpenCV in Python
1. Introduction
In this article, we will do simple template matching to warm up before we move on to object detection via video analysis. We shall start with detection on static images.
‘Where’s Wally’ is a popular British series of puzzle books that has garnered interest among both children and adults. Finding Waldo is never easy, but OpenCV gives us a way to find him quickly :)
2. Concepts used for Template Matching
OpenCV has a function, cv2.matchTemplate(), that supports template matching to identify a target image.
Template matching is the idea of sliding a target image (the template) over a source image (the input). At each position, the template is compared to the underlying input. A match is determined by how closely the neighbourhood of pixels in the input matches the template.
There are various methods for calculating this similarity. For this example, we will be using TM_CCOEFF_NORMED.
The template patch is slid over the input, and at every position a score is computed to indicate how good the match is. TM_CCOEFF_NORMED subtracts the mean from both the template and the patch of the input it covers, then computes a normalised correlation between them. A score of 1 is a perfect match, -1 is a perfect inverse match, and 0 means no correlation.
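To make the scoring a little more concrete, here is a rough sketch (just an illustration, not the actual OpenCV implementation) of the kind of score TM_CCOEFF_NORMED produces for a single position, where patch is the region of the input the template currently covers:
import numpy as np

def ccoeff_normed_score(patch, template):
    # subtract each mean so that only the variation around the mean is compared
    p = patch.astype(np.float64) - patch.mean()
    t = template.astype(np.float64) - template.mean()
    # normalised correlation: 1 = perfect match, -1 = perfect inverse, 0 = no correlation
    return (p * t).sum() / np.sqrt((p * p).sum() * (t * t).sum())
cv2.matchTemplate evaluates a score like this at every possible position of the template and returns them all as a 2D result map.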
3. Finding Waldo
Below is the image we will be using, which will form our input!
Now we will need a template image, which of course is going to be a picture of Waldo himself.
import cv2
import numpy as np
The 2 holy libraries needed for OpenCV
img_rgb = cv2.imread('find_waldo.jpg')
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
template = cv2.imread('waldo.png', 0)
# saves the width and height of the template into 'w' and 'h'
w, h = template.shape[::-1]
cv2.imread reads the image, while cv2.cvtColor converts the colour image to grayscale. This is how we normally start most operations in OpenCV, since working in grayscale reduces the dimensionality and complexity of the image.
res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
threshold = 0.6
# finding the values where it exceeds the threshold
cv2.matchTemplate() is the function where we pass in the input, the template, and the method we use (explained above). The threshold is the value we use to decide what counts as a match; normally a value of 0.8 is chosen. I chose 0.6 because of how clustered and similar most of the people in the find_waldo image look.
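As a quick sanity check (not part of the original code), you can peek at the best score in the result map 'res' computed above before settling on a threshold; cv2.minMaxLoc returns the smallest and largest values and their locations.
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
print(max_val)  # if this sits comfortably above 0.6, the threshold will keep the match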
loc = np.where(res >= threshold)
for pt in zip(*loc[::-1]):
    # draw a rectangle at every position where the score exceeds the threshold
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 2)
cv2.imwrite('found_waldo.png', img_rgb)
For those who are not familiar, zip() is a built-in Python function that pairs up elements from several iterables into tuples.
Here, np.where returns the row (y) and column (x) indices of the matching positions as two separate arrays, so reversing them with loc[::-1] and zipping gives us (x, y) points. Each ‘pt’ is therefore the top-left corner of a match: pt[0] is its x-coordinate and pt[1] is its y-coordinate, so adding ‘w’ and ‘h’ gives the opposite corner of the box.
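Here is a tiny, made-up example of what np.where and zip are doing with the result map:
import numpy as np

res = np.array([[0.2, 0.9],
                [0.7, 0.1]])
loc = np.where(res >= 0.6)       # (array of row indices, array of column indices)
points = list(zip(*loc[::-1]))   # reversed and zipped: the (x, y) points (1, 0) and (0, 1)
Each of those points is a top-left corner we can hand to cv2.rectangle.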
cv2.rectangle draws the rectangle for us when we supply two opposite corners: the top-left corner ‘pt’ and the bottom-right corner (pt[0] + w, pt[1] + h).
We then write the result into a PNG file called found_waldo.png :)
And with that, we have found Waldo/Wally and have identified him by having a green rectangle drawn around him!
The full code is below, really short and sweet.
import cv2
import numpy as np

img_rgb = cv2.imread('find_waldo.jpg')
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
template = cv2.imread('waldo.png', 0)

# saves the width and height of the template into 'w' and 'h'
w, h = template.shape[::-1]

res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
threshold = 0.6

# finding the positions where the score exceeds the threshold
loc = np.where(res >= threshold)
for pt in zip(*loc[::-1]):
    # draw a rectangle at every position where the score exceeds the threshold
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 2)
cv2.imwrite('found_waldo.png', img_rgb)
4. Additional Comments
Template matching is fun to execute but does have its downsides as well.
Simple template matching cannot cope with a target that is rotated, flipped, or shown at a different scale from the template.
To handle changes in scale, we have to do multi-scale template matching, repeating the match over a range of template sizes; a rough sketch of this is shown below. Rotation and flipping need other techniques altogether.
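Here is a minimal sketch of the multi-scale idea, assuming the same find_waldo.jpg and waldo.png files as above; it simply resizes the template over a range of scales and keeps the single best-scoring location.
import cv2
import numpy as np

img_gray = cv2.cvtColor(cv2.imread('find_waldo.jpg'), cv2.COLOR_BGR2GRAY)
template = cv2.imread('waldo.png', 0)

best_score, best_loc, best_size = -1.0, None, None
for scale in np.linspace(0.5, 1.5, 11):
    resized = cv2.resize(template, None, fx=scale, fy=scale)
    # skip scales where the resized template no longer fits inside the image
    if resized.shape[0] > img_gray.shape[0] or resized.shape[1] > img_gray.shape[1]:
        continue
    res = cv2.matchTemplate(img_gray, resized, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(res)
    if max_val > best_score:
        best_score, best_loc = max_val, max_loc
        best_size = (resized.shape[1], resized.shape[0])  # (w, h) of the best template size

print(best_score, best_loc, best_size)
The loop just reuses cv2.matchTemplate at each scale and keeps the best score seen so far; it handles scale changes but still will not cope with rotation or flipping.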
And while this has been fun, detecting objects in dynamic situations (such as video) would prove to be more useful. I am looking to share more about complex template matching and even object detection in video very soon.
In the meantime, you can toy with this simple matching on some of your own images first.
You can get the full code with photos in my github link:
https://github.com/k-choonkiat/TemplateMatching/tree/master