A Decade of Image Recognition with Telia Logos
All machine learning algorithms are built on mathematical operations, which can only operate on numbers. Often we want algorithms to learn something from images or text, which to our human minds aren’t intrinsically numerical. In this notebook, we look at the Haar feature-based technique for creating numerical features from the logo of a Scandinavian telecoms company (Telia). We then train an algorithm to detect Telia logos in any image or video feed. This technique was the de facto method of detecting objects in images from approximately 2001 to 2011, and it is still widely used by companies such as Facebook. More recently it has been used in tandem with convolutional neural networks to build the next generation of image detection.
From pixels to numbers — Creating Haar-like Features
To begin creating Haar-like features of a Telia logo, we must first take an image that contains the Telia logo and crop around the logo. Take, for example, the image below, taken from a picture of a Telia shop.
We then convert the image to greyscale. This greatly reduces the complexity of the problem, as colour often adds little information to the definition of an object. We will also need to resize the image; 24x24 pixels is often a good choice.
Once we have done all of the above, we can start to create our numerical features. Below is an image describing the basic process of creating Haar-like features from a Telia logo. Note that in step 5 we multiply the black area intensity by 3 so that it is weighted the same as the white area.
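As a rough illustration of what one such feature looks like in code, here is a minimal pure-Python sketch. It rests on several assumptions that are not taken from the figure: the feature is a horizontal white | black | white band, the "white area" is taken to be the whole three-band rectangle (which is why the black band is multiplied by 3), and the sign convention is arbitrary. The integral image (summed-area table) is the usual trick for making rectangle sums cheap; the function names are made up for this example.

```python
def integral_image(gray):
    """Build a summed-area table with an extra zero row and column."""
    height, width = len(gray), len(gray[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y) and size w x h."""
    return table[y + h][x + w] - table[y][x + w] - table[y + h][x] + table[y][x]


def three_rect_feature(table, x, y, w, h):
    """White | black | white feature, each band w pixels wide and h pixels tall.

    Assumption: the black band is weighted by 3 so that it balances the
    whole 3w-wide area, giving 0 on a perfectly uniform patch.
    """
    whole = rect_sum(table, x, y, 3 * w, h)
    black = rect_sum(table, x + w, y, w, h)
    return 3 * black - whole


# A tiny 4x6 greyscale patch with a dark band in the middle.
patch = [[200, 200, 50, 50, 200, 200] for _ in range(4)]
table = integral_image(patch)
print(three_rect_feature(table, 0, 0, 2, 4))  # -2400
```

Sliding this rectangle over every position and size in the 24x24 image, and repeating for the other feature shapes, is what produces the long list of numbers.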
We will now have roughly 160,000 numbers that describe our original image, which we can start using with machine learning algorithms. Of course, we will need more than one picture of a Telia logo to have any kind of accuracy. When creating features from other Telia logo images, it is a good idea to keep the aspect ratio of the cropped logo images the same. This means we will always end up with roughly the same number of features for each image.
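To see where a number of that size comes from, the sketch below counts every position and size of a feature inside a 24x24 window. It assumes the five basic upright shapes (two-band horizontal and vertical, three-band horizontal and vertical, and a 2x2 checkerboard); the exact feature set is not stated here, so treat the shape list as an assumption.

```python
def count_haar_features(window=24, shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count all placements of each base shape at every integer scale."""
    total = 0
    for base_w, base_h in shapes:
        # Widths and heights must be multiples of the base shape.
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                # Number of top-left positions where a w x h box fits.
                total += (window - w + 1) * (window - h + 1)
    return total


print(count_haar_features())  # 162336
```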
Machine Learning — What it means to be a Telia logo
For image recognition to work we will need thousands of images of Telia logos and their corresponding Haar-like features (our positive dataset), and many more thousands of images that have no Telia logos in them (our negative dataset). We need these to train an algorithm to recognise which features appear when there is a Telia logo and don’t appear when there is not.
Fortunately, we can create hundreds of Telia logo images from just one original Telia image by applying distortions and adding random backgrounds. We do, however, need a database of many thousands of images to create our negative dataset, but there are many image databases online that can help here.
AdaBoost and the Curse of dimensionality
Once we have our datasets of positive and negative images, we could train an algorithm on all 160,000 features. However, with so many features: A) training would be really slow, B) more than 90% of the features are probably junk, and C) there are more features than there are images, meaning our algorithm would have a very hard time generalising to find what Telia logos have in common.
This is why we use the AdaBoost machine learning algorithm. Extremely briefly: in image recognition, AdaBoost tries to find one feature that is slightly better than random at predicting whether the image has a Telia logo in it. It then turns this feature into its own very weak prediction algorithm. AdaBoost then looks for another feature, creates another very weak prediction algorithm and adds it to the first. It continues to do this until the combined algorithm correctly identifies 99.9% of images with Telia logos while mislabelling 50% of negative images as having Telia logos.
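The idea can be sketched in a few lines of plain Python. This is a toy version under stated assumptions, not what OpenCV runs internally: the weak algorithms are simple threshold rules ("stumps"), the data is a made-up single feature with values 0 to 9, and training stops after a fixed number of rounds rather than at the 99.9% / 50% targets described above. With only one feature column, each round picks a new threshold on that same feature; with real Haar-like features it would usually pick a different feature.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """A very weak classifier: compare one feature value against a threshold."""
    return 1 if polarity * (row[feature] - threshold) >= 0 else -1


def best_stump(rows, labels, weights):
    """Try every feature, threshold and polarity; keep the lowest weighted error."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            for polarity in (1, -1):
                error = sum(
                    weight
                    for row, label, weight in zip(rows, labels, weights)
                    if stump_predict(row, feature, threshold, polarity) != label
                )
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best


def train_adaboost(rows, labels, rounds):
    """Add one weighted stump per round, re-weighting the examples it got wrong."""
    n = len(rows)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, feature, threshold, polarity = best_stump(rows, labels, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, feature, threshold, polarity))
        weights = [
            weight * math.exp(-alpha * label * stump_predict(row, feature, threshold, polarity))
            for row, label, weight in zip(rows, labels, weights)
        ]
        total = sum(weights)
        weights = [weight / total for weight in weights]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all stumps: +1 means logo, -1 means no logo."""
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: one feature value per image, +1 = logo, -1 = no logo.
rows = [[value] for value in range(10)]
labels = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 3):
    model = train_adaboost(rows, labels, rounds)
    correct = sum(predict(model, row) == label for row, label in zip(rows, labels))
    print("{} round(s): {}/10 correct".format(rounds, correct))
# 1 round(s): 7/10 correct
# 3 round(s): 10/10 correct
```

No single threshold can separate this data, but three weak rules voting together can, which is the whole point of boosting.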
We stop once only 50% of negative images are mislabelled because, as it turns out, it only takes a few features to be certain that the majority of negative images don’t have a Telia logo. We can therefore run this first AdaBoost algorithm as an extremely quick test to rule out images that don’t have a logo. We then take the images we can’t rule out and apply a second stage of AdaBoost, which is also set to 99.9% true positives and 50% false positives. We keep adding stages until we can be certain there is a logo.
This process of adding stages of AdaBoost algorithms is called a cascading algorithm. Its main strength is its speed at ruling out negative images, which made real-time face detection possible for the first time.
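A quick back-of-the-envelope sketch shows why stacking such lenient stages works. It assumes each stage behaves independently of the others (a simplification) and uses 20 stages as an example count; `run_cascade` is a hypothetical helper showing the early-exit logic, where each stage is any function that returns True or False for an image window.

```python
def cascade_rates(stage_detection_rate, stage_false_positive_rate, num_stages):
    """Overall rates if every stage behaves independently (an assumption)."""
    return (
        stage_detection_rate ** num_stages,
        stage_false_positive_rate ** num_stages,
    )


def run_cascade(stages, window):
    """Accept the window only if every stage accepts it; stop at the first rejection."""
    for stage in stages:
        if not stage(window):
            return False
    return True


detection, false_positive = cascade_rates(0.999, 0.5, 20)
print("overall detection rate: {:.4f}".format(detection))            # 0.9802
print("overall false positive rate: {:.2e}".format(false_positive))  # 9.54e-07
```

Most negative windows are thrown away by the first one or two cheap stages, so the expensive later stages only ever see a small fraction of the image.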
OpenCV — Applying what we have learnt
Below is a list of commands used with OpenCV to build the Telia logo image recognition model, using the above theory.
Here are the cropped Telia logos I used in this exercise; ideally, the more the better.
Create an img/ folder and save all the images that form the negative dataset there. Then make a text file, bg.txt, that lists the location of one negative image on each line.
Example bg.txt file
img/2007_000027.jpg
img/2008_005376.jpg
img/2009_003407.jpg
img/2010_004942.jpg
img/2011_004709.jpg
img/2007_000032.jpg
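Writing bg.txt by hand gets tedious with thousands of images, so here is a small sketch that generates it. The function name and the list of accepted file extensions are assumptions made for this example; it only relies on the Python standard library.

```python
from pathlib import Path


def write_background_list(image_dir="img", list_file="bg.txt",
                          extensions=(".jpg", ".jpeg", ".png")):
    """Write one image path per line, in sorted order, and return the lines."""
    image_dir = Path(image_dir)
    paths = sorted(p for p in image_dir.iterdir() if p.suffix.lower() in extensions)
    lines = [p.as_posix() for p in paths]
    Path(list_file).write_text("\n".join(lines) + "\n")
    return lines


# Example, run from the folder that contains img/:
# write_background_list("img", "bg.txt")
```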
The command below takes our cropped Telia logo image and uses our negative dataset to create random backgrounds. We also apply some randomised distortions, creating 128 new images saved to our sampleImageDirectory.
Repeat this command for each cropped Telia logo image.
opencv_createsamples -img Picture1.png -bg bg.txt -info sampleImageDirectory/Picture1.txt \
-bgcolor 0 -bgthresh 8 -num 128 -maxxangle 0.0 -maxyangle 0.0 -maxzangle 0.3
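If you have many cropped logos, the repetition can be scripted. The sketch below builds the same command for each logo file and, by default, only prints it. The helper names are hypothetical, the flags are copied from the command above, and actually running it assumes the opencv_createsamples tool is installed and on your PATH.

```python
import subprocess
from pathlib import Path


def createsamples_command(logo_path, bg_list="bg.txt",
                          out_dir="sampleImageDirectory", num=128):
    """Build the opencv_createsamples argument list for one cropped logo."""
    logo_path = Path(logo_path)
    info_file = Path(out_dir) / (logo_path.stem + ".txt")
    return [
        "opencv_createsamples",
        "-img", str(logo_path),
        "-bg", bg_list,
        "-info", info_file.as_posix(),
        "-bgcolor", "0", "-bgthresh", "8",
        "-num", str(num),
        "-maxxangle", "0.0", "-maxyangle", "0.0", "-maxzangle", "0.3",
    ]


def create_all_samples(logo_paths, dry_run=True):
    """Print each command, or run it when dry_run is False."""
    commands = [createsamples_command(path) for path in logo_paths]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


# Dry run for two example logo files (file names are placeholders).
create_all_samples(["Picture1.png", "Picture2.png"])
```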
We combine the locations of all our newly created images.
cat sampleImageDirectory/Picture*.txt > positives.txt
We then create a vector file from all our positive and negative images that we can feed into the next step, which trains the AdaBoost algorithm.
opencv_createsamples -info positives.txt -bg ../bg.txt -vec test.vec
We then build our model using the following command. The result will be a cascade.xml file that will be our algorithm for predicting Telia Logos.
opencv_traincascade -data data -vec test.vec -bg ../bg.txt -numPos 1000 -numNeg 600 -numStages 20 -mode ALL
Use the cascade.xml file in any application you want. See the OpenCV documentation for more information, or the short example below.
# Import the OpenCV library into Python
import cv2

# Load our model into OpenCV
teliaCascade = cv2.CascadeClassifier("cascade.xml")

# Read an image we want to detect logos in and make it greyscale
image = cv2.imread("telia_shop.jpg")
if image is None:
    raise IOError("Could not read telia_shop.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

telias = teliaCascade.detectMultiScale(
    gray,
    scaleFactor=1.1,   # We keep scaling the Haar cascade model by 10%
                       # to look for bigger and bigger logos
    minNeighbors=1,    # How many overlapping detections a candidate needs to be kept.
                       # Bigger means fewer false positives, but fewer logos detected
    minSize=(24, 24),  # The smallest pixel width and height at which logos are detected.
                       # Defined by the size we set during training
    flags=cv2.CASCADE_SCALE_IMAGE  # Flag saying we want to scale the image appropriately
                                   # (was cv2.cv.CV_HAAR_SCALE_IMAGE in OpenCV 2.x)
)

# print is a function in Python 3
print("Found {0} telias!".format(len(telias)))

# Draw a rectangle around the Telia logos
for (x, y, w, h) in telias:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Save the annotated image (the output file name is just an example)
cv2.imwrite("telia_shop_detected.jpg", image)