Pokémon Recognition

I am sure you know Pokémon (stands for Pocket Monsters), they were very popular in nineties. Today we are going to solve Pokemon Recognition Problem: given the picture of a pokemon, find its name given a training set.

To break the ice, we apply several ML algorithms using Python and scikit-learn. Then we apply magical PCA to improve our previous results and also to create new mysterious and scary creatures called eigenpokemon!
Keep reading.

Collecting Data

First, we need to decide which pokemon we will try to identify.
I have chosen these ones:

Pikachu
  • Pikachu— because it’s the most known pokemon in the world
Tangela
Charizard
  • Charizard— because it’s a dragon, and dragons are cool
Gengar
  • Gengar — `cause he looks like he’s gonna whoop yo’ ass!

Then we need to collect some data. For each of the pokemon I gathered 20 images which makes total of 80 pictures in our dataset. Not so much, but it is hard to find many of them except for Pikachu, because he appears in every TV series unlike the others.

Preparing the data

Before we move on to solve the problem, we need to prepare the data.

If you look at any two images in our dataset you will notice that they have different size in pixels:

And this is a problem. Later each image will be represented as a point in some high-dimensional space. It is really bad when one pokemon lives in 97929D-space while another in just 21903D one.

So, let’s make all of the images 200x200. Of course, it means that we will lose some information, but that’s life.

We also convert them to grayscale to make our life easier.

from PIL import Image, ImageOps
...
img = Image.open(pokemon_path).convert('RGB')
img = ImageOps.fit(img, (200, 200), Image.ANTIALIAS, (0.5, 0.5))
img = ImageOps.grayscale(img)

Let’s see what we have now:

What’s next?

As I already said, we are viewing each image not as 2D array but as one large vector. So, for example, our Charizard looks like this:

Charizard (collapsed)

Then, we construct matrix X of these collapsed pokemon. The shape of the matrix is 80 x 40000, which is pretty squashed, huh?

So, we have 80 data points, 40000 features each, and we need to classify each of them to one of the classes: ‘Pikachu’, ‘Tangela’, ‘Charizard’ or ‘Gengar’. Why can’t we just put them in some good classification algorithm and see what happens? Of course, we can, but in general, having so many features would affect efficiency and, moreover, prediction accuracy.

Anyway, let’s suppose that we know nothing (like one famous character whose name is John) and fit SVM classifier to the training set, trying to find optimal parameters: penalty parameter C, kernel type and kernel coefficient gamma. We use K-Folds cross validation to measure the performance of the model.

pokemon = get_pokemon()
X = pokemon.data
y = pokemon.target
kf = KFold(len(y), n_folds=4, shuffle=True)
for train_index, test_index in kf:
X_train = np.array([X[i] for i in train_index])
X_test = np.array([X[i] for i in test_index])
y_train = np.array([y[i] for i in train_index])
y_test = np.array([y[i] for i in test_index])
    param_grid = {
'kernel': ['rbf', 'linear'],
'C': [1e3, 5e3, 1e4, 5e4, 1e5],
'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}
clf = GridSearchCV(SVC(class_weight='balanced'), param_grid)
clf = clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
print classification_report(y_test, y_pred, target_names=pokemon.target_names)
print confusion_matrix(y_test, y_pred, labels=range(n_classes))

It took 2 minutes to compute which is pretty long time for such a small number of pictures. What would happen if we got bigger images or larger dataset? Average precision and recall (calculated using confusion matrix for each pokemon and then averaged) is 0.95 and 0.9375, respectively.

We could also try something simpler than SVM in order to speed up the calculations. For example, kNN.

...
param_grid = {
'n_neighbors': list(xrange(1, 15)),
}
clf = GridSearchCV(KNeighborsClassifier(), param_grid)
clf = clf.fit(X_train, y_train)
...

It gave me 0.8725 and 0.825 average precision and recall and took just 10 seconds. Better time, worse accuracy than SVM.

Important note: we have so high SVM accuracy only because we have just 80 data points in such high-dimensional space. It would be much lower (especially with so many features) if we had more pictures.

Next, I’m going to show you how to reduce computation time while still having good classification accuracy. To do that we will apply the face recognition method developed by Sirovich and Kirby in 1986! The key idea is to apply dimension reduction algorithm first.

PCA

Data with so many features can easily contain a lot of noisy ones. Would it be better if we could select just ones which really capture the trends and the patterns in our data? Here’s where PCA comes into play!

PCA stands for principal component analysis.

The principal component is the direction in the dataset that has the largest variance, which is roughly the spread of a data distribution.

PCA finds several principal components, one capturing less variance than another and also orthogonal to each other. This forms new coordinate system on which we can map the original data points.

We care about finding axes lying in the direction of the largest variance because doing so saves us the maximum amount of ‘information’ in the original data.

Suppose that the pokemon images were just two dimensional and we could place them on 2D plane like on the image below. The biggest PC would be the red line, so we could project the images to it. This would reduce the dimension of the data and save us reasonable amount of information about it. If you projected on the green line, the data points would be very close to each other and further from the original positions.

In math, principal component are the eigenvectors of the covariance matrix. Every eigenvector has corresponding eigenvalue, which tells us how ‘important’ the eigenvector is.

So, hey! We can apply PCA algorithm to our pokemon points in order to find there several important principal components. Then we can project our images to those components, which will decrease dimension of the data points. This procedure should make model fitting faster.

Let’s do it step-by-step.

Here we apply PCA algorithm to the training set. We also specify that we want to select the number of components such that the amount of overall variance captured is greater than 80%.

n_components = .8
pca = PCA(n_components=n_components, whiten=True).fit(X_train)

Then we project all our data points on these components.

X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

We had 40000 features for each pokemon, but after doing PCA we have just 18!

As for eigenvectors (or principal components), they still live in 40000-D space which means that they can be rearranged back into 2D matrix and displayed. How cool is that?!
In the original article they are called eigenfaces, we name them eigenpokemon. Here is some of them:

Scary eigenpokemon

If you have a friend somewhere who has these eigenpokemon on his computer, you can send him not an image of the pokemon you like but just several numbers which can be used to reconstruct the original image. Of course, better reconstruction quality means you should add more principal components. Here’s the example:

Top row: Reconstructed Gengar (18 PC and 38 PC) Bottom row: Reconstructed Charizard (18 PC and 38 PC)

I again applied SVM to classify pictures but now after doing PCA:

param_grid = {
'kernel': ['rbf', 'linear'],
'C': [1e3, 5e3, 1e4, 5e4, 1e5],
'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}
clf = GridSearchCV(SVC(class_weight='balanced'), param_grid)
clf = clf.fit(X_train_pca, y_train)
y_pred = clf.predict(X_test_pca)

It greatly speed up the computations taking just 5 seconds to run! The average precision and recall are 0.935 and 0.925. It’s a little bit lower than we saw before but, as I said, if the dataset were bigger and better quality then we would also have better accuracy than SVM without PCA.

Code and dataset are available here: https://github.com/dimart/pokemon_recognition