Gotta Autoencode ’em All

Tales Lima Fonseca
Published in BuzzRobot · Jan 4, 2018

Today, let’s talk about autoencoders. To do that, I will use an analogy with Pokémon, OK? This will be fun. Imagine that you are Ash Ketchum (a.k.a. Satoshi) and a big Snorlax appears in the middle of your path, preventing you from continuing your journey to become a Pokémon master.

Snorlax — Author: Lazarus

In this case, let’s assume that you just need to throw a Poké Ball at the Snorlax to capture it and continue your journey. We can say that the Poké Ball is a Perfect Autoencoder.

Snorlax is a big Pokémon, and we need to put him inside a small Poké Ball. How can we do that? We have to “encode” him from his real form into a small representation that can fit inside the Poké Ball. After that, if we want to let him out, we just need to “decode” the small representation back into his real form.

Encoding and Decoding Snorlax.

An autoencoder is just an artificial neural network that tries to reconstruct its input data. The idea really is that simple. What sets the autoencoder apart is a hidden layer that serves as a compressed representation of the input data. These models are typically used for dimensionality reduction (data compression).

Autoencoder

Previously, I said that the Poké Ball is a Perfect Autoencoder, and the reason is that the Pokémon that enters the Poké Ball can come back exactly as it was before. So, a Perfect Autoencoder produces an output that is exactly the same as its input. Unfortunately, in reality, autoencoders can rarely achieve this perfect result.
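To make this concrete, here is a minimal toy sketch in NumPy (not the network used later in this post) of a *linear* autoencoder: the encoder projects the data into a smaller “code”, and the decoder maps it back. For a linear model with tied weights, the best encoder is given by the top principal components, which we can get from an SVD. When the data truly lives in a low-dimensional subspace, the reconstruction is perfect — our Poké Ball case:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 100 samples that lie exactly in a 3-dimensional subspace of R^10.
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 10))

# A linear autoencoder with a 3-unit bottleneck; its optimal weights
# are the top 3 right singular vectors of the data (via SVD).
U, S, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:3].T          # encoder weights (10 -> 3); the decoder is W.T

code = X @ W          # "throw the Poke Ball": compress 10 values to 3
X_hat = code @ W.T    # "open the Poke Ball": reconstruct the input

print(code.shape)                 # (100, 3)
print(np.allclose(X, X_hat))      # True: no information was lost
```

Real data (like Pokémon images) is never exactly low-dimensional, which is why real autoencoders only get close to their input instead of matching it.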

Well, now that you already know what an autoencoder is, let’s see it in practice with Pokémon.

Creating The “Poké Ball” Autoencoder

Well, first of all, I scraped a website about Pokémon and extracted the images of all 807 Pokémon (Generation I through Generation VII). The image below shows the first 9 Pokémon of the collected data.

Each image has a resolution of 256 x 256 pixels and is described by RGBA values. With the alpha channel, we can guarantee that the background is not treated as part of the Pokémon itself. So, each image has 256*256*4 = 262144 values.
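A quick sanity check on that arithmetic (the array here is just a placeholder with the same shape as one sprite):

```python
import numpy as np

# One Pokemon sprite: 256 x 256 pixels, 4 channels (RGBA).
image = np.zeros((256, 256, 4), dtype=np.float32)

# Flattened, every image becomes a vector of 262144 values.
print(image.size)  # 262144
```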

Since we are working with images, nothing is better than using a Convolutional Neural Network. If you want to get an idea of how this kind of network works, check out my article: What’s happening inside the Convolutional Neural Network? The answer is Convolution.

CNN architecture of the Pokémon Autoencoder.

As we can see in the image above, our compressed representation of the input has 16*16*128 = 32768 values. This means we compressed the data to 12.5% of its original size — a reduction of 87.5%. This compression is really impressive, but can we still reconstruct the image 100%? No: we lost some information in the compression, and because of that it was not possible to reconstruct the image perfectly.
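The bottleneck numbers can be checked the same way. Assuming the encoder halves the spatial resolution four times (256 → 128 → 64 → 32 → 16) and ends with 128 feature maps, which matches the architecture figure:

```python
input_size = 256 * 256 * 4          # RGBA input: 262144 values

side = 256
for _ in range(4):                  # four halvings: 256 -> 128 -> 64 -> 32 -> 16
    side //= 2
code_size = side * side * 128       # bottleneck: a 16 x 16 x 128 feature map

print(code_size)                            # 32768
print(f"{1 - code_size / input_size:.1%}")  # 87.5% size reduction
```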

So, I trained the network for 100 epochs, and to track the performance of the autoencoder during the training phase, at every epoch I randomly selected a Pokémon and checked the quality of its reconstruction.
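The actual model is the convolutional network above (the full code is on my GitHub), but the general shape of such a training loop can be sketched with a toy linear autoencoder in NumPy — the data, the 5-unit bottleneck, and the learning rate below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
# Toy "images": 200 samples of 20 values each, scaled to [-1, 1].
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 20))
X /= np.abs(X).max()

# Tied weights: encode with X @ W, decode with @ W.T (20 -> 5 -> 20).
W = rng.normal(scale=0.1, size=(20, 5))
lr, epochs = 0.1, 100

for epoch in range(epochs):
    code = X @ W                 # encode
    X_hat = code @ W.T           # decode
    err = X_hat - X
    loss = (err ** 2).mean()     # mean squared reconstruction error
    # Gradient of the MSE loss with respect to the tied weight matrix W.
    grad = 2 * (X.T @ err @ W + err.T @ X @ W) / err.size
    W -= lr * grad
    if epoch % 20 == 0:
        print(f"epoch {epoch:3d}  loss {loss:.4f}")
```

Checking a reconstruction every epoch, as in the post, just means running the encode/decode pair on one sample and looking at the output.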

Below you can see, from left to right and top to bottom, the results of the first 12 epochs. The reconstruction is not very good yet, but we can already see some shape in it.

The First 12 Reconstructions.

Below you can see, from left to right and top to bottom, the results of the last 12 epochs. After some time, at the end of the training phase, we can see that the reconstruction is much better than before.

The Last 12 Reconstructions.

That’s it… we just created our own Poké Ball. The results are amazing, right? The output is pretty close to the input. Of course, this Poké Ball could not be used in the Pokémon world because it is not a Perfect Autoencoder. The Pokémon would probably come back with some problems (or even dead).

If you want to check the code of this project, just go to my GitHub account.

Cheers :D
