Creating Pokémon with Deep Learning

Deep Learning techniques already show up in many parts of our daily routine, such as music recommendation, traffic light network control and advertisement distribution, among many other applications.

In 2014, deep learning researcher Ian Goodfellow and his colleagues published a paper introducing the concept of Generative Adversarial Networks (GANs). In short, a GAN is made of two Machine Learning models:

  • A generator model (G), which learns the characteristics of a dataset and can therefore "create" new data "similar" to the originals;
  • A discriminator model (D), which learns to tell real data apart from the data synthesized by the generator.

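The training process consists of making the G model synthesize data as close to reality as possible, so that the D model is led to believe it is real. At the same time, D is trained to get better at spotting the fakes, which pushes G to keep improving.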

Example of a GAN that generates handwritten characters.
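
The two competing goals described above can be made concrete with a small numeric sketch. This is not the code used in this article: it assumes the usual binary cross-entropy formulation, in which D outputs the probability that its input is real, and the function names and sample scores are made up for illustration.

```python
import math

def bce(prob, label):
    """Binary cross-entropy for a single prediction (label is 0 or 1)."""
    eps = 1e-12  # keeps log() away from zero
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    """D wants real samples scored as 1 and generated samples scored as 0."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """G wants D to score generated samples as 1, i.e. to be fooled."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)

# Illustrative D scores (made up): early in training the fakes are obvious,
# later they are much harder to tell apart from real data.
real = [0.90, 0.95, 0.85]
early_fake = [0.05, 0.10, 0.02]
late_fake = [0.45, 0.55, 0.50]

print("G loss, early:", generator_loss(early_fake))
print("G loss, late: ", generator_loss(late_fake))
print("D loss, early:", discriminator_loss(real, early_fake))
print("D loss, late: ", discriminator_loss(real, late_fake))
```

As the generated samples get harder to distinguish, the generator loss goes down while the discriminator loss goes up, which is the tug of war the training process relies on.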

A useful application of GANs is generating high-resolution images from their low-resolution counterparts. To do so, the GAN is trained with the large images as the expected output and downscaled copies of those same images as the input.

Comparison of techniques used to increase image resolution: on the left, the classic Nearest Neighbor method. On the right, a GAN.
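
A minimal sketch of how such training pairs could be built is shown below. It assumes grayscale images stored as nested lists and a simple block-averaging downscale; the helper names are hypothetical, and real super-resolution pipelines typically use an image library and other resampling filters.

```python
def downsample(image, factor):
    """Shrink a 2-D grayscale image by averaging factor x factor blocks."""
    height, width = len(image), len(image[0])
    assert height % factor == 0 and width % factor == 0
    small = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [image[top + dy][left + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small

def make_training_pair(high_res, factor=2):
    """Return (network input, expected output) for a super-resolution GAN."""
    return downsample(high_res, factor), high_res

high_res = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 2, 2],
    [4, 4, 2, 2],
]
low_res, target = make_training_pair(high_res)
print(low_res)  # [[0.0, 8.0], [4.0, 2.0]]
```

The generator only ever sees `low_res`, while the discriminator compares the generator's upscaled result against `target`.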

In the last few years, GANs have been used to generate images from specific datasets, such as paintings:

Landscape images generated by the GANGogh model.

A curious application of GANs was introduced in the paper Towards the Automatic Anime Characters Creation with Generative Adversarial Networks. The authors collected 42 thousand images of female anime characters and used them to train the GAN.

Anime characters created by GANs.

After reading that paper, I had an idea: use GANs to create new types of Pokémon from the ones that already exist.


PokéGAN

Using the anime character GAN described above as a starting point, I developed a GAN able to generate new Pokémon, trained on images taken from the official Pokémon website.

On the left, Pokémon miniatures (40x40 pixels). On the right, full-body images (425x425 pixels).

Before training, the images were grouped into categories, such as predominant color, Pokémon type (fire, water, etc.) and a few other characteristics. Training on one category at a time allows the GAN to extract more meaningful features from the training dataset. For example, flying-type Pokémon usually have wings.
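
The grouping step could look roughly like the sketch below. The file names and the metadata layout are hypothetical (the article does not say how the labels were stored); the point is only that each category becomes its own training set.

```python
from collections import defaultdict

# Hypothetical metadata: (image file, predominant color, types).
POKEMON = [
    ("pikachu.png", "yellow", ("electric",)),
    ("squirtle.png", "blue", ("water",)),
    ("pidgey.png", "brown", ("normal", "flying")),
    ("gyarados.png", "blue", ("water", "flying")),
]

def group_by(records, key):
    """Map each category value to the list of image files that carry it."""
    groups = defaultdict(list)
    for filename, color, types in records:
        if key == "color":
            values = [color]
        elif key == "type":
            values = types  # a dual-type Pokémon lands in both groups
        else:
            raise ValueError(f"unknown key: {key}")
        for value in values:
            groups[value].append(filename)
    return dict(groups)

by_color = group_by(POKEMON, "color")
by_type = group_by(POKEMON, "type")

print(by_color["blue"])   # ['squirtle.png', 'gyarados.png']
print(by_type["flying"])  # ['pidgey.png', 'gyarados.png']
```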

GANs are notoriously difficult to tune and train, especially with more complex architectures. For example, if the discriminator is much more powerful than the generator, it will probably not be fooled at any point during training. When that happens, the generator receives almost no useful signal to update its weights, and it never learns to generate meaningful images.
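
One common way to explain this failure is through the gradient the generator receives. The sketch below is an illustration under assumptions, not a measurement from this experiment: it assumes D ends in a sigmoid applied to a logit `a`, and compares the generator term of the original minimax loss, log(1 - D(G(z))), with the alternative suggested in the original GAN paper, -log(D(G(z))). The loss actually used in this article is not stated.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def saturating_grad(a):
    """d/da of log(1 - sigmoid(a)): the minimax generator term."""
    return -sigmoid(a)

def non_saturating_grad(a):
    """d/da of -log(sigmoid(a)): the alternative generator loss."""
    return -(1.0 - sigmoid(a))

# a is D's logit for a generated image; very negative means "surely fake".
for a in (0.0, -2.0, -6.0, -10.0):
    print(f"a={a:6.1f}  minimax={saturating_grad(a):.5f}  "
          f"alternative={non_saturating_grad(a):.5f}")
```

When D rejects a fake with high confidence (a very negative logit), the minimax gradient shrinks towards zero, so the generator barely moves, while the alternative loss still provides a gradient close to -1.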

With this in mind, I used basic architectures for both models:

Generator

The input is a minibatch of tensors with shape [1, 1, 100], filled with random noise in the interval [-1, 1]. The output is a minibatch of tensors with shape [64, 64, 4]. There are 4 color channels here, including the alpha channel (transparency), so the generated images also have a transparent background.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 1, 1, 100)         0
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 4, 4, 512)         819712
_________________________________________________________________
batch_normalization (BatchNo (None, 4, 4, 512)         2048
_________________________________________________________________
leaky_re_lu (LeakyReLU)      (None, 4, 4, 512)         0
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 8, 8, 256)         2097408
_________________________________________________________________
batch_normalization_1 (Batch (None, 8, 8, 256)         1024
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 8, 8, 256)         0
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 16, 16, 128)       524416
_________________________________________________________________
batch_normalization_2 (Batch (None, 16, 16, 128)       512
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 16, 16, 128)       0
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 32, 32, 64)        131136
_________________________________________________________________
batch_normalization_3 (Batch (None, 32, 32, 64)        256
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU)    (None, 32, 32, 64)        0
_________________________________________________________________
conv2d (Conv2D)              (None, 32, 32, 64)        36928
_________________________________________________________________
batch_normalization_4 (Batch (None, 32, 32, 64)        256
_________________________________________________________________
leaky_re_lu_4 (LeakyReLU)    (None, 32, 32, 64)        0
_________________________________________________________________
conv2d_transpose_4 (Conv2DTr (None, 64, 64, 4)         4100
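
The parameter counts in the summary above can be checked by hand. The sketch below assumes 4x4 kernels for the transposed convolutions and a 3x3 kernel for the plain convolution; these sizes are not stated in the article and are inferred only because they reproduce the listed numbers. The helper names are hypothetical.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a (transposed) 2-D convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels

def batch_norm_params(channels):
    """gamma, beta, moving mean and moving variance: 4 values per channel."""
    return 4 * channels

# (layer name, assumed kernel size, input channels, output channels,
#  parameter count listed in the summary)
GENERATOR_CONVS = [
    ("conv2d_transpose", 4, 100, 512, 819712),
    ("conv2d_transpose_1", 4, 512, 256, 2097408),
    ("conv2d_transpose_2", 4, 256, 128, 524416),
    ("conv2d_transpose_3", 4, 128, 64, 131136),
    ("conv2d", 3, 64, 64, 36928),
    ("conv2d_transpose_4", 4, 64, 4, 4100),
]

for name, kernel, c_in, c_out, listed in GENERATOR_CONVS:
    computed = conv_params(kernel, c_in, c_out)
    print(f"{name:<20} computed={computed:>8} listed={listed:>8}")

print(batch_norm_params(512))  # 2048, as listed for batch_normalization
```

Most of the generator's weights sit in the first two transposed convolutions, where the channel counts are largest.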

Discriminator

The input is a minibatch of images with shape [64, 64, 4], and the output is a single value per image (0 if the model concludes the image is fake, or 1 if it concludes the image is real).

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (None, 64, 64, 4)         0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 64)        4160
_________________________________________________________________
leaky_re_lu_5 (LeakyReLU)    (None, 32, 32, 64)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 16, 16, 128)       131200
_________________________________________________________________
batch_normalization_5 (Batch (None, 16, 16, 128)       512
_________________________________________________________________
leaky_re_lu_6 (LeakyReLU)    (None, 16, 16, 128)       0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 8, 8, 256)         524544
_________________________________________________________________
batch_normalization_6 (Batch (None, 8, 8, 256)         1024
_________________________________________________________________
leaky_re_lu_7 (LeakyReLU)    (None, 8, 8, 256)         0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 4, 4, 512)         2097664
_________________________________________________________________
batch_normalization_7 (Batch (None, 4, 4, 512)         2048
_________________________________________________________________
leaky_re_lu_8 (LeakyReLU)    (None, 4, 4, 512)         0
_________________________________________________________________
flatten (Flatten)            (None, 8192)              0
_________________________________________________________________
dense (Dense)                (None, 1)                 8193
=================================================================
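
The shapes in the summary above follow a simple pattern: each convolution halves the width and height until a 4x4 map is flattened into a single score. The sketch below traces that flow, assuming a stride of 2 with "same" padding in every convolution. Both are assumptions inferred from the halving, since the article does not list them.

```python
import math

def conv_output_size(size, stride):
    """Spatial size after a convolution with 'same' padding."""
    return math.ceil(size / stride)

size, channels = 64, 4  # input image: 64 x 64 pixels, 4 channels (RGBA)
trace = [(size, size, channels)]
for filters in (64, 128, 256, 512):  # filter counts taken from the summary
    size = conv_output_size(size, stride=2)  # stride 2 is an assumption
    channels = filters
    trace.append((size, size, channels))

flattened = size * size * channels  # input size of the dense layer
dense_params = flattened * 1 + 1    # one weight per input plus one bias

for shape in trace:
    print(shape)
print(flattened, dense_params)  # 8192 8193
```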

Results

Training is fast. With an NVIDIA GTX 1070 GPU, both models stabilized their weights in about 15 minutes.

The following images show some sets generated by the GAN.

Pokémon generated only from blue-colored Pokémon: on the left, complete images. On the right, miniatures.
Pokémon generated only from Water-type Pokémon.
Pokémon generated only from brown-colored Pokémon.
Pokémon generated only from Flying-type Pokémon.
Pokémon generated only from miniatures of yellow Pokémon. Since there are few Pokémon in this distribution (96), it is easy to notice that some of them are Pikachu. This happens because there are 14 different Pikachu representations (~15% of the distribution), wearing different clothes and caps.
GIF showing the generation of blue Pokémon from random noise.

Although the interior of the images consists only of blobs of mixed colors, the contours and shapes are very convincing. It is possible to notice limbs that resemble wings in flying Pokémon, flippers in water Pokémon, and paws, tails and horns in the remaining images. It is also possible to notice that the head stays on top of the body.

Another interesting detail is that all generated miniatures have dark contours, just like the original miniatures.


Stylizing the results

Using a bit of imagination, a graphics tablet and my lack of drawing talent, I ended up transforming some of the blobs spat out by the GAN into something closer to real Pokémon.

Pokémon generated from blue Pokémon. I bet its type is Water/Poison.
Pokémon generated from blue Pokémon. Is its typing Rock/Steel? Or Psychic/Rock? Hard to come to a conclusion.

OK, I admit those aren't the prettiest Pokémon ever made. But hey, they aren't that far from the real thing.


But how do GANs impact our "real world"?

  • In October 2018, Christie’s (a centuries-old British auction house) brought a GAN-generated painting to auction. The art piece was sold for US$ 432,500.
Painting “Edmond de Belamy, de La Famille de Belamy”, generated by a GAN. Note that the signature of the “author” is the generic equation that describes a GAN.
  • Fashion and clothing companies may use GANs to generate images of their clients wearing the catalog clothes, to help them to pick the right ones.
Example of a GAN that generates images of people wearing specific clothes.
  • Game developers may use GANs trained on real terrain data to generate terrains for their games. This way, games that depend on procedural terrain generation may end up looking more realistic (https://arxiv.org/pdf/1707.03383.pdf).
  • The pharmaceutical industry often has a hard time creating new organic molecules that have practical applications. With GANs, it is possible to generate organic structures that resemble molecules found in nature, helping to focus research in this area (https://arxiv.org/pdf/1708.08227.pdf).
  • DeepFakes: GANs can be used to put a person’s face onto another body, in pictures and videos. This can have serious implications, such as generating false evidence against a person. In the not so distant future, video evidence may no longer hold any value in court.
    An example of a DeepFake implementation using GANs can be found at https://github.com/shaoanlu/faceswap-GAN.

With this, we can see that GANs will have “more useful” applications than the synthesis of Pokémon images. However, while the technique is still evolving, all experiments are welcome.


References

- Goodfellow, Ian et al. Generative Adversarial Networks:
https://arxiv.org/abs/1406.2661
- Jin, Yanghua et al. Towards the Automatic Anime Characters Creation with Generative Adversarial Networks:
https://arxiv.org/abs/1708.05509
- Keras GAN Animeface implementation:
https://github.com/forcecore/Keras-GAN-Animeface-Character/


NOTE: the Portuguese version of this article is available at:
https://medium.com/infosimples/criando-pokemons-deep-learning-e9442188aa08