Creating Pokémon with Deep Learning
We know that Deep Learning techniques are used in many different areas of our daily routine, such as music recommendation, traffic light network control, and advertisement distribution, among many other applications.
In 2014, deep learning researcher Ian Goodfellow and his colleagues published a paper introducing the concept of Generative Adversarial Networks (GANs). Briefly, a GAN is made of two Machine Learning models:
- A generator model (G), which learns characteristics of a dataset and is thus able to "create" new data "similar" to the original samples;
- A discriminator model (D), which learns to differentiate real data from data synthesized by the generator.
The training process consists of making the G model synthesize data as close to the real thing as possible, inducing the D model to believe it is indeed real.
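To make this concrete, here is a minimal sketch of such a training loop in Keras. It is not the exact code used in this post: the `build_generator` and `build_discriminator` functions are assumed to return models like the ones summarized later in this article, and the Adam learning rate is just the usual DCGAN choice.

```python
import numpy as np
from tensorflow import keras

LATENT_DIM = 100  # size of the random noise vector fed to G

# Assumed helpers: they return models like the ones summarized below.
generator = build_generator()
discriminator = build_discriminator()
discriminator.compile(optimizer=keras.optimizers.Adam(2e-4),
                      loss="binary_crossentropy")

# Combined model: G followed by a frozen D, used only to train G.
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer=keras.optimizers.Adam(2e-4), loss="binary_crossentropy")

def train_step(real_images):
    batch_size = real_images.shape[0]

    # 1) Train D: real images should score 1, synthesized images 0.
    noise = np.random.uniform(-1.0, 1.0, (batch_size, 1, 1, LATENT_DIM))
    fake_images = generator.predict(noise, verbose=0)
    d_loss_real = discriminator.train_on_batch(real_images,
                                               np.ones((batch_size, 1)))
    d_loss_fake = discriminator.train_on_batch(fake_images,
                                               np.zeros((batch_size, 1)))

    # 2) Train G through the combined model: G succeeds when D
    #    mistakes its output for real data (label 1).
    noise = np.random.uniform(-1.0, 1.0, (batch_size, 1, 1, LATENT_DIM))
    g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))
    return d_loss_real, d_loss_fake, g_loss
```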
A useful application of GANs is generating high-resolution images from their low-resolution counterparts. To do so, the GAN is trained with the large images as targets, while downscaled copies of them are fed as inputs.
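As a small illustration of that setup (the file path and scale factor below are hypothetical), each training pair can be produced by downscaling a high-resolution image:

```python
from PIL import Image

def make_training_pair(path, factor=4):
    """Build a (low-res input, high-res target) pair from one image."""
    high_res = Image.open(path)
    low_res = high_res.resize((high_res.width // factor,
                               high_res.height // factor), Image.BICUBIC)
    return low_res, high_res
```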
In the last few years, GANs have been used to generate images from specific datasets, such as paintings.
A curious application of GANs was introduced in the paper Towards the Automatic Anime Characters Creation with Generative Adversarial Networks. The authors collected 42 thousand images of female anime characters and used them to train their GAN.
After reading that paper, an idea came up: use GANs to create new Pokémon from those that already exist.
PokéGAN
Using the AnimeGAN described above as a starting point, I developed a GAN able to generate new Pokémon, trained on images taken from the official Pokémon website.
Before training, the images were grouped into categories such as predominant color, Pokémon type (fire, water, etc.), and a few other characteristics. This allows the GAN to extract more valuable features from the training dataset; for example, flying-type Pokémon usually have wings.
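To give an idea of how such a grouping could be automated, here is a sketch that labels a sprite by its predominant color. The palette below is purely illustrative, not the exact set of categories I used:

```python
import numpy as np
from PIL import Image

# Hypothetical palette; the real categories may differ.
PALETTE = {
    "red":    (220, 60, 60),
    "blue":   (60, 90, 220),
    "green":  (70, 180, 90),
    "yellow": (230, 200, 60),
}

def predominant_color(path):
    """Label an RGBA sprite by the palette color closest to its mean color."""
    rgba = np.asarray(Image.open(path).convert("RGBA"), dtype=float)
    opaque = rgba[rgba[..., 3] > 0]        # ignore the transparent background
    mean_rgb = opaque[:, :3].mean(axis=0)  # average color of visible pixels
    return min(PALETTE,
               key=lambda name: np.linalg.norm(mean_rgb - PALETTE[name]))
```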
GANs turn out to be very difficult to tune and train with more complex architectures. For example, if the discriminator is much more complex than the generator, it probably won't be fooled during training. When that happens, the generator stops receiving useful weight updates and becomes incapable of generating meaningful images.
With this in mind, I used basic architectures for both models:
Generator
The input is a minibatch of tensors with shape [1, 1, 100], filled with random noise in the [-1, 1] interval. The output is a minibatch of tensors with shape [64, 64, 4]. There are 4 color channels here, including the alpha channel (transparency), so the generated images also have transparent backgrounds.
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 1, 1, 100)         0
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 4, 4, 512)         819712
_________________________________________________________________
batch_normalization (BatchNo (None, 4, 4, 512)         2048
_________________________________________________________________
leaky_re_lu (LeakyReLU)      (None, 4, 4, 512)         0
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 8, 8, 256)         2097408
_________________________________________________________________
batch_normalization_1 (Batch (None, 8, 8, 256)         1024
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 8, 8, 256)         0
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 16, 16, 128)       524416
_________________________________________________________________
batch_normalization_2 (Batch (None, 16, 16, 128)       512
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 16, 16, 128)       0
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 32, 32, 64)        131136
_________________________________________________________________
batch_normalization_3 (Batch (None, 32, 32, 64)        256
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU)    (None, 32, 32, 64)        0
_________________________________________________________________
conv2d (Conv2D)              (None, 32, 32, 64)        36928
_________________________________________________________________
batch_normalization_4 (Batch (None, 32, 32, 64)        256
_________________________________________________________________
leaky_re_lu_4 (LeakyReLU)    (None, 32, 32, 64)        0
_________________________________________________________________
conv2d_transpose_4 (Conv2DTr (None, 64, 64, 4)         4100
=================================================================
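For reference, here is a Keras definition that reproduces the summary above. The kernel sizes follow directly from the parameter counts; the strides, padding, and the tanh output activation (which matches the [-1, 1] data range) are my assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_generator():
    noise = keras.Input(shape=(1, 1, 100))
    # 1x1x100 -> 4x4x512
    x = layers.Conv2DTranspose(512, 4, strides=1, padding="valid")(noise)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    # 4x4x512 -> 8x8x256
    x = layers.Conv2DTranspose(256, 4, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    # 8x8x256 -> 16x16x128
    x = layers.Conv2DTranspose(128, 4, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    # 16x16x128 -> 32x32x64
    x = layers.Conv2DTranspose(64, 4, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    # Extra 3x3 convolution at 32x32 resolution
    x = layers.Conv2D(64, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    # 32x32x64 -> 64x64x4 (RGBA image)
    img = layers.Conv2DTranspose(4, 4, strides=2, padding="same",
                                 activation="tanh")(x)
    return keras.Model(noise, img)
```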
Discriminator
The model input is a minibatch of images shaped [64, 64, 4], and its output is a minibatch with a single value per image: 0 if the model concludes the image is fake, or 1 if it concludes the image is real.
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (None, 64, 64, 4)         0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 64)        4160
_________________________________________________________________
leaky_re_lu_5 (LeakyReLU)    (None, 32, 32, 64)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 16, 16, 128)       131200
_________________________________________________________________
batch_normalization_5 (Batch (None, 16, 16, 128)       512
_________________________________________________________________
leaky_re_lu_6 (LeakyReLU)    (None, 16, 16, 128)       0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 8, 8, 256)         524544
_________________________________________________________________
batch_normalization_6 (Batch (None, 8, 8, 256)         1024
_________________________________________________________________
leaky_re_lu_7 (LeakyReLU)    (None, 8, 8, 256)         0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 4, 4, 512)         2097664
_________________________________________________________________
batch_normalization_7 (Batch (None, 4, 4, 512)         2048
_________________________________________________________________
leaky_re_lu_8 (LeakyReLU)    (None, 4, 4, 512)         0
_________________________________________________________________
flatten (Flatten)            (None, 8192)              0
_________________________________________________________________
dense (Dense)                (None, 1)                 8193
=================================================================
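Again for reference, a Keras definition matching the summary above; kernel sizes are implied by the parameter counts, while strides, padding, and the sigmoid output are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_discriminator():
    img = keras.Input(shape=(64, 64, 4))
    # 64x64x4 -> 32x32x64 (no batch norm on the first layer)
    x = layers.Conv2D(64, 4, strides=2, padding="same")(img)
    x = layers.LeakyReLU()(x)
    # 32x32x64 -> 16x16x128
    x = layers.Conv2D(128, 4, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    # 16x16x128 -> 8x8x256
    x = layers.Conv2D(256, 4, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    # 8x8x256 -> 4x4x512
    x = layers.Conv2D(512, 4, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    # 4x4x512 -> 8192 -> single real (1) vs fake (0) score
    x = layers.Flatten()(x)
    validity = layers.Dense(1, activation="sigmoid")(x)
    return keras.Model(img, validity)
```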
Results
Training times are short: with an NVIDIA GeForce GTX 1070 GPU, both models stabilized their weights in about 15 minutes.
The following images show some sets generated by the GAN.
Although the inner portion of each image is just a blob of mixed colors, the contours and shapes are very convincing. It is possible to notice limbs that resemble wings on flying Pokémon, flippers on water Pokémon, and paws, tails, and horns in the remaining images. It is also possible to notice that the head always stays on top of the body.
Another interesting detail is that all generated miniatures have dark contours, just like the original miniatures.
Stylizing the results
Using a bit of imagination, a graphics tablet, and my lack of drawing talent, I ended up transforming some of the blobs spit out by the GAN into something closer to real Pokémon.
OK, I admit those aren't the prettiest Pokémon ever made. But hey, they aren't that far from the real thing.
But how do GANs impact our "real world"?
- In October 2018, Christie's (the centuries-old British auction house) brought a GAN-generated painting to auction. The piece was sold for US$ 432,500.
- Fashion and clothing companies may use GANs to generate images of their clients wearing clothes from the catalog, helping them pick the right items.
- Game developers may use GANs trained on real terrain data to generate terrain for their games. This way, games that depend on procedural terrain generation may end up looking more realistic (https://arxiv.org/pdf/1707.03383.pdf).
- The pharmaceutical industry often has a hard time designing new organic molecules with practical applications. With GANs, it is possible to generate organic structures that resemble molecules existing in nature, better directing the focus of studies in this area (https://arxiv.org/pdf/1708.08227.pdf).
- DeepFakes: GANs can be used to put a person's face onto another body in pictures and videos. This can have serious implications, such as fabricating false evidence against a person. In the not-so-distant future, video evidence may lose its value in court.
An example of a DeepFake implementation using GANs is available at https://github.com/shaoanlu/faceswap-GAN.
As we can see, GANs will have applications "more useful" than the synthesis of Pokémon images. However, while the technique is still evolving, all experiments are welcome.
References
- Goodfellow, Ian et al. Generative Adversarial Networks: https://arxiv.org/abs/1406.2661
- Jin, Yanghua et al. Towards the Automatic Anime Characters Creation with Generative Adversarial Networks: https://arxiv.org/abs/1708.05509
- Keras GAN Animeface implementation: https://github.com/forcecore/Keras-GAN-Animeface-Character/
NOTE: the Portuguese version of this article is available at:
https://medium.com/infosimples/criando-pokemons-deep-learning-e9442188aa08