Creating new South Park Characters with Machine Learning
I’m a big fan of GANs. Ever since I heard about someone using AI to generate landscapes at a guest lecture, I’ve wondered what other applications Generative Adversarial Networks (GANs) could have. Their most common use is to generate new images resembling a training dataset, such as the famous MNIST database of handwritten digits or the CelebA dataset of celebrity faces.
What is a GAN?
Before I go on to explain how I approached this project, I want to take a moment to walk through what GANs are. A Generative Adversarial Network is a framework in which two neural networks compete in a game of cat and mouse. One network (the generative network, or generator) tries to produce new images that look like they came from the training set, while the other (the discriminative network, or discriminator) tries to spot whether the images it sees are real or fake. The two repeat this contest over many rounds, each improving as the other does.
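To make the cat-and-mouse dynamic concrete, here is a toy one-dimensional GAN, for illustration only (it is not the DCGAN used later in this post). The real data come from a Gaussian N(4, 1), the generator is a linear map of noise, and the discriminator is a logistic regression; both are trained by alternating gradient steps on the standard GAN losses:

```python
import numpy as np

# Toy 1-D GAN sketch. Real data come from N(4, 1). The generator maps
# noise z ~ N(0, 1) through a linear map G(z) = a*z + c; the
# discriminator is a logistic regression D(x) = sigmoid(w*x + b).

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
a, c = 0.1, 0.0            # generator parameters
w, b = 0.1, 0.0            # discriminator parameters
lr, batch = 0.05, 64

for step in range(2000):
    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    real = rng.normal(4.0, 1.0, batch)
    fake = a * rng.normal(0.0, 1.0, batch) + c
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    # Gradients of the binary cross-entropy loss w.r.t. w and b.
    w -= lr * (np.mean((d_real - 1) * real) + np.mean(d_fake * fake))
    b -= lr * (np.mean(d_real - 1) + np.mean(d_fake))

    # Generator update: push D(fake) toward 1 (non-saturating loss).
    z = rng.normal(0.0, 1.0, batch)
    d_fake = sigmoid(w * (a * z + c) + b)
    grad_fake = (d_fake - 1) * w    # dL/d(fake sample), by the chain rule
    a -= lr * np.mean(grad_fake * z)
    c -= lr * np.mean(grad_fake)

samples = a * rng.normal(0.0, 1.0, 1000) + c
print(samples.mean())  # drifts away from 0 toward the real data's mean
```

A real GAN replaces these two tiny linear models with deep networks and the 1-D samples with images, but the alternating update loop is the same.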
How I compiled the dataset
I didn’t plan on doing a South Park generator when I started this project. My first inclination was towards Rick and Morty: the show has such a cool art style that I thought it would be nice to see some new characters. Hey, maybe that would even help them get through all the delays they’ve been having!
Setting out to create a new character, I was happy to discover that a Rick and Morty API already existed, with portraits of all the characters. That should make my job a lot easier, I thought. However, the training set turned out to be far too heterogeneous. Next, I limited the set to only Ricks (for those who haven’t watched the show, there is one Rick per dimension; they all look slightly different but share similar features, like the blue hair).
Even with just Ricks, the images were still too different from one another to produce anything recognizable as Rick and Morty. I deleted my email draft to Justin Roiland titled ‘I created new characters in 2 days, why can’t you write the season faster?’ and moved on. (I’m kidding. Or maybe not. Only Gmail knows the truth.)
After doing some more research, I concluded that the problem wasn’t the type of GAN I chose (more about that in the next section) or the number of epochs (how many passes the model made over the data), but the training set itself. I needed a cartoon where the art style is a lot more consistent and the characters look like each other. If only there were a cartoon where the writers intentionally simplified the art style so they could produce episodes really quickly…
The problem, however, was figuring out how to compile this dataset. There was no API like the one for Rick and Morty, but I did manage to find a website with all the character pictures. I didn’t want to download them one by one, so I wrote a web scraper to do that for me. You can find a link to the web scraper I implemented on my GitHub.
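The core of a scraper like this is just finding the image links on each page. As a sketch (this is not my actual scraper, and the example URL below is a placeholder), the parsing half can be done with nothing but the standard library:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImageLinkParser(HTMLParser):
    """Collects the src of every <img> tag on a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.urls.append(urljoin(self.base_url, src))

def extract_image_urls(html, base_url):
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.urls

# Example with a placeholder page; a real scraper would fetch the HTML
# with urllib.request and then download each URL it finds.
html = '<div><img src="/img/stan.png"><img src="kyle.png"></div>'
print(extract_image_urls(html, "https://example.com/chars/"))
```

Downloading is then a loop of `urllib.request.urlretrieve` calls over the collected URLs (with a polite delay between requests).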
Cleaning the data
After all the images finished downloading, I still had a large variety of characters, including some pictures of real people (as opposed to animated characters).
To simplify the GAN’s work, I narrowed down the dataset once again to include only the child characters drawn in the classic, rounded-out South Park style.
Choosing the type of GAN and fine-tuning the parameters
Since GAN is a framework, there are many types of GANs out there. I looked at what type was used most frequently for this kind of image generation, and found that many sources recommend the Deep Convolutional GAN, or DCGAN (more on the specifics here). I used this implementation to train my model.
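For a sense of what makes a DCGAN "deep convolutional", here is a sketch of its generator in PyTorch, using the commonly cited default layer sizes (these are not necessarily the exact settings of the implementation I used). A noise vector is progressively upsampled by transposed convolutions until it becomes a 64x64 RGB image:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """DCGAN-style generator: noise vector -> 64x64 RGB image."""
    def __init__(self, nz=100, ngf=64, nc=3):
        super().__init__()
        self.net = nn.Sequential(
            # Project the (nz x 1 x 1) noise up to a 4x4 feature map.
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # 4x4 -> 8x8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # 8x8 -> 16x16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # 16x16 -> 32x32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # 32x32 -> 64x64; tanh maps pixel values into [-1, 1]
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

# An untrained generator already produces correctly shaped (if noisy) images.
fake_batch = Generator()(torch.randn(2, 100, 1, 1))
print(fake_batch.shape)  # torch.Size([2, 3, 64, 64])
```

The discriminator is a mirror image: ordinary strided convolutions shrinking the 64x64 input down to a single real/fake score.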
Next came the pre-processing step, where I had to resize the images to 64x64 pixels. While this limited the resolution of the resulting images, it kept training time low, which let me iterate quickly. If I had to make adjustments because a batch was bad, I didn’t want to find out only after hours of training.
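The resize step itself is only a few lines with Pillow. A sketch (the directory names here are placeholders, not my actual layout):

```python
from pathlib import Path
from PIL import Image

def preprocess_image(img, size=(64, 64)):
    """Convert to RGB and resize; the DCGAN expects fixed-size square inputs."""
    return img.convert("RGB").resize(size, Image.LANCZOS)

def preprocess_dir(src="raw_images", dst="dataset"):
    # Placeholder directory names -- adjust to your own layout.
    Path(dst).mkdir(exist_ok=True)
    for path in Path(src).glob("*.png"):
        preprocess_image(Image.open(path)).save(Path(dst) / path.name)
```

Converting to RGB also normalizes away any transparency channels, which would otherwise give the images inconsistent shapes when loaded as arrays.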
After pre-processing the images, the GAN sampled 64 images from the dataset as the training data. You can see them below. I deliberately left in images where the face was partially obstructed (like Kenny’s orange hoodie) because I wanted to see how the GAN would react. If the dataset were too uniform, the problem would have become trivial.
After that, the only parameter left to tune was the number of epochs, or how many cycles of training to go through.
Results
Watching the GAN train was pretty magical. I found the sweet spot to be around 1000 epochs: anything less and the images were fuzzy; anything more and certain features started to become exaggerated. A few snapshots of these stages are below. The final training run took about 2 hours.
After 100 epochs, the model was producing some results, but nothing too clear. It also seemed to generate only blue and orange colors for clothing.
After 500 epochs, the images started to become clearer, but some artifacts emerged that weren’t quite up to par (e.g. in the bottom left corner).
Finally, after 999 epochs, the model started generating some viable results.
My favorite one has to be one of the characters in the last row. None of the characters in the training set had a red-orange shirt and a green hat, so he’s truly unique. I think I’m going to call him Pixelated Pete.
In the future, I’m thinking of expanding this to different datasets and training at a higher resolution to see if the results improve. To stay up to date with my projects, follow me on Twitter.