Using AI To Turn Your Face Into A Continuum Of Nightmares

Nhoral
8 min read · Feb 15, 2019


This is what I imagine my first Kill-bot will see

In my previous articles, I’ve focused on building AI classifiers. That’s cool and all, but what if we want to use AI to generate something?

Disclaimer: This article mostly covers the output of my model and evaluation of the results. If you want to check out the code, there is a link at the bottom of the article.

Let’s build a model that can generate a fake. Not just any fake, but one so good that even an expert can’t tell the difference. We can simplify our problem by focusing on image generation (as opposed to audio or video).

What Is A Fake?

Now you might be thinking, “Copying a picture is easy, what’s impressive about an AI doing it?”

We aren’t going to make a copy. We are going to make something much more interesting — a Forgery. An image indistinguishable from a set of real ones but unique at the same time. We will learn how to change something in realistic ways.

Our problem is subtly different from duplication, and to solve it we will use two models working in tandem.

This type of Neural Network ensemble is called a Generative Adversarial Network. It accomplishes the forgery by training two separate models. One gets good at making fakes (a Generator), the other gets good at spotting fakes (a Discriminator).

While they start out terrible at their job, this competition (and resulting improvement) slowly brings out the best in both. The cumulative effect produces forgeries that are indistinguishable from real images (in our case).
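
To make the competition concrete, here is a minimal sketch of one adversarial training step in plain PyTorch, using the classic GAN formulation. The actual code (linked at the bottom) uses Fast.ai's WGAN instead, and all names here are illustrative:

```python
import torch

# Illustrative sketch of one classic GAN training step. Assumes
# `generator` maps (batch, noise_dim) noise to images and
# `discriminator` returns one realness logit per image.
def gan_step(generator, discriminator, opt_g, opt_d, real_images, noise_dim=200):
    batch = real_images.size(0)
    bce = torch.nn.BCEWithLogitsLoss()
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # 1. Improve the Discriminator: real images should score 1, fakes 0.
    fakes = generator(torch.randn(batch, noise_dim)).detach()  # no generator grads here
    d_loss = bce(discriminator(real_images).view(-1, 1), ones) + \
             bce(discriminator(fakes).view(-1, 1), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2. Improve the Generator: try to make fresh fakes score as real.
    fakes = generator(torch.randn(batch, noise_dim))
    g_loss = bce(discriminator(fakes).view(-1, 1), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```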

Crafting The Luxurious Beauty Of A Face

Step one, we need to generate a fake. Specifically, we want a model that takes in 200 random values and outputs a picture of a face. Why 200 random values? Why not 400? Or 4,000?

To answer that, let’s look at the images I am showing my Discriminator:

My training set had 42 images of various goofy expressions

To make my model easy to train, I’ve limited the variance between training images. The lighting, framing, and colors are fairly uniform. The primary variance is my expression. I am guessing the limited image variance can fit in 200 dimensions.

If this turns out not to be true (if my model simply can't fit a continuous series of fakes into 200 dimensions), I can always adjust that number and try again.

Somewhere in those 200 dimensions is an Aphex Twin album cover

We Can Also Cheat A Little

Along with reducing variance, I opted to train on low-resolution (64 x 64) images. The lower resolution allows our model to train faster, while the problem remains conceptually the same as with high-resolution images. That said, lowering the resolution removes information the Discriminator could use to better judge a fake, so fooling it with low-resolution images will be easier.
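
As a rough idea of what that preprocessing looks like, here is a torchvision sketch; the transforms are standard, but the exact pipeline in my code may differ:

```python
from torchvision import transforms

# Downscale training photos to 64 x 64 so the model trains quickly.
# Normalizing to [-1, 1] matches a Tanh-output generator.
preprocess = transforms.Compose([
    transforms.Resize(64),
    transforms.CenterCrop(64),
    transforms.ToTensor(),                       # scales pixels to [0, 1]
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # shifts to [-1, 1]
])
```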

Creating Our Knock-Off Factory

I opted to use the Wasserstein GAN (WGAN) provided by Fast.ai. It seemed easy to implement and train, which is just my kind of API. Unfortunately, this architecture trains really slowly (which I realized too late).
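
Fast.ai packages the whole WGAN recipe (data pipeline, the two models, and the training loop). A rough end-to-end sketch, adapted from the fastai v1 WGAN example; `path` is a placeholder and the exact arguments in my code may differ:

```python
from fastai.vision import *
from fastai.vision.gan import *

# Rough fastai v1 WGAN setup (adapted from the library's WGAN example
# notebook; `path` is a placeholder for the folder of face photos, and
# exact arguments may differ from the linked source).
data = (GANItemList.from_folder(path, noise_sz=200)  # 200 random values in
        .split_none()
        .label_from_func(noop)
        .transform(size=64, tfm_y=True)              # 64 x 64 training images
        .databunch(bs=64))

generator = basic_generator(in_size=64, n_channels=3, noise_sz=200)
critic = basic_critic(in_size=64, n_channels=3)

learn = GANLearner.wgan(data, generator, critic, wd=0.)
learn.fit(100, 2e-4)  # this is the part that takes days
```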

To create our Generator, we can build a model based on the Fast.ai basic_generator. The only difference is replacing ReLU layers with Leaky ReLU layers (since it seems to be superior in our use case).

You can check out the source here

This Generator is a series of convolutional layers that start in a high-dimensional feature space and halve the number of features at each layer, eventually resolving to a face or potentially a terrifying portal to our own destruction.
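
Fast.ai's basic_generator follows the DCGAN pattern, so an equivalent plain-PyTorch sketch (with the Leaky ReLU swap) looks something like this; the layer sizes are my best guess at a 200-to-64x64 mapping, not a line-for-line copy of the source:

```python
import torch.nn as nn

# DCGAN-style generator sketch: 200-dim noise -> 64 x 64 RGB image.
# Feature maps halve at each upsampling step (512 -> 256 -> 128 -> 64),
# with LeakyReLU in place of the usual ReLU.
def deconv_block(n_in, n_out):
    return nn.Sequential(
        nn.ConvTranspose2d(n_in, n_out, 4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(n_out),
        nn.LeakyReLU(0.2, inplace=True),
    )

generator = nn.Sequential(
    nn.ConvTranspose2d(200, 512, 4, stride=1, padding=0, bias=False),  # -> (512, 4, 4)
    nn.BatchNorm2d(512),
    nn.LeakyReLU(0.2, inplace=True),
    deconv_block(512, 256),  # -> (256, 8, 8)
    deconv_block(256, 128),  # -> (128, 16, 16)
    deconv_block(128, 64),   # -> (64, 32, 32)
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1, bias=False),
    nn.Tanh(),               # -> (3, 64, 64), values in [-1, 1]
)
```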

Creating Our Number One Fan

The other partner in this dance is the Discriminator. A model that will learn to understand all the nuances of our face (or mine in this example) within the parameters of our training images. Our Discriminator’s only motivation in life is to spot forgeries.

How does it know something is a forgery? We’ll mix real and fake images together and see how confidently it spots the fakes.

I am sparing you a lot of boring code you can read here

Like our Generator, I am again stealing a Fast.ai model. This time I am changing the basic_discriminator, making the same Leaky ReLU adjustment.
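
Again as a plain-PyTorch sketch, the Discriminator is roughly the Generator run in reverse: feature maps double as the resolution halves, and there is no final sigmoid because a WGAN critic outputs an unbounded score rather than a probability:

```python
import torch.nn as nn

# Mirror-image critic sketch: 64 x 64 RGB image -> single realness score.
def conv_block(n_in, n_out):
    return nn.Sequential(
        nn.Conv2d(n_in, n_out, 4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(n_out),
        nn.LeakyReLU(0.2, inplace=True),
    )

critic = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1, bias=False),    # -> (64, 32, 32)
    nn.LeakyReLU(0.2, inplace=True),
    conv_block(64, 128),   # -> (128, 16, 16)
    conv_block(128, 256),  # -> (256, 8, 8)
    conv_block(256, 512),  # -> (512, 4, 4)
    nn.Conv2d(512, 1, 4, stride=1, padding=0, bias=False),   # -> (1, 1, 1) score
)
```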

Before The Fake Comes Madness

Our input is random noise, so our initial images are also random noise in three channels (RGB). It is day one for our Discriminator as the CEO of our headshot fan club, so it isn’t good at telling a face apart from random colors.

As the features of each channel are resolved, non-uniform patterns start to appear across colors

Over time, the Discriminator better learns the features that distinguish a real image from my (currently) terrible fakes. As a result, the Generator starts to recreate complementary patterns across channels.

A creepy face emerges

Exploring The Infinite

To improve the fakes, our Generator uses the Discriminator's judgement as its loss function. Because the Discriminator is also improving, the loss is a moving target. Evaluating the loss at a single moment in time doesn't truly give us a sense of how well we are doing.

My face is in here somewhere

One way we can evaluate the Generator’s performance is by exploring the 200 dimensions we are inputting to create a face.

It's impossible to evaluate all possible outputs of our model. However, we can explore the 200 dimensions by iterating over them in sequence and evaluating each one's impact on an otherwise fixed set of values.

Turning Knobs In Order

Our input noise is drawn from a normal distribution with a mean of 0 and a standard deviation of 1, so most values fall between -1 and 1.

If we pass an unusually large value into one dimension at a time, we can see its disproportionately large impact relative to the rest of the (fixed) noise vector.

Basically (and maybe I should have spared you the previous paragraphs), we turn up each one of our 200 values in turn and see how well the model can maintain a face.
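
A minimal sketch of that sweep, assuming the `generator` from earlier (which takes a (1, 200, 1, 1) noise tensor):

```python
import torch

# Hold a base noise vector fixed, then push each of the 200 dimensions
# to a large value in turn and render the resulting face.
base = torch.randn(200)
frames = []
with torch.no_grad():
    for dim in range(200):
        z = base.clone()
        z[dim] = 3.0  # well outside the typical N(0, 1) range
        frames.append(generator(z.view(1, 200, 1, 1)).squeeze(0))
```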

Evaluating Extremes

As our Generator gets better at making faces, it becomes useful to evaluate our model at its uniform extremes. Instead of random noise, we can set every input value to one extreme and then iterate over the dimensions, flipping each one to the opposite extreme.

Not even my shirt makes it across all dimensions

This gave me the best sense of how well it was mapping those 200 dimensions into a contiguous area of realistic forgeries.
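
The sweep itself is a small variation on the previous one; a sketch, with the extreme values chosen arbitrarily:

```python
import torch

# Start every dimension at one extreme, then flip each one in turn
# to the opposite extreme and render the result.
high, low = 2.0, -2.0  # illustrative extremes; tune to taste
frames = []
with torch.no_grad():
    for dim in range(200):
        z = torch.full((200,), high)
        z[dim] = low
        frames.append(generator(z.view(1, 200, 1, 1)).squeeze(0))
```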

Face Explosions

It also gave me the chance to watch my face get torn apart, which was a bonus that I can now inflict on you.

I think they are made more gruesome by the low resolution. It would be interesting to project these strange images back into a high-resolution space using the same images for training.

The Best We Can Get

I trained my model for a few days, which felt really inefficient. Starting with weights from a pre-trained auto-encoder seems like a better idea. Something to try on a future GAN.

Eventually, I got something that was pretty good at making fakes. Before I show you the best it could do, we can explore the input dimensions again and see how contiguous the face stays.

Idle Animation Generator

No More Dissolving Face

Setting our dimensions to values that we commonly saw in training, we can see large contiguous spaces of realistic faces. Additionally, visual features tended to group together along dimensions (eye and mouth adjustments near each other). Since the noise order is arbitrary, this proximity relationship is really interesting.

I'll save exploring the uniform extremes for the end, but they skewed very easily. This gave me confidence that activations were bound to the noise in a direct way, since large noise changes caused large displacements.

The Space Between Faces

We don't have a seamless transition between some expressions. This shows that our model can't quite map every possibility onto a realistic face.

The face endures

It is interesting to see what does occur during one of these transitions, though. Some of the features still carry over into the new image, so we can see what features are the most difficult to map into a continuum.
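
These transitions are easy to produce: pick two noise vectors and walk a straight line between them. A sketch, again assuming the earlier `generator`:

```python
import torch

# Linearly interpolate between two noise vectors and render each step
# to watch one face morph into another.
z_a, z_b = torch.randn(200), torch.randn(200)
steps = []
with torch.no_grad():
    for t in torch.linspace(0, 1, 16):
        z = (1 - t) * z_a + t * z_b
        steps.append(generator(z.view(1, 200, 1, 1)).squeeze(0))
```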

Let’s See Some Fakes!

While I had hoped to have a truly contiguous face across 200 dimensions, I can back-pedal and transform my objective.

Can our Generator produce a fake that could reasonably fool someone?

Let’s take a look at some real images

I’ve come to prefer the melted version

And let’s take a look at some fakes

Not a ton of deviation from the average

Pretty good results: my model can reliably produce images that I can't tell are fake at a glance. Closer inspection shows some contrast differences from the real images, but I am confident it could fool someone.

Emotion Is Not For Robots

My training set over-represented images where my expression didn't deviate much from the average. The model had a hard time replicating the rarer expressions and preferred to just turn my head into a fine red mist.

However, it was still capable of producing decent variance in expressions.

It also generated my fraternal twin

What Now

I am pretty happy with the performance of the WGAN, but I am excited to use more modern generative architectures. The training time was my only complaint (and the madness of indecipherable loss vectors).

That said, the fact that it can produce great fakes from 42 images and a cheap GPU is a miracle. With a larger dataset and more computing power, I imagine you can create a fake from just about anything (certainly at this resolution).

I assume this is what demonic possession looks like

Looking At Weird Extremes

Our model is good at making faces, but it is equally good at un-making them. Looking at high uniform noise, large values seemed to produce repeating textures. It reminded me a bit of dilation.

The images also seemed to tear along the same vertical line during the feature displacement. This might be showing some average of differences across the real set of images. I’m curious to look into it further.

Train More And Creep Yourself Out

There are examples of WGANs being used to create truly seamless face generators, so these breaks in the continuity of faces are likely an issue with the extremely small dataset or my lack of training patience.

With more images and time, you could train the model at a higher resolution very easily. Some tweaking of kernel size in your convolutional layers would probably be useful for higher-resolution images.
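
For example (a hypothetical extension of the earlier generator sketch, not something I tested), one extra upsampling block doubles the output resolution to 128 x 128:

```python
import torch.nn as nn

# Hypothetical 128 x 128 variant of the earlier generator sketch.
def deconv_block(n_in, n_out):
    return nn.Sequential(
        nn.ConvTranspose2d(n_in, n_out, 4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(n_out),
        nn.LeakyReLU(0.2, inplace=True),
    )

generator_128 = nn.Sequential(
    nn.ConvTranspose2d(200, 512, 4, stride=1, padding=0, bias=False),  # -> (512, 4, 4)
    nn.BatchNorm2d(512),
    nn.LeakyReLU(0.2, inplace=True),
    deconv_block(512, 256),  # -> (256, 8, 8)
    deconv_block(256, 128),  # -> (128, 16, 16)
    deconv_block(128, 64),   # -> (64, 32, 32)
    deconv_block(64, 32),    # -> (32, 64, 64)
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1, bias=False),
    nn.Tanh(),               # -> (3, 128, 128)
)
```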

Try It Out!

This definitely shows that the WGAN architecture can deliver great results without a lot of work. Being able to reliably generate forgeries from image variance makes me wonder what will happen to the stock-photo industry.

If you made it this far and are still interested, check out the code on GitHub!

If any of this sounds interesting to you, I highly recommend taking Jeremy Howard’s amazing course — Practical Deep Learning for Coders.
