Deepfake AI: Explainer and Examples

Steven Vuong · Published in Analytics Vidhya · 4 min read · May 24, 2020

Article 2 of the AI: Explainer and Examples series

Deepfakes refer to AI-generated audio or visual imitations of another person. Infamous cases include scamming a CEO out of over $200,000, imitating famous politicians and creating alarming celebrity face swaps.

Declaration of Independence? Source: Albany University

So how does one begin to create fake images? One approach is to use a GAN, short for Generative Adversarial Network, which usually consists of two convolutional neural networks (CNNs). If you would like a reminder of what CNNs are, feel free to check out the first article.

GANs were first introduced in 2014 by Dr. Ian Goodfellow and his team at the University of Montreal, later making Dr. Goodfellow famous within the AI community. Below you can see how rapidly GANs have developed within a four-year time span.

Left: GAN-generated faces (Ian Goodfellow et al., 2014). Right: GAN-generated faces with mixed styles from original faces A and B (Nvidia, 2018).

GANs consist of two primary components, the Generator and the Discriminator, which we will abbreviate to G & D. The role of G is to generate fake images from random inputs that will fool D, whose job is to determine which images are real and which are fake outputs from G.

G & D are types of CNNs that are pitted against each other in a zero-sum game, hence the ‘adversarial’. Ultimately, the aim is to reach an equilibrium where D is always unsure whether its inputs are real or fake because G has become good enough at producing fake images, thus succeeding in fooling D.

Source: Thalles Silva

Starting with a randomised input, G applies upsampling layers: simple layers with no weights that double the dimensions of their input, and which are needed to produce regular-sized image outputs from a much smaller original input. The upsampling layers are followed by traditional convolution layers, which learn to interpret the doubled input and create meaningful details. Overall, G is responsible for generating new, plausible images from latent space; a minimal code sketch follows below.

Source: Sarvasv Kulpati
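To make this concrete, here is a minimal sketch of what such a generator could look like in Keras. The layer sizes, latent dimension and the 28x28 single-channel output are illustrative assumptions, not the architecture from any particular paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 100  # size of the random input vector drawn from latent space

generator = tf.keras.Sequential([
    # Project the random vector and reshape it into a tiny 7x7 feature map
    layers.Dense(7 * 7 * 128, input_shape=(latent_dim,)),
    layers.LeakyReLU(alpha=0.2),
    layers.Reshape((7, 7, 128)),
    # Upsampling doubles the spatial dimensions (no learnable weights)...
    layers.UpSampling2D(),  # 7x7 -> 14x14
    # ...and the convolution that follows learns to fill in meaningful detail
    layers.Conv2D(128, kernel_size=3, padding="same"),
    layers.LeakyReLU(alpha=0.2),
    layers.UpSampling2D(),  # 14x14 -> 28x28
    layers.Conv2D(64, kernel_size=3, padding="same"),
    layers.LeakyReLU(alpha=0.2),
    # Final convolution maps to a single-channel 28x28 image, scaled to [-1, 1]
    layers.Conv2D(1, kernel_size=3, padding="same", activation="tanh"),
])
```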

Fake outputs from G are fed to D along with real images. D is a more standard CNN, with layers of decreasing size, that outputs a probability score: close to 0 for fake images and close to 1 for real ones. So for D(input) = y_predicted, we want D(fake image) = 0 and D(real image) = 1.
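A matching discriminator could be sketched as below. Again, the exact layer sizes and the 28x28 input shape are assumptions; the point is simply a shrinking stack of convolutions ending in a single sigmoid score.

```python
import tensorflow as tf
from tensorflow.keras import layers

discriminator = tf.keras.Sequential([
    # Strided convolutions shrink the image as we go deeper
    layers.Conv2D(64, kernel_size=3, strides=2, padding="same",
                  input_shape=(28, 28, 1)),  # 28x28 -> 14x14
    layers.LeakyReLU(alpha=0.2),
    layers.Conv2D(128, kernel_size=3, strides=2, padding="same"),  # 14x14 -> 7x7
    layers.LeakyReLU(alpha=0.2),
    layers.Flatten(),
    # Sigmoid squashes the output into a probability: ~1 for real, ~0 for fake
    layers.Dense(1, activation="sigmoid"),
])
# Compile D on its own so it can be trained directly on batches of real/fake images
discriminator.compile(optimizer="adam", loss="binary_crossentropy")
```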

To achieve this, D seeks to minimise the following loss between the actual and predicted values: Loss = −[ y·log(D(x)) + (1 − y)·log(1 − D(x)) ], where y is the true label (1 for a real input x, 0 for a fake one).

So for example: if D sees a real image (y = 1) and confidently predicts 0.9, the loss is small (about 0.1); if it sees a fake image (y = 0) and still predicts 0.9, the loss jumps to about 2.3. This shows that D is penalised for wrongly predicting values. The equation above is the binary cross-entropy loss function, and its value is used to update the parameters of earlier layers with back-propagation.
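If you want to see that penalty in numbers, the tiny snippet below plugs the same two illustrative predictions into the binary cross-entropy formula:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred):
    # Loss for a single prediction: -[y*log(p) + (1-y)*log(1-p)]
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(binary_cross_entropy(1, 0.9))  # real image, D says 0.9 "real" -> ~0.105 (small loss)
print(binary_cross_entropy(0, 0.9))  # fake image, D still says 0.9 -> ~2.303 (large penalty)
```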

On the flipside, G wants to fool D by achieving D(fake image) = 1, and so seeks to minimise log(1 − D(fake image)), which shrinks as D becomes more convinced that the fake is real.

Eventually, this has the effect of G learning to generate images with a low probability of being fake (and thus slipping by D).
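Putting the two objectives together, training alternates between a D update and a G update. Continuing from the Keras snippets above, a rough sketch of that loop might look like the following; the batch size, optimiser choices and the sample_real_images helper are all placeholder assumptions rather than a fixed recipe.

```python
import numpy as np

# Freeze D inside the stacked model so that training the stack only updates G
discriminator.trainable = False
gan = tf.keras.Sequential([generator, discriminator])  # noise -> G -> D -> score
gan.compile(optimizer="adam", loss="binary_crossentropy")

batch_size = 64
for step in range(10000):
    # 1) Train D: real images get label 1, G's current fakes get label 0
    noise = np.random.normal(size=(batch_size, latent_dim))
    fake_images = generator.predict(noise, verbose=0)
    real_images = sample_real_images(batch_size)  # placeholder helper, not a real API
    discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))

    # 2) Train G through the frozen D: the fakes are labelled 1 ("real"),
    #    so G is rewarded whenever it manages to fool the discriminator
    gan.train_on_batch(noise, np.ones((batch_size, 1)))
```

Because D was compiled on its own before being frozen, step 1 still updates its weights, while the frozen copy inside the stacked model means step 2 only nudges G.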

This back and forth between D and G continues during training until an equilibrium is reached where D is unable to “spot the difference”. When this is done, we can take G out and use it to create some novel images outside of training. For instance:

Capturing more of Mona Lisa’s smile with Living Portraits, a GAN that has been fed videos of human faces in motion (including over 7,000 celebrities) to recreate key movements that can be applied to any face.

It’s always sunny with De-Rain, a photo editing GAN that can remove snow and rain from images.

Looking below, which images do you think are real, and which are fake? The answer is at the very end.

Lastly, you can also have a go yourself:

Thank you for sticking till the end, hope you learnt something! Also thanks to CT & VW for edit suggestions (they’re shy but I want to credit). Thinking of diving into back-propagation for the next one since it gets mentioned quite a bit orrr Recurrent Neural Networks.

Steven Vuong, Data Scientist

Open to comments, discussion & feedback:
stevenvuong96@gmail.com
https://www.linkedin.com/in/steven-vuong/
https://github.com/StevenVuong/

For the Explorers:

Answer: Tough guess? All of the images are high-resolution, GAN-generated images.
