Going Deep with Deepfakes

Sakalyamitra
Published in AI Club VIT Bhopal
May 31, 2021
Image source: https://images.app.goo.gl/ySafokTTeGjnoAMe9

It’s too boring nowadays amidst the pandemic situation, isn’t it?

But thank goodness we have our smartphones, with the most entertaining social media platforms, where we can spend a good part of our day and keep ourselves engaged. The variety of content uploaded in the form of photos, videos and short clips is a joy to watch and has become a leisure hobby for many. People love to scroll through their feeds and explore content uploaded by people around the world on all kinds of topics.

Among all this content, the most recent trend is deepfake videos. They are in full bloom, and it feels like every second video we see is a deepfake.

Well, don't get confused by the name. By deepfake videos, I mean those videos in which you see a still picture move and perform actions. In these videos, a still image appears to do things that never happened in real life but are simulated using various techniques for entertainment.

If we talk formally about what a deepfake is, a deepfake is an artificial video in which the facial movements of a real person are generated onto another face. The name comes from a Reddit user called "deepfakes", who popularised the technique in late 2017. For the most part, such videos are created to parody people; however, the technology is also being explored more broadly as a tool for increasingly realistic video manipulation.

Photoshop has been a roving wrecking ball, flattening any semblance of authenticity in photographs. Now those effects have moved from still images to video, and that brings us to deepfakes.

Deepfakes first drew public attention in December 2017, when a Reddit user going by "deepfakes" uploaded altered videos in which the faces of celebrities, including Gal Gadot, were convincingly swapped onto the bodies of other performers.

Face replacement of this kind had long been impractical in Hollywood, because earlier video-editing software was not advanced enough to edit such fine details. CGI, a process in which a 3-D model is used to simulate real-life scenes, can achieve it, but its cost and complexity made swapping one actor's face for another's largely out of reach.

Deepfake video example

Above is an example of a deepfake video. So what exactly is happening in the image, you might ask? Well, you are in the right place to understand more about deepfakes.

In the above image, there are two components: the input and the output. The input is further split into two sub-components, the Source Sequence and the Unmodified Target Sequence. The source sequence is the image or video used as the base over which the simulation is performed; in simple terms, it provides the visual that will be re-rendered with a different face. The person whose face is placed into that visual, i.e. whose deepfake is created, comes from the Unmodified Target Sequence.

Now, by combining various algorithms and techniques based on Artificial Intelligence, the result shown in the image above is obtained.

That’s a simpler explanation of what is going on. What will excite you more is the technology behind this process and how a deepfake video is produced.

The Methodology

Deepfake videos are created using Generative Adversarial Networks and other artificial-intelligence systems, which predict how an image will look after editing. This technology has enabled people to edit videos almost seamlessly, leaving little trace and fooling most viewers into believing the fake footage.

While some deepfakes can be created with traditional visual-effects or computer-graphics approaches, the mechanism now commonly used is deep learning models such as autoencoders and generative adversarial networks, applied primarily in the computer-vision domain. These models examine the facial expressions and movements of one person and synthesise facial images of another person making analogous expressions and movements. Deepfake methods require a large amount of image and video data to train models that produce photo-realistic images and videos. This is the main reason public figures are frequent targets of deepfakes: a huge amount of their images and videos is available online and can be fed to the learning algorithm to generate realistic deepfake videos.

These applications are mostly built on deep learning techniques such as the GAN. Now, what is a GAN (sounds funny, XD)? It is the technology most deeply involved in the creation of deepfakes.

What is GAN?

Generative Adversarial Networks (abbreviated as GANs) are an approach to generative modelling using deep learning methods such as convolutional neural networks (CNNs). Generative modelling is an unsupervised learning task (the model works on its own to discover patterns and information that were previously undetected) in machine learning: it involves automatically discovering and learning the regularities or patterns in the input data in such a way that the model can generate new, plausible outputs.

GANs are a clever way of training a generative model by framing the problem as a supervised learning problem with two sub-models: the generator, which we train to generate new examples, and the discriminator, which tries to classify examples as either real (from the domain) or fake (generated).
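The generator-versus-discriminator game can be sketched with a toy numpy example. This is a minimal illustration of one adversarial training round, not a real GAN: the "networks" here are single linear layers with made-up sizes, and no gradient updates are shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny linear "networks" -- stand-ins for the CNNs a real GAN would use.
latent_dim, data_dim = 8, 16
G_w = rng.normal(0, 0.1, (latent_dim, data_dim))   # generator weights
D_w = rng.normal(0, 0.1, (data_dim, 1))            # discriminator weights

def generator(z):
    """Map random noise to a fake sample."""
    return z @ G_w

def discriminator(x):
    """Output an estimated probability that x is real."""
    return 1 / (1 + np.exp(-(x @ D_w)))

def bce(p, label):
    """Binary cross-entropy against a constant label (1 = real, 0 = fake)."""
    eps = 1e-8
    return -np.mean(label * np.log(p + eps) + (1 - label) * np.log(1 - p + eps))

# One adversarial round on a batch of 32 samples:
real = rng.normal(1.0, 0.5, (32, data_dim))          # toy stand-in for real data
fake = generator(rng.normal(0, 1, (32, latent_dim)))

# The discriminator tries to label real as 1 and fake as 0 ...
d_loss = bce(discriminator(real), 1) + bce(discriminator(fake), 0)
# ... while the generator tries to make the discriminator say 1 on fakes.
g_loss = bce(discriminator(fake), 1)
```

In a real GAN, `d_loss` and `g_loss` would each be backpropagated through their own network every iteration, so the two models improve against each other until the fakes become hard to distinguish from real data.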

Deep learning is well known for its ability to represent complex, high-dimensional data. One variant of deep networks with that capability is the deep autoencoder, which has been widely applied for dimensionality reduction and image compression. The first popular tool for creating deepfakes was an app named FakeApp, developed by a Reddit user using an autoencoder-decoder pairing structure. A pictorial representation of the structure is shown below.

Encoder-Decoder Pair for Deepfake Creation

Encoder-Decoder pair to extract Latent Features

In this method, the encoder extracts latent features of face images and the decoder is used to reconstruct and regenerate the face images. To swap faces between the source images and the target images, two encoder-decoder pairs are needed: each pair is trained on the image set of one person, while the encoder's parameters are shared between the two pairs. In simpler words, the two pairs use the same encoder network. This shared encoder finds and learns the similarity between the two sets of face images, which is relatively easy because faces normally have similar features, such as eye, nose and mouth positions, that can be identified, duplicated and regenerated.
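The shared-encoder trick above can be sketched in a few lines of numpy. This is a structural toy, assuming linear "encoders" and "decoders" with invented sizes; a real deepfake pipeline would train convolutional networks on thousands of face crops.

```python
import numpy as np

rng = np.random.default_rng(1)
pix, latent = 64, 8   # flattened toy "image" size and latent code size

# One shared encoder, two person-specific decoders (the key deepfake trick).
E   = rng.normal(0, 0.1, (pix, latent))    # shared encoder weights
D_a = rng.normal(0, 0.1, (latent, pix))    # decoder trained on person A
D_b = rng.normal(0, 0.1, (latent, pix))    # decoder trained on person B

def encode(face):
    """Shared encoder: learns features common to all faces (pose, expression)."""
    return face @ E

def decode(code, D):
    """Person-specific decoder: renders the latent code as one identity."""
    return code @ D

face_a = rng.normal(size=(1, pix))         # a (toy) image of person A

# Training-time path: A is encoded and decoded back into A.
recon_a = decode(encode(face_a), D_a)

# Swap-time path: the SAME latent code is fed to B's decoder, so the
# output keeps A's expression and pose but is rendered with B's identity.
swapped = decode(encode(face_a), D_b)
```

Because both decoders consume codes produced by one shared encoder, the latent space ends up describing identity-independent facial structure, which is exactly what makes the swap possible.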

This encoder-decoder pair approach is used by many applications such as DeepFaceLab, DFaker and DeepFake tf (a TensorFlow-based deepfake implementation).

Deepfake facial translation, as illustrated by Liu et al.

By adding an adversarial loss (which measures how easily a discriminator can tell generated faces from real ones) and a perceptual loss (which measures the difference between high-level features of the generated and real images) to the encoder-decoder architecture, an improved version of deepfakes based on the generative adversarial network (GAN) can be created. The perceptual loss makes eye movements more realistic and consistent and helps smooth out artefacts in the segmentation mask, leading to higher-quality output images or videos. This model can produce outputs at 64x64, 128x128 and 256x256 resolutions. In addition, the multi-task convolutional neural network (MTCNN) from the FaceNet implementation is introduced to make face detection more stable and face alignment more reliable, and CycleGAN, an adversarial network that learns image-to-image translation, is used for the generative network implementation.
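How these losses combine can be illustrated with a toy numpy sketch. Everything here is a stand-in: the "perceptual feature extractor" is a fixed linear map where a real system would use a pretrained network (e.g. VGG or FaceNet layers), the discriminator is a one-line sigmoid, and the loss weights are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
out    = rng.normal(size=(1, 64))   # toy generated face (flattened)
target = rng.normal(size=(1, 64))   # toy ground-truth face

def perceptual_features(x):
    """Stand-in for activations of a pretrained network's hidden layer."""
    W = np.ones((64, 16)) / 64
    return x @ W

def discriminator_score(x):
    """Toy discriminator: estimated probability that x is a real face."""
    return 1 / (1 + np.exp(-x.mean()))

# Pixel-wise reconstruction loss (the basic autoencoder objective).
recon_loss = np.mean((out - target) ** 2)

# Perceptual loss: match high-level features, not raw pixels.
percep_loss = np.mean((perceptual_features(out)
                       - perceptual_features(target)) ** 2)

# Adversarial loss: penalise outputs the discriminator flags as fake.
adv_loss = -np.log(discriminator_score(out) + 1e-8)

# Weighted sum -- the 0.1 and 0.01 weights are illustrative only and
# would be tuned per project in practice.
total = recon_loss + 0.1 * percep_loss + 0.01 * adv_loss
```

The generator is then trained to minimise `total`, so it must simultaneously reproduce the target pixels, preserve high-level facial features, and fool the discriminator.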

That's a fairly detailed description of how deepfakes work. If you are interested in the machine learning algorithm and how it is implemented, you can visit this article https://medium.com/gradientcrescent/deepfaking-nicolas-cage-into-the-mcu-using-autoencoders-an-implementation-in-keras-and-tensorflow-ab47792a042f for a detailed, step-by-step implementation that teaches you how to create deepfakes using the faceswap-GAN repository.

The Bigger Picture

People have found all sorts of uses for deepfake videos. They can be used to interfere with elections by swaying public opinion, or to generate unprecedented fame and attention. Such instances, however, cannot be termed art; because they are built on misleading information, in many jurisdictions they can amount to criminal offences. On the brighter side, deepfakes also let people experience life through someone else's eyes and feel a more personal connection with another person than they normally could through a real-life video chat with friends or family.

While every coin has two sides, deepfakes are no exception. Besides being a blossoming technology with many researchers already working on it, it has also been a source of criminal activity and is being misused by people. Deepfakes being such an interesting field, we should be encouraged to improve the technology and use it for better purposes or for entertainment. That would restore its integrity and honour the main motive behind the continuous development of complex algorithms that can generate such beautiful things.

So it is completely up to us to scrutinise and examine the further possibilities and use them for societal development, steepening the growth curve of technological improvement.

Stay in the loop with the club:
Club Website: https://aiclubvitbhopal.github.io/
LinkedIn: https://www.linkedin.com/company/aiclub-vitb
Instagram: https://www.instagram.com/aiclub.vitb/?hl=en


Sakalyamitra is a fresher pursuing an Undergraduate degree in Artificial Intelligence at VIT Bhopal | Passionate Writer