Deepfake: Angel or Evil?

Krince-Lesley-Kevin
SFU Professional Computer Science
Feb 12, 2021

By Haokun Liu, Hanzhi Ding, Wen Han Tang

This blog is written and maintained by students in the Professional Master’s Program in the School of Computing Science at Simon Fraser University as part of their course credit. To learn more about this unique program, please visit sfu.ca/computing/pmp.

The technology for modifying photos and videos has been in use for many years. Many people, for example, have heard of or used Photoshop, and the film industry has produced countless movies with special effects, such as placing an alien spaceship above New York City. In recent years, with the flourishing of deep learning and artificial intelligence, this technology has entered a new stage of development, and a new word, ‘Deepfake’, has entered the public eye.

What exactly is Deepfake? The term has two components: deep and fake, where deep is short for deep learning. It refers to synthesizing videos, audio, or photos with deep learning techniques so that the voice or face (or other characteristics) of one person is swapped with another’s. Many might think this is just another media-manipulation technology like the traditional ones, but by leveraging the power of neural networks and deep learning, its outputs can be highly deceptive and outperform conventional methods.

The term ‘Deepfake’ was first coined by a Reddit user in late 2017. Although the name is recent, the underlying idea emerged last century: its academic history dates back to 1997, when a project called Video Rewrite was initiated that could alter video footage using machine learning techniques. At first, ‘Deepfake’ described only videos in which a person’s face was exchanged using AI techniques, but over time its scope was extended to related applications such as synthetic audio, where the voice is exchanged with similar techniques. The technology behind Deepfake can be applied in marketing, fashion, film, and other fields. However, as its popularity has grown, many troubling issues have also emerged. These include, but are not limited to, fraud, misleading public opinion, blackmail, and pornography with performers’ faces swapped with other people’s. Those who value the technology’s positive benefits in various industries have started to use the term ‘AI-generated synthetic media’ to avoid the negative connotation of ‘Deepfake’, and some have developed Deepfake-detection applications to help combat these abuses.

We hope that readers of this blog post will not only learn about Deepfake and the technology behind it, but also come to realize how serious its negative impacts are and how important the careful use of technology is.

Algorithm:

First, we will introduce a few fundamental points about the Deepfake algorithm.

The Face Swapping Pipeline

Here is a flow chart of face swapping from the article Exposing DeepFake Videos By Detecting Face Warping Artifacts.

Flow Chart of Face Swapping

Start with face detection (green box) and face landmark detection (red points); this step determines which area will be swapped. A transform matrix then warps and normalizes the detected face area. The Deepfake model takes the normalized face image as input and returns a synthesized face image. Finally, the same transform matrix is used to warp the synthesized face back into the original shape, and post-processing, including boundary smoothing, is applied to the composite image.
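
The normalization step can be sketched as fitting an affine transform that maps the detected landmarks onto canonical positions. The landmark coordinates below are made-up illustrations (real pipelines use detected landmarks and warp whole images with a library such as OpenCV):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix mapping src landmarks to dst."""
    n = len(src)
    A = np.hstack([src, np.ones((n, 1))])        # (n, 3) homogeneous coords
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)  # (3, 2) solution
    return M.T                                   # (2, 3) affine matrix

# Hypothetical landmark positions (two eyes, nose tip) in the detected face...
src = np.array([[120., 80.], [180., 82.], [150., 130.]])
# ...and their canonical positions in the normalized face crop.
dst = np.array([[38., 40.], [90., 40.], [64., 88.]])

M = fit_affine(src, dst)
# Applying M to the left-eye landmark sends it to its canonical position.
warped = M @ np.array([120., 80., 1.])
```

After the model synthesizes the new face in this normalized frame, inverting the same transform puts it back into the original image coordinates.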

The kernel of Deepfake is an autoencoder, an unsupervised deep neural network. The encoder compresses the input data into a small code, and the decoder regenerates the original input from that code. The encoder and decoder must be trained together but can be used separately. Swapping two faces requires one shared encoder and two decoders. In each round, both warped faces are encoded first; then decoder A reconstructs face A, and decoder B does the same for face B. These steps are repeated until both decoders can reconstruct their faces and the encoder captures the key facial information. Finally, by applying decoder B to the encoded face A, face A is swapped to B.
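
As a minimal sketch of this shared-encoder/two-decoder setup (with single linear layers standing in for the deep convolutional networks real Deepfake models use, and all sizes made up):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 64 * 64, 32  # flattened face size and code size (illustrative)

# One shared encoder and one decoder per identity.
W_enc = rng.normal(scale=0.01, size=(K, D))
W_dec_a = rng.normal(scale=0.01, size=(D, K))
W_dec_b = rng.normal(scale=0.01, size=(D, K))

def encode(x):
    """Compress a flattened face into a small code."""
    return W_enc @ x

def decode(code, W_dec):
    """Regenerate a face from the code with a given decoder."""
    return W_dec @ code

face_a = rng.random(D)
# Training pushes decode(encode(face_a), W_dec_a) back toward face_a,
# and likewise for identity B's faces through W_dec_b.
# The swap itself: encode face A, but reconstruct with decoder B.
swapped = decode(encode(face_a), W_dec_b)
print(swapped.shape)  # (4096,)
```

Because the encoder is shared, it learns identity-independent facial structure, while each decoder learns to render one specific identity.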

Generative Adversarial Networks (GANs) improve the Deepfake algorithm. A GAN is a deep neural network composed of a generative model and a discriminative model. When generating images, the generative model G produces the fake image, and the discriminative model D tries to detect it. The setup is straightforward: G takes a noise variable z and generates an image G(z); D then outputs the probabilities D(G(z)) and D(x), where x is a real photo.

How GAN works

During training, the generator G’s goal is to produce an indistinguishable image and fool the discriminator, i.e., maximize D(G(z)). Meanwhile, the discriminator D tries to separate images generated by G from the training image set, i.e., maximize D(x) and minimize D(G(z)). In this way, G and D play a dynamic minimax game, driving the final output very close to a real image.
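
Formally, this is the minimax objective from the original GAN paper by Goodfellow et al.:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] +
  \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```

D maximizes both expectations’ terms by scoring real images near 1 and fakes near 0, while G minimizes the second term by making D(G(z)) approach 1.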

In this setup, each decoder becomes a generator, and in each round the discriminator tries to distinguish the image it generates from real ones.

How CycleGAN works

Application:

What is the future of Deepfake, and where will the technology be used? We would like to share some existing cases and let you imagine. Here’s a little sampler from Jordan Peele and BuzzFeed, which used some of the latest AI techniques to make Peele ventriloquize Barack Obama. In the sampler, “Obama” voices his opinion on Black Panther (“Killmonger was right”) and calls President Donald Trump “a total and complete dipshit.”

Peele’s production company used a combination of Adobe After Effects and the AI face-swapping tool FakeApp. FakeApp is the most prominent example of how AI can facilitate the creation of photorealistic fake videos.

Researchers have also developed tools that let you perform face swaps like the one above in real time. One such system was designed by researchers from the University of Erlangen-Nuremberg, the Max Planck Institute for Informatics, and Stanford University. First, the “target actor” (Bush, Trump, Putin, or Obama) is rendered with a neutral expression. Then the source actor’s expressions are captured via webcam, and those expressions drive the animation in the YouTube video.

Adobe is also creating a “Photoshop for audio” that would let you edit dialogue as easily as a photo, according to Adobe developer Zeyu Jin, who spoke at the Adobe MAX conference in San Diego, California, in 2016. Project VoCo is designed to be a state-of-the-art audio editing application. Beyond standard speech editing and noise-cancellation features, Project VoCo can apparently also generate new words using a speaker’s recorded voice. Essentially, the software can learn the makeup of a person’s voice and replicate it.

Deepfake Generation Example:

Here’s a brief introduction to making a simple fake video, such as replacing the face of anyone in a video. FaceSwap is a tool that uses deep learning to recognize and exchange faces in images and videos. To make a fake video, you will have to:

● Gather photos and/or videos

● Extract faces from your raw photos

● Train a model on the faces extracted from the photos/videos

● Convert your sources with the model

We first gather some photos, then run:

python faceswap.py extract

This will take photos from the src folder and extract faces into the extract folder. Then, run:

python faceswap.py train

This will take photos from two folders containing pictures of both faces and train a model that will be saved inside the models folder.

Finally, run:

python faceswap.py convert

This will take photos from the original folder, apply the face-to-face converting deep learning model you just trained to each image, and save the results with the new faces into the modified folder.

Training Process

They even have a GUI version, which doesn’t require you to write a line of code. However, you should always remember the Manifesto:

● FaceSwap is not for creating inappropriate content.

● FaceSwap is not for changing faces without consent or with the intent of hiding its use.

● FaceSwap is not for any illicit, unethical, or questionable purposes.

● FaceSwap exists to experiment and discover AI techniques for social or political commentary, movies, and any number of ethical and reasonable uses.

The Ethical Problems of Deepfake:

Besides its legitimate uses, Deepfake technology has also created many ethical issues that can harm both individuals and the public. A sample fraud scenario might look like this: you receive a voice message from your parents asking you to transfer one hundred thousand dollars to a newly created bank account for an emergency; the message also explains that they recently changed their phone number. You recognize your parents’ voices and way of talking, so you believe the message and make the transfer. Only when a second message from the same number asks for another transfer do you grow suspicious, dial your parents’ original number, and discover the fraud. Unlike traditional phone scams, Deepfake lets scammers forge a person’s authentic voice and speaking style at low cost and high quality, making it difficult for victims to distinguish real from fake. And not only audio: highly realistic synthetic videos or photos can also make people believe in something that never happened, lowering their vigilance and leading to huge losses.

Another massive problem is the production of synthetic pornography. The word ‘Deepfake’ in fact originated with fabricated porn videos in which performers’ faces were swapped with celebrities’. But this does not only happen to celebrities; ordinary citizens can also be seriously affected. For instance, some may use these techniques to retaliate against others by sabotaging their reputations, hurting the people around them, or blackmailing them.

Also, apart from the difficulty of telling fake content from real, there is the opposite issue: people may start to dismiss real content as fake.

In addition to personal harm, Deepfake may also damage the public interest. Public opinion may be deliberately steered by fake videos, photos, or audio that appeal to the public yet are difficult to distinguish from the real thing.

Other incidents brought about by Deepfake can be severely harmful and raise concerns. We believe these are profoundly serious issues that people need to be aware of when learning about Deepfake.

Anti-Deepfake:

Because of these ethical problems, people have been working on techniques to fight against Deepfake. Let’s start with a quiz at Spot Deepfakes: https://www.spotdeepfakes.org/

One approach is a hybrid LSTM and encoder-decoder architecture developed by researchers at the University of California, Riverside. The approach has three steps: (1) divide the image into patches, observe these patches pixel by pixel with an LSTM network, and extract resampling features; the key here is to find the distorted natural statistics at the boundaries of tampered regions; (2) run a convolutional encoder on the whole image; (3) use a decoder network to generate two heat maps predicting the manipulated and non-manipulated classes, and then a finer spatial map indicating the manipulated regions in the image.
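
The patch splitting in step (1) can be sketched as follows (the patch and image sizes here are illustrative, not the paper’s actual settings):

```python
import numpy as np

def extract_patches(img, patch=8):
    """Split a grayscale image into non-overlapping square patches,
    as the LSTM branch consumes the image patch by patch."""
    H, W = img.shape
    return (img.reshape(H // patch, patch, W // patch, patch)
               .transpose(0, 2, 1, 3)       # group by (row-block, col-block)
               .reshape(-1, patch, patch))  # one row per patch

img = np.arange(64 * 64, dtype=float).reshape(64, 64)
patches = extract_patches(img)
print(patches.shape)  # (64, 8, 8)
```

Each patch then yields resampling features whose statistics differ near tampered-region boundaries, which is what the LSTM learns to pick up.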

Besides, Sensity (formerly Deeptrace) provides a powerful defense against weaponized deepfakes with its Detection API. The free API can tell whether an uploaded photo is fake.

Conclusion:

In this blog post, we introduced the basics of Deepfake and some of its possible industrial applications. We also discussed the many negative consequences of its abuse and the things we can do to prevent or fight them. We hope that, through this post, more people will see the infinite possibilities brought by technological innovation while remaining aware of the harm caused by the improper use of technology.

References:

Faceswap-GAN. Github: <https://github.com/shaoanlu/faceswap-GAN>.

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. Generative Adversarial Nets: <https://papers.nips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf>

Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries: <https://arxiv.org/pdf/1903.02495.pdf>

Jie Gui, Zhenan Sun, Yonggang Wen, Dacheng Tao, Jieping Ye. A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications: <https://arxiv.org/pdf/2001.06937.pdf>

Yuezun Li, Siwei Lyu. Exposing DeepFake Videos By Detecting Face Warping Artifacts: <https://arxiv.org/pdf/1811.00656.pdf>
