Towards Deepfake Detection That Actually Works

Dessa
Published in Dessa News · 4 min read · Nov 28, 2019

Authors: Rayhane Mama & Sam Shi

Read a story featuring our work on deepfake detection by Cade Metz in The New York Times.

Before we start: The codebase and data we present in this article can be found in our open-source GitHub repository for anyone to replicate and build on. What appears below is an excerpt from a full article about our work on deepfake detection, which you can find on the Dessa website.

Link to full article: Dessa website

Download the source code and data: GitHub repository

Download the trained model and check out our experiments: Atlas Experiment Dashboard

Introduction

In September 2019, with the objective of improving deepfake detection, Google released a large dataset of visual deepfakes. Since then, this dataset has been used in deep learning research to develop deepfake detection algorithms. The associated paper and dataset are called FaceForensics++, and focus on two particular types of deepfake technique: facial expression manipulation and facial identity manipulation.

In the FaceForensics++ paper, the authors augmented Google’s dataset with 1,000 real videos from YouTube, from which they extracted 509,914 images after applying the Face2Face, FaceSwap, DeepFakes, and NeuralTextures deepfake techniques.

Here’s a summary of the four techniques:

Face2Face (facial reenactment): transfers expressions from a source video to a target photo using a model-based approach

FaceSwap (facial identity manipulation): a graphics-based approach that, for each frame, uses landmarks to build a 3D model of the source face, then projects it onto the target by minimizing the distance between landmarks

Deepfakes (facial identity manipulation): first uses facial recognition to crop the face, then trains two autoencoders with a shared encoder for the source and target. To produce a deepfake, the target is run through the source autoencoder and the result is blended into the target images using Poisson image editing.

NeuralTextures (facial reenactment): a GAN-based approach
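The projection step described for FaceSwap above, fitting a transform that minimizes the distance between corresponding landmarks, can be sketched as an ordinary least-squares problem. This is a simplified 2D affine version for illustration (real implementations fit a 3D model); the function name is our own:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src landmarks onto dst.

    src, dst: (N, 2) arrays of corresponding facial landmarks.
    Returns a (2, 3) matrix A such that dst ~= src_h @ A.T,
    where src_h is src with a column of ones appended.
    """
    n = src.shape[0]
    src_h = np.hstack([src, np.ones((n, 1))])        # homogeneous coordinates
    X, *_ = np.linalg.lstsq(src_h, dst, rcond=None)  # minimize ||src_h @ X - dst||
    return X.T

# Toy example: landmarks shifted by (2, 3) should recover a pure translation.
src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
dst = src + np.array([2., 3.])
A = fit_affine(src, dst)
```

With more landmarks than parameters, the least-squares fit absorbs landmark-detection noise, which is why graphics-based swaps align well even with imperfect landmarks.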

The paper’s authors then fine-tuned an Xception net, pre-trained on ImageNet, to detect real vs. fake videos. The results mentioned in the paper suggest a state-of-the-art forgery detection mechanism tailored to face manipulation techniques.
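The general recipe of fine-tuning a pretrained backbone for binary real/fake classification looks roughly like this. A toy backbone stands in for Xception here, and the freezing strategy is our own illustrative choice; in practice you would load ImageNet weights (e.g. via timm) and may unfreeze more layers:

```python
import torch
import torch.nn as nn

# Toy stand-in for an ImageNet-pretrained Xception backbone; in practice,
# load real pretrained weights instead of this placeholder network.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the pretrained features and train only a new binary head.
for p in backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(backbone, nn.Linear(8, 2))  # 2 classes: real vs. fake

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch of 299x299 face crops.
images = torch.randn(4, 3, 299, 299)
labels = torch.tensor([0, 1, 0, 1])  # 0 = real, 1 = fake
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

Only the new head receives gradients here; the frozen backbone keeps its (pretend-pretrained) features, which is the standard transfer-learning starting point the paper builds on.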

Note: The FaceForensics++ inference codebase, datasets, and pre-trained models explained in the paper are open-sourced on Github.

The results reported in the paper compare accuracies across different models and data qualities: each row of the paper’s table is a different model, and each column is a different video compression rate. FaceForensics++ introduces the Xception net, which achieves state-of-the-art accuracy on the data created in the paper.

In this article, we make the following contributions:

  1. We show that the model proposed in FaceForensics++ does not yield the same metrics stated in the paper when tested on real-life videos randomly collected from YouTube.
  2. We conduct extensive experiments to demonstrate that the datasets produced by Google and detailed in the FaceForensics++ paper are not sufficient for making neural networks generalize to detect real-life face manipulation techniques.
  3. We show the need for the detector to be constantly updated with real-world data, and propose an initial solution in hopes of solving deepfake video detection.

The Problem

The FaceForensics++ paper’s results seemed very exciting, especially after validating the reported metrics by applying the pre-trained Xception net on the paper’s data. But then we noticed a problem.

When we used the same model on real-world deepfake data encountered on YouTube (i.e., data not contained in the paper’s dataset), the accuracy of the model was much, much lower.

To test how well the model performed on real-world data, we applied the pre-trained Xception net to the following two videos:

  • Deepfake Impressionist
  • RealTalk — a recent initiative by Dessa to generate both the physical presence and voice of Joe Rogan, as an example of hyper-realistic synthetic media

Using this model, the best result we achieved detecting fake videos from YouTube is shown in the video below:

The model classifies 68% of the frames in the video as real, while in reality the entire video is a deepfake.
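Per-frame predictions like these are typically aggregated into a single video-level verdict. A minimal majority-vote sketch (the helper name and threshold are our own, not from the paper) shows why a detector that calls 68% of frames real will label a fully fake video as real:

```python
def video_verdict(frame_real_probs, threshold=0.5):
    """Aggregate per-frame 'real' probabilities into a video-level label.

    A frame counts as 'real' if its probability exceeds the threshold;
    the video is labeled 'real' if the majority of frames are.
    """
    real_frames = sum(p > threshold for p in frame_real_probs)
    fraction_real = real_frames / len(frame_real_probs)
    return ("real" if fraction_real > 0.5 else "fake"), fraction_real

# 100 frames of an entirely fake video, 68 of which fool the detector:
probs = [0.9] * 68 + [0.1] * 32
label, frac = video_verdict(probs)
# label is "real" even though every frame is a deepfake
```

Under majority voting, the fake frames the detector does catch are simply outvoted, so per-frame errors compound into a confidently wrong video-level answer.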

To ensure our initial observation was fair, we randomly selected additional non-manipulated videos from YouTube, in addition to the synthetic videos discussed earlier (the Deepfake Impressionist and RealTalk videos), and tested the model on all the videos collected at this stage. The model scored an accuracy of 58%, which is very close to a random guess.

Learn how we confirmed our hypothesis and get a deep dive into our observations by reading a full version of this article on our website.

Dessa

A (not-so) secret AI lab at Square. Learn more at dessa.com