A Short History of Deepfakes

David Song
5 min read · Sep 24, 2019

--

Back in early 2018, a video starring Jordan Peele and President Obama was making its way around the internet. A few of my friends sent it to me and, while something felt off about the video, I was suspicious yet convinced. Later in the video, it’s revealed that Jordan Peele is doing a voice-over, putting words into President Obama’s mouth, yet the movement of President Obama’s mouth and head seems natural. A concept I had read about on the internet long ago immediately popped into my mind: this had to be a deepfake.

Computer vision is an intricate field primarily concerned with image and video processing: giving computers the ability to extract and understand information from images and videos. There are already common, popular applications of computer vision technology, such as autonomous vehicles that use it to analyze road paths and obstructions, medical systems that use it to help diagnose patients, and of course, the facial recognition Facebook uses to suggest photo tags. Computer vision is the academic umbrella that deepfakes fall under, and its continued development has made deepfakes popular and more accessible.

To see where deepfakes came from, we need to examine the academic work that laid their groundwork. In 1997, a paper by Christoph Bregler, Michele Covell, and Malcolm Slaney described a truly innovative program that essentially automated what only movie studios could do. Their Video Rewrite program could synthesize new facial animations from an audio track. It built upon older work that interpreted faces, synthesized audio from text, and modeled lips in 3D space, but it was the first to put all of this together and animate it convincingly.

The results from this 1997 paper are short but surprisingly convincing.

Here are three videos from the paper:

Presented in this order, it’s fairly easy to determine which video is the original, but the result is absolutely incredible given it was built at a time when the Oscar for best visual effects went to Independence Day.

This is one of the most important works in the development of deepfakes. In fact, many of today’s common video effects bundled into programs like Premiere Pro or Final Cut build on algorithmic ideas from this paper.

The authors note that the system “can be used for dubbing movies, teleconferencing, and special effects,” though that had yet to materialize (as of June 2018).

The early 2000s were fairly quiet as computer vision moved deeper into facial recognition. Developments in that field drastically improved things like motion tracking, which makes today’s deepfakes more convincing.

Active appearance models (AAMs), an algorithm introduced in a 2001 paper by Timothy F. Cootes, Gareth J. Edwards, and Christopher J. Taylor, were popular at the time and have retained their popularity. Using a thorough statistical model to match a shape to an image proved to be a big step forward: AAMs made face matching and tracking significantly more efficient.

AAM finding facial parameters after 3 iterations and a poor starting position
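The core of the AAM search is a simple iterative loop: synthesize the current model instance, measure how it differs from the sampled image pixels, and nudge the model parameters with a precomputed linear update. Here is a heavily simplified sketch of that idea; the function names, the 4-pixel toy data, and the pseudoinverse standing in for the learned regression matrix are all my own illustrative assumptions, not code from the paper (the real algorithm also handles shape, pose, and texture normalization).

```python
import numpy as np

def fit_aam(sample, mean_app, modes, R, n_iters=3):
    """Iteratively refine model parameters b so the synthesized model
    instance matches the pixels sampled from the image (simplified AAM)."""
    b = np.zeros(modes.shape[1])          # start from the mean model
    for _ in range(n_iters):
        model = mean_app + modes @ b      # synthesize current model instance
        residual = sample - model         # difference image drives the update
        b = b + R @ residual              # precomputed linear parameter update
    return b

# Tiny synthetic demo: a 4-pixel "appearance" with two modes of variation.
mean_app = np.zeros(4)
modes = np.array([[1., 0.], [0., 1.], [1., 1.], [0., 0.]])
b_true = np.array([0.5, -0.3])
sample = mean_app + modes @ b_true        # the "observed" image region
R = np.linalg.pinv(modes)                 # stand-in for the trained regressor
b_fit = fit_aam(sample, mean_app, modes, R)
```

The efficiency win of AAMs comes from precomputing that residual-to-update mapping once during training, so each fitting iteration is just a matrix multiply rather than a fresh optimization.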

In 2016 and 2017, two papers established that deepfakes were achievable with consumer-grade hardware: the Face2Face project out of the Technical University of Munich and the Synthesizing Obama project out of the University of Washington. While wholly different in their objectives, both drastically improved computing and rendering times while upgrading graphical fidelity to near photo-realism.

Face2Face performs real-time reenactment, replacing the mouth area of its target video with an actor’s. This method doesn’t provide any audio, but there are already well-developed ways of synthesizing the human voice.

Synthesizing Obama is Video Rewrite 2.0, with better animations, textures, and expressions. It added wrinkles and dimples and adjusted colors to better match lighting and skin tone. While these graphical improvements make for a more convincing model, the project’s biggest development was its ability to temporally align audio and video convincingly: the subject’s eyebrows move in time with what they’re saying. There were no longer moments where the subject stopped speaking but their eyebrows kept moving.

The results from this project are nearly photo-realistic, and quick: a 66-second video took only 45 minutes to compute on an NVIDIA TitanX and a Core i7–5820. On more common consumer hardware, it would take only a few hours.

The huge spike in deepfakes can largely be credited to Reddit and pornography, brought to wider attention by Vice’s Samantha Cole. A now-deleted subreddit, appropriately named r/deepfakes, had nearly 90,000 members and featured deepfake porn of a variety of actors. After its ban, Reddit updated its content policies to better reflect its position on pornography.

As of February 7 2018, we have made two updates to our site-wide policy regarding involuntary pornography and sexual or suggestive content involving minors. These policies were previously combined in a single rule; they will now be broken out into two distinct ones. Communities focused on this content and users who post such content will be banned from the site. (u/landoflobsters)

A variety of non-pornographic deepfake subreddits have since spawned, the most popular being r/SFWdeepfakes, which hilariously puts Nicolas Cage’s face into as many places as possible.

There are a variety of public resources available for deepfake development. Reddit user u/deepfakes, one of r/deepfakes’ biggest contributors, cited the Python library Keras and TensorFlow as the basis of his software. There are plenty more deepfake projects on GitHub, some containing prebuilt executables ready for immediate use. It’s easy for even an amateur to make deepfakes today, with the biggest hurdle being patience. That said, incoming efficiency gains from hardware and software development are only going to make them more prevalent.
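The face-swap software that came out of that community is widely reported to rest on one clever trick: a single shared encoder paired with two person-specific decoders. Below is a minimal Keras sketch of that idea; the layer sizes, 64×64 input, and model names are illustrative assumptions of mine, not the architecture of any particular tool.

```python
# Sketch of the shared-encoder / two-decoder autoencoder idea behind
# early face-swap software. All sizes and names here are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_encoder():
    inp = layers.Input(shape=(64, 64, 3))
    x = layers.Conv2D(32, 5, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(64, 5, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    latent = layers.Dense(256, activation="relu")(x)  # shared face representation
    return Model(inp, latent, name="shared_encoder")

def build_decoder(name):
    inp = layers.Input(shape=(256,))
    x = layers.Dense(16 * 16 * 64, activation="relu")(inp)
    x = layers.Reshape((16, 16, 64))(x)
    x = layers.Conv2DTranspose(32, 5, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="sigmoid")(x)
    return Model(inp, out, name=name)

encoder = build_encoder()
decoder_a = build_decoder("decoder_face_a")  # learns to render person A
decoder_b = build_decoder("decoder_face_b")  # learns to render person B

face = layers.Input(shape=(64, 64, 3))
autoencoder_a = Model(face, decoder_a(encoder(face)))
autoencoder_b = Model(face, decoder_b(encoder(face)))
autoencoder_a.compile(optimizer="adam", loss="mae")
autoencoder_b.compile(optimizer="adam", loss="mae")
```

Each autoencoder is trained only on its own person’s faces, but because the encoder is shared it learns a person-agnostic representation of pose and expression; at inference time, feeding person A’s frames through decoder_b renders A’s expressions on B’s face. This is also why the main cost is patience: the training loop, not the swap itself, takes the time.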

For more reading, Samantha Cole has done a lot of great reporting about deepfakes and the damage they can do, from the initial piece about Gal Gadot to the implications for our political conversations. I’d highly recommend her work.


David Song

University of Michigan Alum // BS Economics and Music