Image: Egor Zakharov, Samsung AI Center (Moscow)

Six lessons from my deepfakes research at Stanford

How should journalists address the growing problem of synthetic media?

Published in JSK Class of 2019 · 12 min read · May 29, 2019

Flash back to July 2017. Brussels, Belgium. At the foreign news desk of VRT NEWS — Belgium’s public broadcaster, where I work — I watch an online video of President Barack Obama, who warns viewers about the potential dangers of faked video. The video itself is fake, too, but very convincing. It is generated by researchers at the University of Washington, who are using artificial intelligence tools to precisely model how the president moves his mouth when he speaks.

I’m alarmed. I start to realize that we’ve only seen the beginning of fake news. Instead of being mostly textual, fake news is entering a new era of digitally altered video and audio, which could further erode public trust in journalism and harm democracies. I decide to apply for the John S. Knight (JSK) Journalism Fellowship at Stanford University, hoping that I can study this phenomenon and contribute to a possible solution.

Flash forward to May 2019. Stanford University. The phenomenon “deepfake,” a portmanteau of “deep learning” and “fakes,” is now widespread. The term covers not only videos and images, but also audio files generated or altered with the help of artificial intelligence, with the intent of deceiving an audience into thinking that they are real.

The word “deepfakes” originated in December 2017 with an anonymous user on the online platform Reddit who called himself “deepfakes.” He applied deep-learning algorithms to digitally superimpose faces of celebrities on actors in pornographic content. Ultimately, he got banned from Reddit. But a wave of copycats replaced him on other platforms. Experts believe there are now about 10,000 deepfake videos circulating online, and the number is growing.

As a John S. Knight Journalism Fellow at Stanford University, I approached my deepfake challenge by first studying artificial intelligence, its impact on journalism and how we can integrate these techniques in our newsrooms. This strategy gave me a chance to tap into a wealth of resources on campus, take classes on AI and meet with researchers and students. Over time, I pulled together a group of people who are also concerned about deepfakes, and who wanted to learn from each other.

Deepfake Research Team meeting in the JSK Garage (Stanford University)

I invited deep learning experts, computer science students and visiting journalists on campus to discuss this topic in the “JSK Garage,” our seminar room at Stanford. These irregular meetings grew into a slightly more formal group that we called the Deepfake Research Team (DRT), and we created a Slack workspace to share our work. Our goal: to raise awareness, foster interdisciplinary collaboration and exchange ideas about solutions to detect deepfakes.

While solutions for combating deepfakes are still far off, I’d like to share the most important lessons I have learned so far.

1. It’s becoming easier to create a deepfake

Compared to two years ago, the techniques for developing “synthetic media” are becoming better, more common and easier to use. Anyone can now create hyper-realistic deepfakes, even without much knowledge of machine learning. A search on GitHub for free software to develop deepfakes shows over 100 repository results. Most of them are a variant of the technique called face swap, but there are several different approaches to manipulating video, audio and images.

Deepfake technology uses generative adversarial networks (GANs), which are trained to replicate patterns, such as the face of a president, and gradually improve the realism of the synthetically generated faces. Basically, it works like a cat-and-mouse game between two neural networks.

One network, “the generator,” produces the fake video based on training data (real images), while the other network, “the discriminator,” tries to distinguish the real images from the fake video. This iterative process continues until the generator is able to fool the discriminator into thinking that the footage is real.
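
To make that dynamic concrete, here is a toy training loop in PyTorch. It is only a sketch on random stand-in data, not a face-generation model: real deepfake pipelines use far larger networks, real image datasets and long GPU training runs.

```python
import torch
import torch.nn as nn

latent_dim, image_dim = 16, 64                       # tiny sizes to keep the sketch readable

generator = nn.Sequential(                           # maps random noise to a fake "image" vector
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, image_dim), nn.Tanh())
discriminator = nn.Sequential(                       # scores how "real" an input looks
    nn.Linear(image_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1))

loss_fn = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step in range(1000):
    real = torch.randn(32, image_dim)                # stand-in for a batch of real training images
    fake = generator(torch.randn(32, latent_dim))    # the generator's current forgeries

    # The discriminator learns to label real images 1 and fakes 0.
    d_loss = (loss_fn(discriminator(real), torch.ones(32, 1)) +
              loss_fn(discriminator(fake.detach()), torch.zeros(32, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # The generator learns to make the discriminator label its fakes as real.
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```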

To create a deepfake video, it is crucial to have a powerful graphics card (GPU). The better the result you want, the longer you’ll have to run the computing process. If you don’t have the time or your GPU is weak, you can now simply order a deepfake video online. On YouTube, for example, it’s easy to find people offering their services for as little as $20 per request.

Other remarkable techniques for creating deepfakes already exist.

2. You, too, can become a target of deepfakes

So far, deepfakes have mostly been used to target celebrities such as Nicolas Cage and Scarlett Johansson. Celebrities are popular with deepfake creators because online datasets of their imagery are readily available and are being used to experiment with these techniques.

Famous examples of more political deepfakes include face swaps between President Donald Trump and German Chancellor Angela Merkel, or between Argentinian President Mauricio Macri and Adolf Hitler, but they are mostly satirical and easily detectable. There is one controversial video, however, that created political turmoil in the African country of Gabon: a New Year’s address by President Ali Bongo was labeled a deepfake by the opposition and sparked a (failed) military coup. Although it is still unproven that the video was actually faked, this controversy shows the potentially damaging results of deepfakes.

Unfortunately, the most imminent threat of deepfakes comes from weaponizing them against women. Deepfake creators superimpose women’s faces onto pornographic content without their consent. This trend, better known as revenge porn, is a degrading way of humiliating, harassing and abusing victims.

But as source photos become abundantly available on social media (pictures and selfies posted and tagged on Facebook, Instagram, etc.), anyone can become a potential target. Deepfake creators only have to download these pictures and train their models to use them for identity theft, blackmail or spreading negative news about anyone, not only politicians, CEOs or other influential people. This could be used for information warfare, misleading public opinion, manipulating stock prices or winning electoral support (last year this happened in my home country, Belgium).

Image: Egor Zakharov, Samsung AI Center (Moscow)

Yet, it could get worse. Imagine that a single Facebook profile picture is sufficient to create a deepfake video. Researchers at the Samsung AI Center in Moscow are already experimenting with this. They recently developed a way to create “living portraits” from a very small dataset (a single picture or portrait) and generated animations of cultural icons such as Leonardo da Vinci’s Mona Lisa, Albert Einstein and Marilyn Monroe. This new algorithm goes beyond what other algorithms using generative adversarial networks can accomplish.

There is reason to fear that newsrooms, and journalists in particular, could become a prime target for deepfake creators. Publishing an unverified hyper-realistic deepfake as source material for a news story could damage the reputation of news organizations and further undermine public faith in media institutions.

Newsrooms are particularly vulnerable in breaking news situations (for example, right before elections or in the aftermath of catastrophes, such as the Notre-Dame fire in Paris), where bad actors could create a potential firehose of falsehood and misinformation.

Journalists are already the victims of personal fake news attacks; we’ve seen cases in India, but also in Belgium. Deepfakes could take this to another level, aimed at discrediting or intimidating journalists even more. This is especially problematic for television journalists and anchors, as there is plenty of source material available to train GANs.

3. Deepfakes are becoming big business

Many of the deepfake techniques originated in academic circles and have since been adopted by startups making use of the tools and techniques for profit. Why are researchers and students developing them when they are potentially harmful to our society? This question has been central to several of our discussions at Stanford. Apart from the malicious uses, some academics think that this technology cannot be stopped, and that deepfakes also have positive aspects as a storytelling and content creation tool.

One example is how film production companies could apply this technology for dubbing movies in foreign languages. The British company Synthesia is utilizing AI to synchronize the lip movements of an actor to a new dialogue track; the result is a seamless experience for the viewer. Watch David Beckham, a well-known soccer celebrity, speaking in nine languages to raise awareness about malaria.

Another example is how deepfakes bring world leaders together to promote peace by singing John Lennon’s Imagine. Or, even more surreal, how they can bring historical figures back to life for educational purposes: Salvador Dalí was recently revived as a deepfake by an art museum in Florida, and visitors can now take selfies with him.

Behind these examples are companies that hope to cash in on this new technology, leveraging machine learning to do unprecedented things in the audiovisual industry. Take Gigapixel AI by Topaz Labs, a tool based on GANs that lets you enlarge a very low-resolution image into a high-resolution one, upscaling it by up to 600 percent. It creates enormous opportunities to upscale old digital graphics, for example early ’90s video games. But this technology could also be used to enhance blurry security camera footage, which could be considered a threat to our privacy or could be abused by authoritarian regimes.

4. There is no silver bullet to detect deepfakes

Unfortunately, there are no commercially available tools to detect deepfakes yet. It is clear that any technological solution involves an AI arms race, as deepfake creators are quick to catch up with the latest detection techniques. However, this doesn’t stop Stanford researchers and students from exploring several possible solutions. We’ve invited several of them to present their tools during our Deepfake Research Team meetings. They inspired each other to collaborate on open source tools.

One fascinating approach is fighting back with tools based on the same deep learning techniques used to create the deepfakes in the first place. Benjamin Penchas and Marco Monteiro introduced Sherlock AI, an automatic deepfake detection tool they developed by ensembling convolutional models that each look for anomalies in a video. They claim their solution achieves 97 percent detection accuracy on a large collection of deepfakes, including the largest deepfake dataset, FaceForensics++. They want to offer Sherlock as a Chrome extension and see newsrooms as its primary users.
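
To give a sense of what “ensembling” means here (this is only an illustrative sketch, not Sherlock’s actual code), several frame-level CNN classifiers each score the frames of a clip, and their probabilities are averaged into one video-level score. The untrained ResNet-18 backbones below merely stand in for models that would be fine-tuned on real and fake footage.

```python
import torch
import torch.nn as nn
from torchvision import models

def make_detector() -> nn.Module:
    """A binary real/fake classifier head on a ResNet-18 backbone (untrained stand-in)."""
    net = models.resnet18(weights=None)              # weights=None assumes a recent torchvision
    net.fc = nn.Linear(net.fc.in_features, 1)        # single logit: higher means "more likely fake"
    return net.eval()

@torch.no_grad()
def video_fake_score(frames: torch.Tensor, ensemble: list) -> float:
    """frames: (num_frames, 3, 224, 224). Returns the fake probability averaged over models and frames."""
    per_model = [torch.sigmoid(model(frames)) for model in ensemble]   # each is (num_frames, 1)
    return torch.stack(per_model).mean().item()

ensemble = [make_detector() for _ in range(3)]       # in practice: differently trained detectors
clip = torch.rand(8, 3, 224, 224)                    # 8 dummy frames standing in for a decoded video
print(f"ensemble fake score: {video_fake_score(clip, ensemble):.2f}")
```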

Image: NeuralHash by Nikhil Cheerla (Stanford University)

Stanford Computer Science students Nikhil Cheerla and Rohan Suri take another approach with NeuralHash, allowing content creators to embed unerasable digital watermarks in their media using deep neural networks. Malicious attackers can modify a video, distort the audio, swap in another person’s face, etcetera, but no matter what changes they make, they cannot remove the digital signature that NeuralHash burns into the content. This signature should always point back to the original, unmodified content.

5. Blockchain could be part of the solution

Probably the most promising approach to restoring trust in video relies on blockchain technology. Blockchain is an entirely new way of documenting data on the internet. It can provide immutable, tamper-proof records of data and transactions in a decentralized distributed ledger.

Startups such as Truepic and Serelay have developed a system involving mobile apps for capturing and saving imagery to the company’s servers. Truepic uploads the whole image and uses blockchain to store metadata to ensure immutability, while Serelay computes only a unique fingerprint, which is saved in its servers. These methods look promising, but they heavily rely on trusting a third party with our data; the whole point of blockchain is to decentralize and eliminate these intermediaries.
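
To illustrate the fingerprint idea in its simplest form (this shows the general principle, not Truepic’s or Serelay’s actual systems), a cryptographic hash of the image bytes is registered at capture time; later, anyone can recompute the hash and detect whether even a single bit has changed:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest of the raw image bytes."""
    return hashlib.sha256(data).hexdigest()

original = b"...raw JPEG bytes captured by the app..."   # stand-in for a real photo
registered = fingerprint(original)                        # only this short digest needs to be stored

received = original                                       # the file as it later arrives at the newsroom
print("unaltered" if fingerprint(received) == registered else "modified")
```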

At Stanford’s d.school (the Design School), I explored new blockchain applications during the winter quarter (January-March 2019). Together with students Wesley Peisch, Hesam Hamledari and Philipp Lill, I designed a prototype that allows someone to track the provenance of digital imagery by providing proof of authenticity. It was important that our model be decentralized, with no intermediaries or trusted third parties.

VidProv Team at d.school (Stanford University)

We applied design thinking methodology to the question of how to help users trace a video, with its multiple edited versions, back to its origin. For this we developed a front-end decentralized application called VidProv (short for Video Provenance) to automate the authentication process. With VidProv we aim to provide a secure, trusted tool for tracking and tracing a video’s history across multiple edited versions.

Every video segment is associated with a smart contract that points to its original or parent video. If it is not traceable to a native source, then the digital content cannot be trusted. VidProv can combat deepfakes by helping users (journalists, fact-checkers and digital content consumers) to determine if a video is traceable to a trusted and reputable source. If this prototype is developed further, we believe this solution could be integrated in web browsers or video platforms to indicate the authenticity of digital content.
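
To make the traceability idea concrete, here is a minimal Python sketch of the underlying principle (not VidProv’s actual implementation): each registered version stores the hash of the version it was derived from, so any clip can be walked back along that chain, and only clips that end at a trusted original are accepted. In the real prototype this record would live in smart contracts on a blockchain; a plain dictionary stands in for that ledger here.

```python
import hashlib
from typing import Optional

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# content hash -> hash of the parent version (None marks an original upload).
# A plain dict stands in for the on-chain ledger of smart contracts.
ledger = {}

def register(content: bytes, parent: Optional[bytes] = None) -> str:
    """Record a video version and, if any, the version it was derived from."""
    ledger[h(content)] = h(parent) if parent is not None else None
    return h(content)

def traceable_to_origin(content: bytes, trusted_roots: set) -> bool:
    """Walk the parent chain; trust the clip only if it ends at a registered, trusted original."""
    current = h(content)
    while current in ledger:
        if ledger[current] is None:          # reached an original upload
            return current in trusted_roots
        current = ledger[current]
    return False                             # unregistered content or a broken chain

original = b"raw broadcast footage"          # stand-ins for real video bytes
edited = b"raw broadcast footage, trimmed and recompressed"
root = register(original)
register(edited, parent=original)

print(traceable_to_origin(edited, trusted_roots={root}))           # True: traces back to the original
print(traceable_to_origin(b"unknown clip", trusted_roots={root}))  # False: no provenance record
```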

6. We should work more like detectives do

These technological solutions may look promising, but in the meantime, journalists and fact checkers have to rely on their human skills to discern fake images and video from real ones. The bad news is that we are not very well equipped to do this. The good news is that we can be trained.

Before people lose their trust in audiovisual content completely, we need to think about developing new literacies for journalists and, to a larger extent, for the public as a whole. Newsrooms should develop processes for dealing with doctored media, build a culture of training their journalists in content verification, and draw on existing expertise from organizations such as Bellingcat and First Draft.

Basic training questions include:

  • Are there any glitches and inconsistencies in the video or audio?
  • Do you trust the source?
  • Can you find corroborating footage?
  • Can you examine the geolocation and other metadata? (See the sketch after this list for one starting point.)
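
For still images, one modest starting point for the metadata question is checking whether a file still carries EXIF data such as GPS coordinates. The sketch below assumes a reasonably recent version of the Pillow library and a hypothetical filename; note that most social platforms strip this metadata on upload, so an empty result proves nothing by itself.

```python
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

def print_metadata(path: str) -> None:
    """Print whatever EXIF metadata the image still carries, including GPS tags if present."""
    exif = Image.open(path).getexif()
    for tag_id, value in exif.items():
        print(f"{TAGS.get(tag_id, tag_id)}: {value}")
    gps = exif.get_ifd(0x8825)                 # 0x8825 is the GPSInfo IFD
    for tag_id, value in gps.items():
        print(f"GPS {GPSTAGS.get(tag_id, tag_id)}: {value}")

print_metadata("suspect_photo.jpg")            # hypothetical filename
```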

After all, the biggest threat today is not coming from deepfakes, but from what are called “shallowfakes”: existing video that is recycled, mislabeled, staged or re-edited. A recent example is the doctored video of US House Speaker Nancy Pelosi that went viral and was retweeted by President Donald Trump. It sparked a big debate about how social media platforms should respond to content like this.

To detect shallowfakes we can already use an excellent tool, available for free: the InVID Verification Application. It helps journalists to verify videos from social media platforms and is based on the Verification Handbook published by the European Journalism Centre. Ultimately, we need a reverse video search tool, similar to existing reverse image search tools.

So, now what?

We’ve seen a lot of headlines warning of apocalyptic consequences due to deepfakes. Fortunately, the apocalypse hasn’t materialized yet. But the dangers of deepfakes are real. They can be used to create digital wildfires. They can be used by any autocratic regime to discredit dissidents. They can be used to convince people that a dead leader is still alive. They can generate false statements.

Especially vulnerable are underdeveloped communities, where there is little or no access to mainstream media and where deepfakes are being shared in closed networks like WhatsApp. Or fragile states, where trust is already low and where people are susceptible to conspiracy theories.

But probably the biggest danger of deepfakes is disbelief by default: they call into question the veracity of real videos in order to undermine credibility and cast doubt. This can further erode trust in journalism and create havoc in societies.

Apart from technological solutions, can we combat deepfakes with legislation? Recently we’ve seen legislative initiatives in China and the U.S. But I believe we should be very cautious about legislation that is targeting deepfakes in particular, as these laws could be weaponized against journalists, dissidents or human rights defenders. Deepfakes should not be treated differently than other forms of misinformation.

While some people fear that the 2020 presidential campaign will provide a testing ground for deepfakes, I’m convinced that we should not wait for this. Instead, we should continue to raise awareness around this issue, build media literacy and prepare. Let’s find good responses before it escalates into a truly serious problem. Let’s collaborate.

If you want to work together or have recommendations of other ways to combat deep- and shallowfakes, not only video but also audio, I’d like to hear from you. E-mail me at tom.vandeweghe@stanford.edu or DM me on Twitter @tomvandeweghe.


Tom Van de Weghe
JSK Class of 2019

John S. Knight Fellow Stanford University | Research AI, deepfake & Blockchain | Foreign correspondent VRT | Former Bureau Chief USA & China | author | speaker