Deepfakes

Maybe they aren’t all that bad?

Devjit Kanjilal
Margin_Squeeze
6 min read · Aug 15, 2020


Deepfakes via The New York Times

I recently watched a video from the New York Times on the topic of Deepfakes.

Deepfakes (a portmanteau of “deep learning” and “fake”) are synthetic media in which a person in an existing image or video is replaced with someone else’s likeness.

Deepfakes are something we hear more and more about on the news, and the New York Times does a great job of discussing how the intent behind a Deepfake is not new: it is a method of misinformation that has always existed. What Deepfakes do today is the same as what Shallowfakes (less advanced forms of misinformation) have done before; it’s just that the technology has advanced.

Google Trends for Deepfakes

Deepfakes are not all bad news, and I want to use this issue of the Squeeze to talk about the technology, discuss how we can use Deepfakes to help combat existing biases, and show that Deepfakes are not limited to videos and images.

But before we dive into the thick of things, some helpful context:

Machine Learning is how we get computers to learn from experience. For example, rather than you steering a remote-control car, the car directs itself based on its past movements.

Neural Networks are computer systems designed to mimic the brain. While not a perfect analogy, you could consider machine learning with neural networks as deep learning.

Facial Recognition is a technology capable of identifying a person using images or videos of their face.

GPT-3 is a large transformer-based language model with 175 billion parameters, trained on hundreds of billions of words of text. GPT-3 is trained with a simple objective: predict the next word, given all of the previous words within some text. In plain English: a fancy model plus lots of text data allows it to predict words and sentences.
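To make that “predict the next word” objective concrete, here is a minimal sketch (my own illustration, not OpenAI’s code) using the openly released GPT-2 through Hugging Face’s transformers library; GPT-3 itself is only reachable through OpenAI’s API, so GPT-2 stands in here:

# Requires: pip install transformers torch
from transformers import pipeline

# GPT-2 was trained on the same next-word objective described above
generate = pipeline("text-generation", model="gpt2")

prompt = "Deepfakes are a method of misinformation that"
result = generate(prompt, max_length=40, num_return_sequences=1)
print(result[0]["generated_text"])  # the prompt plus a predicted continuation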

How Deepfakes work

Machine learning can be used to create Deepfakes, but Deepfakes hit a new level of sophistication when a doctoral student in Montreal, Ian Goodfellow, went out for beers. Goodfellow is now known as the GANfather, and the GAN he came up with is the technology behind the sophistication of many of the Deepfakes we see today.

I could watch this on loop

Deepfakes use Generative Adversarial Networks (GANs), in which two machine learning (ML) models compete against each other to generate data. One ML model trains on a data set and then outputs countless video forgeries, while the other model attempts to detect these forgeries, in a potentially never-ending game of Sherlock Holmes and Moriarty, until fakes that can’t be detected are created. For those really curious who want to build their own Deepfakes, the GitHub repo for DeepFaceLab is a cool reference point.
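Here is a toy sketch of that forger-versus-detective loop in PyTorch (my own illustration under simplifying assumptions, not DeepFaceLab’s code): the “real” data is just a 1-D distribution standing in for video frames, but the adversarial structure is the same.

import torch
import torch.nn as nn

# The forger: turns random noise into candidate "fakes"
generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
# The detective: outputs the probability that a sample is real
discriminator = nn.Sequential(
    nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid()
)

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0  # "real" data: samples near 3.0
    fake = generator(torch.randn(64, 8))   # forgeries made from random noise

    # Detective's turn: label real samples 1 and forgeries 0
    d_loss = (loss_fn(discriminator(real), torch.ones(64, 1)) +
              loss_fn(discriminator(fake.detach()), torch.zeros(64, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Forger's turn: try to make the detective call forgeries "real"
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

print(fake.mean().item())  # drifts toward 3.0 as the forgeries improve

The two models improve in lockstep: every time the detective gets better at spotting forgeries, the forger gets a sharper training signal for its next attempt.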

As with any machine learning approach, lots of data is needed, and the larger the set of training data, the easier it is for the forger to create a believable Deepfake.

Computing power has increased exponentially over these past few years as graphics processing units (GPUs) have become the norm, and this Moore’s Law-style growth in compute, coupled with an expanding library of video data (YouTube, movies, etc.), has acted as a catalyst in the rise of Deepfakes.

By Max Roser — https://ourworldindata.org/uploads/2019/05/Transistor-Count-over-time-to-2018.png, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=79751151

GPT-3 and Deepfakes

If GANs are the Deepfake for videos, GPT-3 can be considered the Deepfake for text. GPT-3 was released earlier this year as a successor to the impressive GPT-2.

OpenAI caused controversy a year ago when it declined to release the largest version of GPT-2, saying the model could fall into the wrong hands and be abused to mislead people with things such as fake news.

GPT-3 is a deep learning model that is even more advanced than GPT-2 and can generate text that is nearly indistinguishable from what humans write.

GPT-3 Text Sample: Realistic

Reading that sample, I would not have known it was generated by a computer. So while Deepfakes are often thought of as videos and pictures, other forms, such as Deepfakes in text, should be considered too. On a funnier note, the excerpt below shows that GPT-3 also has a flair for the surreal.

GPT-3 Text Sample: Surreal

Deepfakes for good

It’s not all bad news, and I think one important value that Deepfakes, both visual and textual, can bring is the removal of biases. Facial recognition is a great example of a space where we can use Deepfakes to remove biases. Facial recognition works in five key steps, with a small sketch of the matching logic after the list.

Step 1. An image of your face is captured from a photo or video. This picture doesn’t have to be perfect, and your face might appear alone, in a crowd, obstructed, or even distorted.

Step 2. Facial recognition software reads the image and breaks it down into smaller chunks. For example, before even looking at a face, the software must locate the humans in the image, and only then can it locate the face. This phased approach keeps the process efficient.

Step 3. Once the face is located, the software reads the geometry of your features, such as the distance between your eyes and other defining characteristics. At this point, your face is no longer an image to the software but a series of numbers: a mathematical representation of your face.

Step 4. This series of numbers is then compared to a database of known faces. Databases vary: they can be large, such as police databases, or small enough that anyone could build one. Clearview AI built its database by scraping the internet.

Step 5. A determination is then made based on thresholds and rules. For example, if my face matches a known picture of my face with 85% confidence, the model will identify the picture as mine.
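As a hypothetical sketch of steps 4 and 5 (my illustration; the faceprint numbers are made up), matching boils down to comparing series of digits and applying a threshold:

import numpy as np

def cosine_similarity(a, b):
    # How closely two faceprints point in the same direction (1.0 = identical)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Step 4: a database of known faceprints (values invented for illustration)
database = {
    "devjit": np.array([0.12, 0.87, 0.33, 0.45]),
    "someone_else": np.array([0.91, 0.05, 0.62, 0.10]),
}
probe = np.array([0.11, 0.85, 0.35, 0.44])  # faceprint from a new photo

# Step 5: rules and a threshold turn similarity scores into a determination
THRESHOLD = 0.85
best_name, best_score = max(
    ((name, cosine_similarity(probe, emb)) for name, emb in database.items()),
    key=lambda pair: pair[1],
)
print(best_name if best_score >= THRESHOLD else "no match", round(best_score, 3))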

This five-step process seems pretty reasonable, as it is basically how your brain works. But if the facial recognition model does not “see the world” and is instead trained on narrow datasets, it can become just as biased as crazy uncle Jim at the yearly Thanksgiving dinner.

This is a very real problem, and the lack of governance around the matter only makes it worse. For example, imagine a dataset of 100 items split into 90 items of Class A and 10 items of Class B. Because Class B is poorly represented, the accuracy of predictions for that class may be low; there just isn’t much data. On the other hand, Class A has so much data that the model can bias the other way and predict Class A at a high false-positive rate. Now imagine Class A is one gender or race and Class B is its counterpart.

A 2011 study, co-authored by one of the organizers of NIST’s vendor tests, found that algorithms developed in China, Japan, and South Korea recognized East Asian faces far more readily than Caucasians. The reverse was true for algorithms developed in France, Germany, and the United States, which were significantly better at recognizing Caucasian facial characteristics.

Now, of course, it can be argued that accuracy is a poor performance measure for these technologies and that precision, in combination with other metrics, should be used. But either way, bias exists.
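Here is a quick worked example (my own toy numbers, reusing the 90/10 Class A and B split from above) of why accuracy alone can hide the problem:

# A lazy model that always predicts the majority class still looks "accurate"
y_true = ["A"] * 90 + ["B"] * 10
y_pred = ["A"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall_b = sum(t == p == "B" for t, p in zip(y_true, y_pred)) / 10

print(f"accuracy: {accuracy:.0%}")        # 90%: looks fine on paper
print(f"Class B recall: {recall_b:.0%}")  # 0%: Class B is never recognized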

However, now imagine that the Class A and B dataset mentioned above was augmented with Deepfakes of Class B until both classes had the same number of items (80 synthetic samples on top of the 10 real ones, matching Class A’s 90). Would the model predict better? Some research says yes.
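A hedged sketch of that balancing idea (the make_synthetic_b generator below is hypothetical; in practice it would be a GAN trained on real Class B examples): top up the minority class with synthetic samples until the split is even.

import numpy as np

rng = np.random.default_rng(0)
class_a = rng.normal(0.0, 1.0, size=(90, 4))  # 90 real Class A samples
class_b = rng.normal(2.0, 1.0, size=(10, 4))  # only 10 real Class B samples

def make_synthetic_b(n):
    # Hypothetical stand-in for a GAN generator that forges Class B samples
    return rng.normal(2.0, 1.0, size=(n, 4))

synthetic_b = make_synthetic_b(len(class_a) - len(class_b))  # 80 Deepfakes
balanced_b = np.vstack([class_b, synthetic_b])

print(len(class_a), len(balanced_b))  # 90 90: an even split for training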

When it comes to GPT-3 and Deepfakes in text, this article on Medium does a great job of discussing how biases exist there too. While I don’t know the answer, I think there is potential to use models such as GPT-3 to build checks and balances into processes such as university admissions: mixing pools of real and fake applications would allow a test-control admissions process and help avoid what is currently unfolding at Yale.

In summary, with all the news around Deepfakes, it is important to realize:

  1. How a Deepfake works
  2. They are not just videos/images
  3. Biases exist
  4. Deepfakes can be used to combat bias and do good
  5. The New York Times does some great reporting