Three Things I Learned from Creating Fake Faces Using AI

Some ethical issues I came to appreciate that are relevant to both practitioners and ordinary people.

Ayrton San Joaquin
The Startup
9 min read · Jan 3, 2020


These are fake faces generated by a computer program using Artificial Intelligence. Can you spot what makes them fake?

I have no background in Statistics or Computer Science. I don’t know the fancy equations used to make these faces. I don’t even completely understand how the program got these faces to look so realistic. Yet I can make tens of thousands of these fakes in a single day, and so can you.

My journey in forging fakes started a few months back, when I read about and watched researchers create fake faces using Artificial Intelligence (AI). I then tried creating my own generator: I opened up Google Colab, wrote my code while following a tutorial, and waited patiently for my program to make fake portraits from scratch.

A collection of my half-baked fake images. With 48 more hours on Google’s hardware, my program could have done better.

After eight hours, I had to stop the process because it was taking too long. My program had completed only a sixth of the required training time, and the results were glaringly fake. I needed a more convenient and inexpensive approach. Fortunately, the Internet came to the rescue. This had been done before, so all I had to do was copy the model provided by the original paper. I used code taken from this notebook to make my images, and running it on Google Colab, the magic appeared. In the course of finding datasets and models to create my fake faces, I took away three lessons that I think are important to understand regardless of one’s background.

Running the code took ~30 seconds. There are some defects, but it’s amazing to see these fake faces in different backgrounds, poses, and facial expressions.
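To give a sense of what that ~30 seconds of “magic” involves, here is a minimal sketch of the sampling step: draw random latent vectors and pass them through a trained generator. The tiny network, latent size, and output scaling below are stand-ins I made up for illustration; the notebook I followed loaded a large pretrained generator instead.

```python
# Minimal sketch: sample random latent vectors and map them to images.
# The generator here is a toy stand-in; in practice a large pretrained
# StyleGAN-style model would be loaded from a checkpoint instead.
import torch
import torch.nn as nn
from torchvision.utils import save_image

LATENT_DIM = 512          # latent size typical of StyleGAN-family models
IMAGE_SIZE = 64           # stand-in resolution; real face models output 1024x1024
NUM_FACES = 16

# Toy stand-in generator (hypothetical); real code would load a trained model.
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 3 * IMAGE_SIZE * IMAGE_SIZE),
    nn.Tanh(),
    nn.Unflatten(1, (3, IMAGE_SIZE, IMAGE_SIZE)),
)
generator.eval()

with torch.no_grad():
    z = torch.randn(NUM_FACES, LATENT_DIM)   # one random vector per face
    fake_faces = generator(z)                # shape (NUM_FACES, 3, H, W), values in [-1, 1]

for i, face in enumerate(fake_faces):
    save_image((face + 1) / 2, f"fake_face_{i}.png")   # rescale to [0, 1] and save
```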

First, It’s Too Easy

All of this is amazing! We don’t need to be artists or Photoshop experts to create realistic fake photos. The AI overcame the problem of the uncanny valley when it learned the facial features necessary to make convincing fakes. Doing this from scratch would take weeks, but all of the components are already prepared for the public: curated and annotated training datasets are mostly publicly accessible, Google offers free hardware to train on, and code is freely written and prepared (and sometimes neatly packaged in Jupyter notebooks) for anyone to use on GitHub. All you really need is a phone, an internet connection, and the ability to follow instructions.

Cutting-edge research is now turned into a D.I.Y. project with a simple Google search. And even if my model hadn’t made good fake images, or if I didn’t know how to code, I could simply go to http://thispersondoesnotexist.com to get a fake image for myself.
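To underline how low the bar is, here is a hedged sketch of fetching one such face programmatically, assuming the site serves a freshly generated image at its root URL (the output filename and request headers are my own choices):

```python
# Fetch one generated face from thispersondoesnotexist.com and save it locally.
import requests

response = requests.get(
    "https://thispersondoesnotexist.com",
    headers={"User-Agent": "Mozilla/5.0"},  # some servers reject requests with no user agent
    timeout=10,
)
response.raise_for_status()

with open("not_a_real_person.jpg", "wb") as f:   # hypothetical output filename
    f.write(response.content)
```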

Cool, now I have some fake faces. What next? For starters, I can use them to forge identities and deceive people into divulging confidential information, as this LinkedIn profile was used for espionage. I can use the same technology for pornography, by substituting faces with fake ones resembling certain people. Or, most notoriously, I could combine my work with other technologies such as AI-generated audio to create fake political material, spread disinformation, and commit electoral manipulation, like what influenced the 2019 attempted coup d’état in Gabon.

And if I don’t use it for such purposes, someone will, or more accurately, someone already has.

Fake LinkedIn profile with an artificially generated fake image. Image taken from The Associated Press.

Given all these malicious applications, all of which have already happened, should accessibility be denied? I don’t think so. Making these technologies accessible is currently a lesser evil than restricting them to a select few groups of people. When these technologies are accessible, people can watch their development and applications. Similarly, companies that release the data they collected can be held accountable for how they handle it and what it contains. Compare this to data wrapped in secrecy, with access blocked by multiple non-disclosure agreements and severe restrictions on combining it with other data.

AI can enormously outperform humans in specific tasks. If AI and the data that fuels it can only be used by an exclusive group, then everyone else will be left behind as AI shapes the world’s technological landscape. Use and development would be restricted mainly to white, college-educated males from scientific and mathematical fields. Such a privileged, educated group alone would not be able to create AI systems that address, say, the representation of an indigenous community, whether in legal affairs or in marketing culture. They would lack the expertise in those specific areas, and that gap would prevent AI from being applied to those problems. This is especially true for disadvantaged groups, who may have expertise on problems stemming from first-hand experience, expertise that is crucial in developing solutions with AI.

While making AI-powered generators publicly accessible opened the door to their malicious uses, accessibility is also a way to develop solutions against them. Because of all these malicious uses, countermeasures have been developed: fake-image detection programs exist, and they rely on fake images as training data. Where this arms race is heading is uncertain, but I believe that is mainly because it’s still too early to tell.
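As a rough idea of how such a detector comes together, here is a sketch of training a binary real-versus-fake classifier, assuming you already have folders of real and generated faces on hand. The directory layout, model choice, and hyperparameters are illustrative assumptions, not a production forensics system.

```python
# Sketch: fine-tune a small classifier to label images as real or fake.
# Expects hypothetical folders data/train/real/ and data/train/fake/.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("data/train", transform=transform)  # hypothetical path
loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)   # two classes: real vs. fake

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):                           # a few epochs, purely illustrative
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```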

Second, Biased Data is Dangerous

Browsing through my results, I noticed that the majority of the faces I generated were fair-skinned adults. Why is that? Why can’t it make more of other types? Simple: because it learned from the dataset. I realized that, to some degree, the dataset the AI relied on is biased, and in almost all applications this is a problem.

Bias in data can take many forms: over-representation, under-representation, or the outright absence of certain groups across the different identities a person can hold. AI learns from data, and when that data is biased, it learns to exaggerate, downplay, or exclude certain factors in its predictions. In the early days of face generation, training data was mostly taken from celebrities because they had the largest number of publicly available images. The generated fake faces naturally tended to be symmetrical, free of blemishes, and fair-skinned. The bottom line is this: an AI system trained on biased data will reflect that bias in the results it produces.
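One concrete habit that follows from this: check how the training data is distributed before trusting what comes out of it. Here is a toy sketch of that check, with attribute labels I invented purely for illustration:

```python
# Toy sketch: count how attributes are distributed across a dataset's annotations.
# The annotations below are invented for illustration only.
from collections import Counter

annotations = [
    {"skin_tone": "fair", "age_group": "adult"},
    {"skin_tone": "fair", "age_group": "adult"},
    {"skin_tone": "dark", "age_group": "adult"},
    {"skin_tone": "fair", "age_group": "child"},
]

for attribute in ("skin_tone", "age_group"):
    counts = Counter(item[attribute] for item in annotations)
    total = sum(counts.values())
    # Print each attribute value with its share of the dataset.
    print(attribute, {value: f"{count / total:.0%}" for value, count in counts.items()})
```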

Unless your goal is to make celebrity-like fake faces, using just celebrity faces for your training data is questionable. Image originally from NVIDIA. https://www.technologyreview.com/s/612501/inside-the-world-of-ai-that-forges-beautiful-art-and-terrifying-deepfakes/

How is this dangerous? When biased technologies are deployed across large areas of society, they reinforce pre-existing stereotypes against disadvantaged groups and minorities. Those groups experience further discrimination, and rather than making people’s lives easier, AI harms them more effectively and efficiently than humans could. Take, for example, Amazon’s discontinued recruiting AI, which learned from the lack of women in the company and came to favor male applicants over female ones. Imagine if the bias had not been detected early on and these technologies had been deployed at a wider scale, such as in racial profiling (as in the US) or in social services (as in China’s Social Credit System). A biased AI will not capture the situation we want it to predict, and its inaccurate predictions will disproportionately affect the people involved. Rather than solving the problem it was originally intended for, AI will worsen it.

Finally, Data Collection is Inevitable

Images taken from another model that had more training. Notice how most artifacts from the previous sets of images are gone. These will definitely convince all except the experts.

These images reflect the data the AI trained on. The training data, the FFHQ dataset compiled by NVIDIA, is composed of faces taken from publicly available albums on Flickr. These 70,000 images all carried a non-commercial license that allowed them to be copied, and they are now used routinely by researchers, corporations, and private individuals like me around the world to train AI systems. Anyone is free to own or modify a copy, provided it is used for non-commercial purposes.

Of course, NVIDIA credited every image it used to its owner. If an owner wants their images removed for privacy reasons, they can request removal. But first, they have to search the dataset here to see whether it includes any image they own. Oh, and I’m forgetting something: they also have to know this dataset exists. And I’m pretty sure that if you’re a photographer who doesn’t work on or follow news about AI (which I bet most photographers don’t), you wouldn’t know it exists.
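For the curious, that search could look roughly like this, assuming the dataset ships a metadata file recording a source URL and author for each image. The filename and field names below are my guesses, so check the official repository for the exact format.

```python
# Hedged sketch: look for your own Flickr photos in the dataset's metadata.
# "ffhq-dataset-v2.json" and the field names are assumptions about the format.
import json

MY_FLICKR_NAME = "your flickr display name"   # hypothetical

with open("ffhq-dataset-v2.json") as f:
    metadata = json.load(f)

mine = [
    entry["metadata"]["photo_url"]
    for entry in metadata.values()
    if entry["metadata"].get("author", "").lower() == MY_FLICKR_NAME.lower()
]

print(f"Found {len(mine)} of my photos in the dataset:")
for url in mine:
    print(url)
```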

A collection of real people from the FFHQ dataset. There are 70,000 of them.

I don’t see any inherent problem in this situation. After all, if the owners don’t know about it, they can’t have a problem with it. Ignorance is bliss. While NVIDIA didn’t inform the owners that it took their images, the owners gave implicit consent to non-commercial use and public accessibility of their images.

However, I think the issue is how consent is given when we use digital services. Many digital services offer little to no way to opt out. Cookies collect data to monitor browsing habits and preferences. Heck, even a simple Google search will, if Google deems your IP address sketchy, ask you to complete CAPTCHA puzzles that are then used as training data. I admit that many instances of data collection are necessary for personalized services and to train AI for different applications. And this is the primary reason why data is being collected from so many different sources today.

Just as the subjects of these photos are unaware that their pictures sit in a public dataset, or even that they were photographed, data collection is constantly happening around us, often without our noticing. As AI technologies proliferate and expand now and in the future, data collection will continue for as long as people keep producing data, and as long as people record information, there will be data. With collection this widespread and inevitable, it’s ultimately up to individuals to understand the different ways their data will be used, including scrutinizing how companies will use, secure, and regulate it.

But it’s always just too easy to click “I have read and agree to the terms” without actually reading them. It’s hard to read terms that are long and technical, so it’s up to companies to make their terms accessible, and governments will surely play some part in regulation. There has to be a change in attitude toward how people give away their data and how companies use it. What this necessary behavioral shift will look like for most people, I don’t know. But I hope those who are informed about how data is collected and used will support accountability and accessibility.

AI is here to stay, and that means we have to define our relationship with it. Whether or not you develop AI systems, I hope these three lessons give you an idea of the capabilities of AI and the constant responsibility humans have to keep its use ethical. If you’re a developer, I hope you know the data you use has been entrusted to you to create AI systems that solve real-life problems. If you’re a user, I hope you know what your data can potentially be used for when you agree to share it. And perhaps be proud to be part of a solution.

AI is meant to help, not reinforce harm.

The AI technology used to create the fake images is called a Generative Adversarial Network (GAN). Essentially, it uses at least two AIs: one (the generator) creates fake images, and another (the discriminator) determines whether an image is fake or not. By pitting these AIs against each other, the generator gradually makes better images until the discriminator can’t really tell the difference anymore. It’s a very interesting idea that has led to an explosion of different applications. Check it out.
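For readers who want to see that tug-of-war in code, here is a bare-bones sketch of a GAN training loop in PyTorch. The tiny fully connected networks and the random stand-in “real” images are purely illustrative; real face generators are vastly larger and train for days.

```python
# Bare-bones GAN training loop: a generator learns to fool a discriminator.
import torch
import torch.nn as nn

LATENT_DIM, IMG_DIM = 64, 28 * 28   # toy sizes, purely illustrative

generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256), nn.ReLU(),
    nn.Linear(256, IMG_DIM), nn.Tanh(),            # outputs a fake "image"
)
discriminator = nn.Sequential(
    nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),               # outputs P(image is real)
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

def training_step(real_images):
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Train the discriminator to tell real images from generated ones.
    fake_images = generator(torch.randn(batch, LATENT_DIM)).detach()
    loss_d = bce(discriminator(real_images), real_labels) + \
             bce(discriminator(fake_images), fake_labels)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Train the generator to fool the discriminator into saying "real".
    fake_images = generator(torch.randn(batch, LATENT_DIM))
    loss_g = bce(discriminator(fake_images), real_labels)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

# Random tensors stand in for real training images here; in practice this
# would iterate over a DataLoader of an actual face dataset.
for step in range(100):
    training_step(torch.rand(32, IMG_DIM) * 2 - 1)
```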

If you have any non-malicious applications for these fake faces, let me know!
