‘The Deepfake Report’ — how synthetic media heralds a new era of information
A new era of media, information, and technology is dawning, one that will fundamentally alter our relationship with digital content.
Deepfakes — while some of you reading this might be familiar with the term, and might even have seen the infamous Obama deepfake or other altered media, the vast majority of society is still unaware that deepfakes exist.
To some extent, knowledge about the impact of deepfakes is niche, which at this stage makes the technology too easy to use for manipulation, extortion, defamation, and fraud. Given the current political climate and Germany's "super election year" knocking on our door, unawareness of deepfakes, and digital illiteracy in general, may erode societal trust and trust in our political institutions. The lack thereof could tear at the already fragile social glue.
And while raising awareness is a huge concern that the FreeTech Academy wants to address, we also chose deepfakes as our topic because they sit, much like our academy, at the intersection of technology and media. So, in a lot of ways, technology imitates life. And what better way to kick off a masterpiece in a unique academy such as ours than with a topic that has the potential to alter both the tech and the media industry as we know them?
Let’s start with some basic knowledge about deepfakes.
So, what are deepfakes?
Deepfakes are synthetically created or digitally altered videos or audio recordings that depict people saying and/or doing things they never actually said or did.
The term itself is a blend of "deep learning" and "fake" and stems from recent advances in machine learning. The technology is based on so-called deep neural networks (DNNs), which are reminiscent of neurons in the brain. DNNs are structured as large sets of interconnected units, or artificial neurons. Like neurons in the brain, each unit performs a rather simple computation, but together the units carry out complex operations such as recognising objects from a set of pixels on a screen (Kietzmann et al., 2020).
To produce a face-swap deepfake, a variant of DNNs called a deep auto-encoder, or a variational auto-encoder (VAE), is used. An encoder extracts facial features from video frames, and a decoder reconstructs the face from those features. To swap a face in a video, two encoder-decoder pairs are needed, each trained on an image set of one of the two faces; because the pairs share the encoder, a frame of person A can be decoded with person B's decoder, producing the swap (Nguyen et al., 2019).
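A minimal numerical sketch of this encoder-decoder setup may make it concrete. It assumes, as is common in face-swap models, that the two pairs share one encoder; the layer sizes, the 64x64 frame resolution, and the untrained random weights are illustrative, not taken from the report.

```python
import numpy as np

# Toy sketch of the shared-encoder / two-decoder face-swap architecture.
# All sizes and weights here are illustrative assumptions (no training).

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """One fully connected layer: a weight matrix and a bias vector."""
    return rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out)

def forward(x, layers):
    """Pass a flattened frame through a stack of tanh layers."""
    for w, b in layers:
        x = np.tanh(x @ w + b)
    return x

# One shared encoder compresses any face frame into a small latent code...
encoder = [dense(64 * 64, 256), dense(256, 64)]
# ...and each identity gets its own decoder that reconstructs a face.
decoder_a = [dense(64, 256), dense(256, 64 * 64)]
decoder_b = [dense(64, 256), dense(256, 64 * 64)]

frame_a = rng.random(64 * 64)           # a flattened frame showing face A
latent = forward(frame_a, encoder)      # shared latent representation
swapped = forward(latent, decoder_b)    # decoded with B's decoder: the swap

print(latent.shape, swapped.shape)      # (64,) (4096,)
```

In a real pipeline each pair is trained to reconstruct its own identity's faces; at inference time, feeding A's frames through B's decoder is what yields the swapped face.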
Fascinating. And where did deepfakes come from?
Deepfakes are a relatively recent phenomenon, fuelled by the development of deep neural networks and auto-encoders in machine learning. The first deepfakes surfaced around 2017 on a subreddit in the form of adult film content in which celebrities' faces were swapped onto the performers' bodies. In fact, most early deepfake content consisted of such deep nudes.
So the origins of deepfakes are adult movies? That's unsettling, but I don't necessarily see how this is relevant to me.
While early deepfakes required a substantial amount of input data in the form of images, audio, or video, current models can generate convincing deepfakes from minimal input (some models now require only a single video). This implies that basically anyone who posts their face on the internet can ultimately become the subject of a digitally altered video. And as most of us are ardent users of social media, any one of us is at risk.
Moreover, deepfakes are a very recent development, which means the technology's sophistication and accessibility are still relatively low. Producing a convincing HD deepfake currently requires ample computing power, time to train models, and a lot of relevant data, and is therefore costly. However, for actors who do not need to worry about costs because they have the financial means to evade these limitations, the technology can easily be misused to subtly steer elections or to defame politicians of an opposing party.
With the passing of time, the technology will become more and more accessible and convincing. The barriers to producing a deepfake will be minimal, and anyone with a computer might be able to synthetically produce videos that cannot be distinguished from authentic ones.
Additionally, social media will have a tremendous impact on how we consume news in the future. I can imagine that social media platforms will have to start doing actual editorial work, whether AI-based or human-based. In both cases they will have to set stricter ground rules for uploading content to their platforms. Hesitation might mean that we start eroding the fundamental glue between us: trust.
Trust is a fundamental construct in modern society. While individuals might not be aware, trust is the reason a lot of quotidian things such as money have any value at all. We trust our governments to uphold laws, politicians to act in our interests, and each other to respect traffic rules.
The erosion of trust within our society is a very real threat posed by the misuse of deepfakes. Now, media manipulation or fake news is not a novel occurrence; quite the contrary, it might actually be as old as humanity itself. However, with the advent of the internet we have become used to putting a lot of trust in video and image content, that is, in content we can seemingly verify with our very own eyes.
In the future, every one of us will have to get used to being cautious and scrutinising video, audio, and image content in a completely different way, even if the content is shared by a trusted influencer. This might also become another pillar of journalism, as trusted outlets will grow more important amid a potentially constant influx of fabricated content on the internet.
So concretely, what are the threats associated with deepfakes?
One of the direst consequences of public unawareness combined with a failure to mitigate the risks is that individuals may succumb to reality apathy, meaning that people give up the attempt to verify or refute information altogether (Warzel, 2018). On an individual level, anyone could become the target of online harassment, defamation, revenge porn, identity theft, and/or bullying. On an organisational level, fraud and trickery will be the most imminent threats, along with faked earnings estimates or videos of CEOs in compromising situations (Kietzmann, 2020).
And are there any opportunities with regard to deepfakes?
Absolutely. The most groundbreaking opportunity I see is for the entertainment industry. Soon you will be able to insert yourself into your favourite movie and cast it with the people you want, and movie production might be completely transformed once text-to-video becomes possible. Dubbing is another part of the industry that will see massive changes. With the ability to transform an actor's voice into many languages, dubbing could become a text-to-audio service in the near future.
Another industry that will definitely be impacted is fashion. From virtual changing rooms to hyper-personalisation, the possibilities are broad (Dietmar, 2019). People who have lost their voice will be able to have it recreated. Real-time translation and dubbing in video conferences will further break down language barriers, and public service announcements will be made available in a variety of languages through the use of deep learning technology (Kietzmann, 2020). These are just a few examples, but there are ample opportunities in other industries not mentioned here.
Wow, okay. So, I get why I need to care now. What can be done to detect deepfakes and how can we manage the threatening sides of this technology?
While the first generation of deepfakes was easily identifiable with the naked eye, the algorithms powering the creation of deepfake content are becoming more and more sophisticated, and thus harder to spot with conventional methods.
It takes AI to detect AI, but in an arms race between synthetic deepfake generation and AI-based detection, generation will always be one step ahead. That does not mean there is no way to mitigate the risks of deepfakes, but relying on detection software alone will not be enough.
An additional, important part of the risk-mitigation puzzle is digital provenance solutions; by provenance I mean authenticating the origin of any piece of content. This can take multiple forms, such as digitally watermarking content as it is created on a device. For public figures, and politicians in particular, life-log services could provide a layer of truth to counter doubt at any given time. I see developers, politicians, and businesses as especially accountable here for ensuring that solutions are built in this field. Politicians need to ensure funding is readily available, and developers are the ones who ultimately create provenance solutions as well as detection software.
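To make the provenance idea tangible, here is a toy sketch of origin authentication: a capture device tags a hash of the content with a secret key, and any later edit invalidates the tag. The key, function names, and payload are hypothetical; real provenance schemes use public-key signatures over content and metadata rather than a shared secret.

```python
import hashlib
import hmac

# Hypothetical per-device secret key; a real scheme would use an
# asymmetric key pair so that anyone can verify without the secret.
DEVICE_KEY = b"illustrative-device-secret"

def sign(content: bytes) -> str:
    """Tag content with an HMAC-SHA256 over its raw bytes."""
    return hmac.new(DEVICE_KEY, content, hashlib.sha256).hexdigest()

def verify(content: bytes, tag: str) -> bool:
    """Check that the content still matches its original tag."""
    return hmac.compare_digest(sign(content), tag)

video = b"\x00\x01 stand-in for recorded video bytes"
tag = sign(video)                        # stored alongside the upload

print(verify(video, tag))                # True: content is untouched
print(verify(video + b"!", tag))         # False: any alteration breaks it
```

The design point is that authenticity travels with the content: a verifier does not need to detect manipulation in the pixels, only to check whether the tag still matches.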
While policy, legal, and regulatory frameworks are often the last instance we should consult, more comprehensive and enforceable laws are required when dealing with deepfake misconduct. And not just on a local level: policy makers need to start building a global accord, similar to the Paris Agreement, on digital misconduct and the usage of deepfakes. The accountability for making this happen lies with the digital ministries of every country on this planet.
And there is one more layer of the puzzle, on an individual level. With the sheer magnitude of data we upload to social media every day, we are making it extremely easy for deep learning algorithms to train on our likenesses. So ultimately, each and every one of us also has to reflect on the visibility and security of our data online.
Lastly, as the essential visionaries of this new wave of deep learning, developers hold in their hands how and what we create with DNNs. As developers, we decide what the future of the technology holds for all of us. It is time for a global developers' code of conduct, similar to the Hippocratic oath. We may not be directly involved in saving human lives, but our actions have the power to impact society.
Tell us more about the project “The Deepfake Report”. Who are the people behind the scenes?
17 journalists and 12 tech students from the FreeTech Academy worked on this masterpiece over six weeks in March and April 2020.
You keep using the word masterpiece. What exactly is that?
A masterpiece is a collaborative project between journalists and tech students at the FreeTech Academy, who join forces for six weeks once or twice a year. The overall topic is set by the academy's directors, but the ultimate narrative, meaning how we communicate and work on the topic, is left to the journalists and tech students. For this masterpiece we decided to produce a documentary featuring A-list German politicians, as well as an online platform focused on deepfake literacy where we feature additional experts in the field of synthetic media.
Working in interdisciplinary teams always bears challenges. Any you had to overcome?
One of our central challenges was bringing together contrasting working cultures. While journalists deal with a lot of pressure and strict deadlines, software development isn't a linear process. So working with the ambiguity that we as developers brought in was perhaps our pivotal challenge. By ambiguity I mean that, as developers, we were not sure how sophisticated the deepfakes we would produce could be.
Part of that ambiguity-ridden process was also amending the documentary's storyline, which, along with the overall communication, was vexing, especially for the journalists in the academy.
Ultimately, we were able to overcome it through sheer will and collaboration, but it is also a central learning that interdisciplinarity doesn’t come naturally and needs more than the will to collaborate.
What types of deepfakes have you produced and what was your tech setup?
Classified further within the field, we produced a so-called puppet-master deepfake: a video lip-synced to faked audio of, in our case, an A-list German politician. To produce the audio fake we used an existing provider called Resemble.ai, and to lip-sync the audio fake to a given video input we used the Wav2Lip model.
So how do you feel about the final product?
I think I am equal parts exhausted and proud. The DeepfakeReport team came a long way, and it didn’t go all as planned, but we persisted and produced a relevant perspective on the discussion surrounding deepfakes.
And that is perhaps the most important aspect of our work: to start this discussion. Not when this technology is already at a stage where not much can be done about it, but right now, while we still have a fighting chance to take the right steps to mitigate the threatening parts of synthetic media. Awareness is key, but I hope you can see that it goes far beyond that. It doesn't end with awareness; the next step is to accurately inform the public and take online literacy to the next level, and in a subsequent step we need to define accountabilities on a societal level. Who is responsible for which action that needs to be undertaken? Who is responsible for producing detection software or digital provenance solutions? Who is responsible for advocating and pursuing a global synthetic media safety accord? How can regulation and policy be further refined to fit all use cases, benevolent and malevolent, of synthetic media? There are many more questions along these lines that need to be collected and refined. I am happy to be further involved in this discussion and open to suggestions on how best to approach it.
Throughout the course of this masterpiece I kept asking myself: if this technology is so threatening, why does it exist in the first place?
There are many answers to this question, but ultimately it comes down to this: it exists because we are making tremendous progress in machine learning, and it can exist now. Besides, the technology mirrors something fundamentally human. When we observe our thoughts for a moment, many of us think in pictures or have a movie running in our heads. Who hasn't imagined themselves as their favourite superhero? Synthetic media could entirely shift how we convey what is going on in our minds, not just for content creators but for every single individual. Moreover, it might alter the fundamental nature of the journalism, media, and entertainment industries altogether: video could become the primary mode of communication, with text becoming secondary.
For the sake of synthetic media we need a paradigm shift in its public perception, and we need it fast. Technology is never good or bad in itself. It is, however, deeply intentional and human, and because of that it can live up to the best of its potential as much as to its worst. If we keep running outside during a thunderstorm fearing we will be struck by lightning, chances are we will be. Self-fulfilling prophecies and the words we use are powerful, and we have to be careful about the stories we iteratively tell. If we cling to the current narrative, deepfakes will become an ever-growing storm of treacherous, destabilising, threatening synthetic media that, amplified by social networks, will be the doom of us all.
Or we can start thinking about how to amplify the best use cases of this technology while mitigating its threatening sides. What would this technology look like if our primary goal were to make its use cases as positive and fruitful for society as possible? That doesn't mean we should neglect its threatening sides; that would be ignorance at best and naivety at worst. We can counter them by using AI to detect synthetically created media, by building digital provenance solutions (for example via blockchain) that authenticate the origin of content, by extending legal and regulatory measures to protect possible victims of deepfakes, and through ardent digital literacy education that ensures we apply a healthy grain of doubt when consuming unverified content.
The technology fuelling the creation of synthetic media can live up to so much more if we take the right steps now and fully reframe it. A first step in reframing is to carefully choose a name for this technology, because even something as simple as a name can ultimately shape its meaning and direction. I propose we start using the term "synthetic media" instead of "deepfakes" when referring to the ever-growing body of work that will arise through synthetic content. The age of synthetic media is about to start, and preparing adequately for it is in your hands as much as it is in mine.
Citron, D. K., & Chesney, R. (2018). Disinformation on Steroids: The Threat of Deep Fakes. Cyber Brief.
Fletcher, J. (2018). Deepfakes, artificial intelligence, and some kind of dystopia: The new faces of online post-fact performance. Theatre Journal, 70(4), 455–471.
Kietzmann, J., Lee, L. W., McCarthy, I. P., & Kietzmann, T. C. (2020). Deepfakes: Trick or treat?. Business Horizons, 63(2), 135–146.
Nguyen, T. T., Nguyen, C. M., Nguyen, D. T., Nguyen, D. T., & Nahavandi, S. (2019). Deep learning for deepfakes creation and detection. arXiv preprint arXiv:1909.11573, 1.
Warzel, C. (2018, February 11). Infocalypse Now. BuzzFeed News. https://www.buzzfeed.com/charliewarzel/the-terrifying-future-of-fake-news?utm_term=.eyVagoQY4#.taE9n0qax