How to Share the Tools to Spot Deepfakes (Without Breaking Them)

Published in Partnership on AI · 19 min read · Jan 13, 2022


Beware of impostors

By Claire Leibowicz (PAI), Adriana Stephan (PAI), Sam Gregory (WITNESS)

As advances in AI have enabled new tactics for generating or modifying images and videos, known as synthetic media, some fear that alongside harmless and positive uses for the technology (such as art and expressive purposes), it could be leveraged to mislead and cause harm. To respond to the potential threat of synthetic media, including both deepfakes and less sophisticated, non-AI based manipulated media known as cheapfakes or shallowfakes, many social media platforms and media institutions have turned to synthetic media detection tools to evaluate content. Such tools make use of AI algorithms to classify pieces of media as authentic or inauthentic with some degree of certainty.

Questions remain, however, about how these tools can be used to effectively promote truth online. Some of the most pressing questions ask who gets to access deepfake detectors and how. As a participant from a civil society organization observed at a recent workshop hosted by the Partnership on AI (PAI) and WITNESS: “Deepfake detectors are useless for the general public. They’re tools for arriving at the truth, but they are not presenting ground truth itself.”

Despite substantial investment in detectors, responsibly and effectively using the technology is complicated. How useful are the detectors? Can they be made available to journalists, forensic scientists, advocates, and civil society organizations around the world without making them increasingly vulnerable to evasion by bad actors? Does a qualitative signal of content being inauthentic actually help those interpreting such content judge it to be truthful or not? What resources and capacities are needed alongside detection to make it meaningful and useful? And is there any value in broad-based online detectors?

In order to untangle the complexities of deepfake detection and provide recommendations on increasing access to detectors, PAI, in collaboration with WITNESS, a nonprofit leveraging video and technology for human rights, hosted a workshop to explore the challenge of access to synthetic media detection with around 50 Partners from civil society, industry, media/journalism, and academia. While some at the workshop declared detectors “useless,” others expressed faith in their potential for dealing with misinforming manipulated media.

Here, PAI and WITNESS offer a synthesis of this workshop and previous work in deepfake detection to show how complicated this topic is. In spite of that complexity, we put forth a controlled access protocol for detection that allows detection tools to be shared meaningfully and effectively. We suggest that while imperfect and susceptible to adversarial dynamics, with responsible deployment and adequate training and support for users, detection tools can contribute to the realization of a healthier online information ecosystem that supports truthful claims and the certification of current events.

Framing the Detection Dilemma

Since 2018, PAI and WITNESS have collaborated to understand how deepfake detection can best mitigate the impact of harmful synthetic, manipulated, and misinforming content around the world, without diminishing opportunities for creative and harmless AI-generated content. Our work has emphasized both the positive and malicious use cases for synthetic content and helped promote responsible platform policies governing synthetic media online. We’ve explicitly focused on deepfake detection in order to improve the state of the art in detection technology practices, while also emphasizing that detecting synthetic media alone does not solve information integrity challenges. As we’ve suggested, to be useful, detectors must be shared with journalists, fact-checkers, and those working in civil society around the world, while resisting adversaries generating malicious synthetic media; further, detection results must be understandable to those making sense of digital content.

As several technology platforms have attempted to create and share state of the art deepfake detectors, the question of detection access has become even more pressing: that is, who has access to detection tools and under what terms? This question must be considered in the context of the detection dilemma: the idea that the more broadly accessible detection technology becomes, the more easily it can be circumvented. In other words, detection access can trade off against detection utility, meaning how well the tools work.

When we surveyed the cohort of multidisciplinary PAI experts at the workshop, the majority of them suggested that detection access will inevitably decrease detection utility. This reality makes it more difficult, albeit no less important, to responsibly enable civil society organizations and journalists to access the tools largely concentrated in large technology organizations. Doing so would promote the idea of detection equity: how access to tools and the capacity to use them is provided equitably around the world, especially in contexts and regions where civil society, journalist, and activist stakeholders are on the frontlines of “protecting truth and challenging lies.”

The Who, What, and How of Detection Access

In the spring of 2021, PAI and WITNESS hosted two regional meetings with journalists, activists, and researchers in South America and Africa on how to meaningfully balance detection utility and equity. Building upon that, we brought the PAI community together to align on principles for releasing synthetic media detection tools.

Here, we offer a recommended level of access for deepfake detection technologies, as well as insight into the dynamics informing that recommendation and deepfake detection more generally. We also seek to inform how detection can be applied in a more global context, in a manner that provides detection capacity meaningfully, for those who most need it around the world.

Three main questions guided the meeting:

  1. Who gets access? This involves identifying types of actors and then specific organizations and individuals.
  2. What access does the “who” have? This includes the “strength” of the detection tools each “who” can access, the types of access and training they have, and who they can reach for support.
  3. How is the “who” chosen? This involves governance: both setting up initial vetting processes and evolving as needs change.

To kick off the meeting, we selected five experts who could speak on different facets of the deepfake detection dilemma, including a technologist, two academics, a disinformation researcher, and a journalist. Representing diverse geographic and sectoral backgrounds, Rosemary Ajayi, Chris Bregler, Siwei Lyu, Pedro Noel, and Luisa Verdoliva grounded the meeting discussions and showcased the complexity of deepfake detection for meeting participants. In later breakout sessions with the broader meeting cohort, several key themes and recommendations emerged that relate to many aspects of the initial speakers’ thoughts. Here, we offer five emergent key themes from the workshop alongside speaker quotes, further context from broader participant discussion, and PAI/WITNESS analysis.

Theme 1

Detection is imperfect and not ready for public use and interpretation without bolstered training and education on these technical systems.

Deepfake detection is opaque and imperfect, making some workshop participants question the utility of sharing deepfake detectors broadly. Participants pointed out that detection is a black box, meaning we can’t know exactly how algorithms are making judgment calls on content — a limitation that should be understood by those interpreting detector signals. Additionally, as we’ve written about previously, the datasets on which detection models are trained can affect their robustness. Datasets often inadequately represent real-world videos, which are inherently complex; for example, on social media, videos are compressed, unlike in many training sets. Further, training datasets can only include a sample of videos generated with particular techniques and won’t generalize to new methods of synthesis. Thus, many detection techniques only detect some deepfake generation techniques and will give false results on different or novel ones, limiting their usefulness.

As Luisa Verdoliva, a media forensics researcher at the University of Naples Federico II who wrote a central research paper on synthetic media suggested at the workshop: “Detection tools are not ready to be used right now. Many of them are trained on large datasets of real and manipulated data and work as a sort of black box. The output [from a detector] is a score that indicates the probability that the image/video was manipulated and there is no further information that explains if the detector is doing well or not.”

Offering a remedy for this limitation, Verdoliva continued: “One possible solution is trying to explain and interpret the output of a detector. For example, the detector could look for specific traces — that detect inconsistencies in the shadows of the eyes — or discover if the biometric traits are consistent with those of a specific identity. Also one can find a semantic inconsistency if the speech is talking about violence and the face is expressing a happy emotion. In this way it is possible to give an explanation and understand what is happening instead of just saying it is fake or not.” Therefore, the media integrity community should strive to ensure deepfake detectors are optimally explainable and interpretable by those leveraging them in the wild, and that their limitations are known to those using them, too.

“It is possible to give an explanation and understand what is happening instead of just saying it is fake or not.”
—Luisa Verdoliva

Verdoliva also addressed the challenge of deepfake detectors being applied to contexts beyond those in which they were trained. “Lack of generalization is a major problem of supervised approaches and it is the main reason for which they cannot be easily applied in the wild,” observed Verdoliva. “Suppose for example that a model was trained on data coming from YouTube and then it is tested on Facebook videos or vice versa. Since the videos were subject to different pipelines in terms of compression, training and test are not aligned and this can cause a mismatch that makes the model not work properly. […] The same happens if training comprises a specific type of manipulation (e.g., face swapping) while the test data includes a different one (e.g., facial reenactment).” Continued research and development will improve detectors, but it is unlikely to keep pace with developments in synthetic media generation techniques.

With detection’s limitations in mind, some at our workshop suggested that we abandon emphasis on deepfake detection overall. The majority of participants, however, recognized that deepfake detection can still be one of many useful tools for dealing with information threats today, and that access to such tools should be further realized around the world. Realizing that access, however, requires supplementing detection with other forensic techniques, as well as training and support to contextualize the limits and realities of deepfake detectors, including their lack of generalizability and black box nature.
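To make Theme 1 concrete, here is a minimal, hypothetical sketch of what interpreting a detector signal responsibly can look like in practice: the detector’s probability score is mapped to a guarded verdict rather than a binary fake/real call. The thresholds and wording below are illustrative assumptions, not drawn from any real detection tool.

```python
def interpret_score(score: float, fake_thresh: float = 0.8, real_thresh: float = 0.2) -> str:
    """Map a hypothetical detector's manipulation-probability score to a
    guarded verdict. Scores in the middle band are reported as inconclusive
    rather than forced into a binary call, reflecting that the score is a
    signal to be corroborated, not ground truth."""
    if score >= fake_thresh:
        return "likely manipulated: corroborate with provenance and forensic review"
    if score <= real_thresh:
        return "no manipulation detected: absence of evidence, not proof of authenticity"
    return "inconclusive: escalate to media forensics expertise"
```

In a training context, a wrapper like this can remind journalists that a score of, say, 0.55 warrants escalation rather than a headline.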

Theme 2

Deepfake detection is only one of many techniques for verifying and interpreting content. Provenance signals should be pursued in tandem.

Deepfake detection is one of many forensic techniques for verifying and interpreting content. Beyond the technical limitations described in the section above, public access to detectors can diminish their usefulness. Chris Bregler, Principal AI Scientist in Media Integrity at Google, stated: “The most problematic issue we’re dealing with [in detection] right now is the public APIs of detectors. They’re more dangerous than useful.”

Foreshadowing one of the exercises later in the workshop, Bregler offered a suggestion for an access protocol to detection that remedied the detection dilemma: “The next thing is publishing open-source detectors with trained weights — I think that’s okay, and pushes the research field forward.” He later said, however, that “there’s no silver bullet and I don’t have a solution on what to do with public detectors.”

Beyond changes to the access protocol, though, there are additional techniques that can be leveraged for verifying and interpreting content. While detection signals describe content inauthenticity, authenticity and provenance signals can be useful, providing cues and metadata defining where and from whom content originated, as well as describing how it has been edited or manipulated. At the simplest levels, we’ve heard calls for reverse video search and similarity search, tools that (like Google’s reverse image search for static images) enable videos to be traced back to where they came from and for consumers of content to see a similar original or other versions of the artifact. This would be especially useful for the cheapfakes and mis-contextualized media that are so often leveraged to mislead. Bregler highlighted the possibilities of this in combination with enhanced public education: “The best strategy for us as a community is to continue to invest in education of the public on how to consume media, investing in provenance, and best practices for experts — i.e., how fact checkers could debunk manipulated media cases and use a detector just as one signal out of many.”

“The best strategy for us as a community is to continue to invest in education of the public on how to consume media.”
—Chris Bregler
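The reverse video search and similarity search idea mentioned above is often built on perceptual hashing. As a toy sketch (real systems use far more robust perceptual features than this simple average hash), two near-identical frames produce hashes that differ in few or no bits, while unrelated content lands far away:

```python
def average_hash(pixels):
    """Toy perceptual hash: threshold each grayscale pixel (0-255) of a
    downscaled frame against the frame's mean brightness."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming(h1, h2):
    """Number of differing bits; small distances suggest near-duplicate frames."""
    return sum(a != b for a, b in zip(h1, h2))

# A lightly recompressed copy of an 8x8 frame hashes identically to the
# original, while an unrelated frame is maximally distant.
original = [10] * 32 + [200] * 32
recompressed = [12] * 32 + [198] * 32
unrelated = [200] * 32 + [10] * 32
```

Indexing hashes like these at scale is how a similarity search could trace a clip back to earlier versions, which is especially useful against the mis-contextualized media described above.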

Beyond these ideas, the PAI community has pushed for broader provenance infrastructure efforts. The Coalition for Content Provenance and Authenticity (C2PA), the Content Authenticity Initiative spearheaded by Adobe with Twitter and The New York Times, and Project Origin in the publishing domain all seek to equip end users with the capacity to trace content from its point of capture or origin to better inoculate them against misleading manipulations. Authenticity infrastructure might also help deal with the liar’s dividend, the idea that if people understand that AI can be used to synthesize images and videos, they may be more distrusting of all real and authentic content. Signaling authenticity might mitigate the impact of the liar’s dividend, which some believe to be causing more harm than actual synthetic content.
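The core mechanic behind provenance efforts like these can be sketched in miniature: bind a content hash and capture metadata together, then sign the bundle so later tampering is detectable. The sketch below uses a bare HMAC with a demo key purely for illustration; real standards such as C2PA use certificate-based signatures and much richer manifests.

```python
import hashlib
import hmac
import json

SECRET = b"demo-signing-key"  # illustrative stand-in for a capture device's signing key

def make_manifest(content: bytes, metadata: dict) -> dict:
    """Bundle a content hash with metadata and sign the bundle."""
    payload = json.dumps(
        {"sha256": hashlib.sha256(content).hexdigest(), **metadata},
        sort_keys=True,
    )
    tag = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": tag}

def verify(content: bytes, manifest: dict) -> bool:
    """Check both the signature and that the content still matches its hash."""
    expected = hmac.new(SECRET, manifest["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return False
    return json.loads(manifest["payload"])["sha256"] == hashlib.sha256(content).hexdigest()
```

Any edit to the content or the metadata breaks verification, which is precisely the signal provenance infrastructure aims to give end users.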

While detection will require public education to interpret the sometimes opaque results and metrics signaling inauthenticity, so too will provenance efforts. Understanding the envisioned and unintended consequences of increased context on content, and how those signals are interpreted by users, is vital to the usefulness of provenance standards and to preventing them from causing inadvertent harm. Further, social media platforms that have been quick to cultivate improved detection techniques must consider the provenance signals that will complement detection work, both for operationalizing their own content policies and ultimately sharing content with end users.

Theme 3

Human audiences making sense of and interpreting detection signals must trust and understand them.

Many workshop participants highlighted that, even if detection were 100 percent accurate, individuals interpreting detection signals and tasked with making sense of the content of interest might not trust them. Pedro Noel, a journalist and fact-checker from Brazil, emphasized this conundrum in his opening remarks, saying, “My question is to what extent treating the distribution [of detectors] as a dilemma can be productive or not when we consider that maybe the biggest challenge is to make people believe in the outcomes and outputs [from] these tools.”

If detection technologies are concentrated in the hands of, say, the government or large technology platforms, depending on the socio-political context, those interpreting those signals might not trust them. “We see a huge problem now with mistrust and delegitimization of fact-checkers and anyone providing credible coverage, so even if these things work and they’re perfectly distributed, how can we ensure people believe that they’re outputting the right results?” said Noel. “For me, it’s important to think this dilemma is not only technical, but also ethical.”

“Even if these things work and they’re perfectly distributed, how can we ensure people believe that they’re outputting the right results?”
—Pedro Noel

Noel, like some other participants in our Spring 2021 convening with South American participants, suggested that open-sourcing the technology might remedy this ethical challenge. This is a strong viewpoint that WITNESS has heard in global contexts. While open-sourcing may seem like a useful answer to skepticism about whether detectors report accurate results, as well as a way to encourage non-commercial tool availability, it assumes that the adversarial tradeoff will be outweighed by the gains from increased trust, and that those interpreting detection signals understand what they mean. In an example from Myanmar, a widely shared output from a publicly available online detector implied a political video was a deepfake, but that was likely not the case. When consulted, those trained in media forensics were skeptical that it was a deepfake, thinking it may have instead been a forced confession. While the end result of a forced confession or a deepfake might be the same in this instance, in others, the proven authenticity of a video could lead to a different outcome than its inauthenticity. The case thus demonstrates the challenges that arise when the public and journalists rely on free online detectors without understanding the strengths and limitations of detection, or how to second-guess a misleading result.

To this end, the PAI community underscored the need to educate publics and gatekeepers of content and information in different contexts about what detection can and cannot do. With skepticism towards broader misinformation interventions linked to broad distrust in many of the institutions deploying those interventions, it’s likely that distrust in those describing detection signals might render them, even if technically robust, useless in their societal outcome. Remedying distrust in detection signals is key to their successful and effective implementation.

Theme 4

Detection and access to relevant expertise must be accessible globally, with sensitivity to the media landscape in countries and regions around the world.

Meaningful detection access does not simply mean handing a detection tool made in the Global North to institutions or individuals in the Global South. Understanding the implications of the technology in the media context of regions outside the US and UK requires increased attention to the realities of journalism, government, and activism in different regions.

Rosemary Ajayi, a disinformation researcher in Nigeria at the Digital Africa Research Lab, emphasized this reality: “While it’s key that these detection tools are accessible more broadly to individuals, groups, beyond the global north, and I speak about Africa specifically here, it’s also necessary to consider the question of how we design the selection criteria and processes and whether those criteria should look the same in all markets. Especially if we’re talking about equity here. This is top of mind for me because in Nigeria, for instance, we continue to see cases where well-meaning global north dis/misinfo initiatives parachute in and partner with news organisations and journalists because these are the kinds of stakeholders that you partner with in the UK or the US. However, the Nigerian media landscape is very complicated and some of these partnerships have ended up adding little to no value to the environment.”

“It’s necessary to consider the question of how we design the selection criteria and processes, and whether those criteria should look the same in all markets. Especially if we’re talking about equity.”
—Rosemary Ajayi

The remedy, Ajayi suggested, is devoting “resources and time to understanding the media and civic landscapes in a broad range of environments to help shape selection criteria.”

Several specific suggestions related to how to meaningfully offer non-Western individuals and entities access, how to pick the right entities to have access, and how to deepen access to expertise and training for use and understanding of the tools also emerged in regional workshops hosted with WITNESS in Africa and Latin America before the PAI workshop.

While PAI ultimately recommends controlled access to detection in this blog post, there might still be open-access options to consider in order to provide access for the most vulnerable and under-resourced communities. Drawing from his experiences in the Global South, Noel asked, “Can we really give up on distributing these tools universally (open code and use), is this actually negotiable or not? I suggest we should invest in thinking on deploying and distributing these tools as universal, and not try to restrict it.”

But if access to the technology is controlled, who should get access to the tools? The PAI community suggested that journalists and fact-checkers, societally trusted leaders, and community organizations should get access, likely with the input of contextually appropriate intermediaries like the International Fact-Checking Network (IFCN) and other networks like WITNESS and First Draft to gauge who meets these criteria. Any network that manages this high-stakes responsibility, however, must be sure to adequately understand the context of the regions around the world requiring detection and sensitivities such as those described by Ajayi. Ultimately, the potential for controlled access to lead to meaningful access to detection tools around the world is contingent upon having reliable, culturally aware intermediaries and recognizing that even within this context there will be potential exclusions and biased or compromised participants within the system.

Beyond tool access, global stakeholders will need clear access to deepened expertise and training, as indicated under Theme 1, as well as access to qualified media forensics capacity and clear escalation tactics. Tool access without the capacity or support to interpret detection results is likely to be counterproductive.

Global reach for these tools also requires attention to the data used to train models. Tool builders should ensure that their models are built with examples from around the world, especially from countries where synthetic media has a profound capacity to destabilize societies.

Theme 5

Graded access to deepfake detection technologies is the best route for reconciling the detection dilemma — increasing detection equity without sacrificing detection utility.

One of the goals of PAI and WITNESS’ workshop was to align on the optimal access protocol for reconciling the detection dilemma — increasing detection equity without sacrificing detection utility. At our workshop, we presented our community with a taxonomy of options for an access protocol for deepfake detection and surveyed them on their preferred access protocol, asking, “If you had to pick ONE level of access to recommend for your hypothetical deepfake detector, what would it be?” This taxonomy was derived from the deepfake detection dilemma paper and guidance from PAI’s responsible publication norms project.

  1. Complete access to source code, training data, and executable software. This provides unlimited use and the ability to construct better versions of the detector. In the case of unauthorized access, this would allow an adversary to easily determine the detector algorithm and potential blind spots which are not included in the training data.
  2. Access to software that you can run on your own computer. E.g., a downloadable app. This provides vetted actors unlimited use. In the case of unauthorized access, such an app provides adversaries the opportunity for reverse engineering, and an unlimited number of “black box” attacks which attempt to create an undetectable fake image by testing many slight variations.
  3. Detection as an open service. Several commercial deepfake detection services allow anyone to upload an image or video for analysis. This access can be monitored and revoked, but if it is not limited in some way it can be used repeatedly in “black box” fashion to determine how to circumvent that detector.
  4. Detection as a secured service. In this case the server is managed by a security-minded organization, to which vetted actors are provided access. If an adversary were to gain access to an authorized user’s account, a suspicious volume or pattern of queries can be detected and the account suspended.
  5. Detection on demand. In this case a person or organization which does not normally do digital forensics forwards an item (escalates) to an allied group which has one of the access types described above.
  6. Developer access only. A single organization controls the unpublished technology, does not disclose the existence of the detector, and never allows external parties to infer the predictions of the detector.

Before the workshop participants were surveyed, they heard expert commentary from those quoted in the earlier sections, as well as from Siwei Lyu, a Professor of Computer Science at SUNY Buffalo. Lyu described his firsthand experience of building an open portal for deepfake detection tools, saying, “We have developed a system called deepfake-o-meter, it is a way to make multiple, complementary open source tools available to users in an easily accessible way. The biggest challenge we’re facing is the possibility of being attacked, so we, in a few hours, got hundreds of requests that basically crashed the server, and this happened in the past few weeks several times.” This happened without Lyu’s detector being widely promoted; people simply figured out that it was there.

Lyu advocated for a graded access protocol, saying: “So, right now, I think one of the simple solutions we’ve been thinking about is to have a graded, trusted user system. On the one hand, we want users to be able to use the system on an infrequent basis, (like one video evaluated once in a while). On the other hand, we also have trusted users who can have higher frequency access to the system. It is a nightmare for me, considering the server crashes down constantly.”

The suggestion — for a graded, trusted system for deepfake detection — emerged not only in this expert commentary, but was also supported by the survey. The majority of respondents chose the fourth option, detection as a secured service, from the list of options as the most useful to defend against hypothetical adversaries for deepfake detection. We phrased the question around impeding adversaries, which may have reduced responses supporting open source, which was highlighted as of interest to those in civil society and journalism we spoke to at our regional meetings with WITNESS. Of course, if detection as a secured service is adopted as the “Goldilocks” choice for detection equity and utility, fundamental questions like who gets to decide who gets access, who secures the service, and how can this be done in a trustworthy manner, per Theme 3 above, emerge. Further, it might be hard to outright prevent open detectors.
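Lyu’s graded, trusted-user idea, and the account-suspension behavior described under “detection as a secured service,” can be sketched as a simple per-tier query budget. The tier names and limits below are illustrative assumptions, not any real service’s policy.

```python
import time
from collections import defaultdict

# Hypothetical hourly query budgets by trust tier.
TIER_LIMITS = {"public": 5, "vetted": 100, "forensics_partner": 1000}

class GradedAccessGate:
    """Toy gate for a secured detection service: graded access by tier,
    with accounts suspended once they exceed their budget, a crude stand-in
    for detecting a suspicious volume or pattern of queries."""

    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        self.queries = defaultdict(list)  # user -> recent query timestamps
        self.suspended = set()

    def allow(self, user, tier, now=None):
        if user in self.suspended:
            return False
        now = time.time() if now is None else now
        recent = [t for t in self.queries[user] if now - t < self.window]
        if len(recent) >= TIER_LIMITS[tier]:
            self.suspended.add(user)  # flag the account for human review
            return False
        recent.append(now)
        self.queries[user] = recent
        return True
```

A real service would layer vetting, audit logs, and per-query anomaly detection on top of anything this simple, but the tradeoff is the same: broader tiers get less headroom precisely because open, unlimited querying enables black box evasion attacks.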

Where Do We Go From Here?

Based on qualitative workshop feedback, survey responses, and PAI work on the detection dilemma since 2019, PAI suggests detection as a secured service as the optimal access protocol as of now for reconciling detection equity and utility. In tandem, activists and civil society, fact-checkers with support, and journalists (with sensitivity to state-run media in certain authoritarian contexts) need to have access to detection tools and have mechanisms for gaining access or escalating critical cases to others with more expertise.

Of course, there is a lot of latitude in who gets access to even a secured service. While workshop participants emphasized that they wouldn’t want a government or technology company to be deciding who gets access to the technology, the ideal governance institution was not agreed upon, though some suggested an institution like the IFCN or another nonprofit, international arbiter. We also encourage escalation approaches to be built out in order to mitigate the disparities in access to forensic capabilities between countries.

Consistently, workshop participants emphasized that the robustness and utility of technical detection is only half the battle — the bigger questions are what ultimately makes people believe that a signal is accurately conveying a manipulation and what people can do once they understand that a video has been manipulated. Further, we suggest the need for bolstered training and support around detection in order to ensure that detection access does not cause harm. Several ideas for deploying training and support resources emerged from workshop participants:

  • Create a system of media forensics trainers globally, across regional contexts
  • Develop a coordinated media and information literacy campaign in multiple languages, with emphasis on localization (with local experts featured)
  • Focus simply on literacy and prevention around manipulated media
  • Training around detection technology (What kinds of artifacts are detection models picking up on when run on a deepfake video? What are the limits of black box detection models and their explainability?)
  • And, at the most extreme, share how useless detectors are for the general public and emphasize that they’re tools for arriving at truth but not ground truth itself.

With all this complexity in mind, and based on PAI and WITNESS’s synthesis of work to date (including qualitative and quantitative feedback from the PAI and WITNESS community and beyond), we recommend several key points for future work for those working across the responsible AI and information integrity fields:

  1. While detection is imperfect, it can be a useful tool and technology for mitigating the impact of malicious manipulated media.
  2. Detection should be complemented by other media verification tools, including provenance signals and infrastructure.
  3. Training, support, and education for those using detection tools are just as integral to the utility of detection as the actual robustness of the models. If interpreters are not aware of the limits of detection, and that its signals are only one component in evaluating the truthfulness of content, the tools will be rendered ineffective.
  4. Detection tools and technologies must be meaningfully deployed to journalists, fact-checkers, and civil society around the world, without sacrificing detection utility due to adversarial risk.
  5. Establishing an infrastructure in which detection is deployed as a secured service, likely run by independent nonprofits with regional sensitivity around the world, will best alleviate the detection dilemma.
  6. Escalation approaches should be built out in order to mitigate the disparities in access to forensic capabilities between countries.

In 2022, PAI and WITNESS will continue considering how to bolster detection and other media verification tools and seek to work with policymakers on media integrity topics. If you have thoughts or questions, feel free to reach out to us.



Partnership on AI

The Partnership on AI is a global nonprofit organization committed to the responsible development and use of artificial intelligence.