5 Urgent Considerations for the Automated Categorization of Manipulated Media

Partnership on AI
Jun 29, 2020 · 18 min read

By Emily Saltz (PAI), Pedro Noel (First Draft), Claire Leibowicz (PAI), Claire Wardle (First Draft), Sam Gregory (WITNESS)

“The real question for our time is, how do we scale human judgment? And how do we keep human judgment local to human situations?”

–Maria Ressa, Filipino-American journalist and founder of Rappler, speaking on “Your Undivided Attention.” (Note: Ressa was found guilty of ‘cyberlibel’ in the Philippines on June 15, 2020 for Rappler’s investigative journalism in what is seen by many as a major blow to the free press.)

What MediaReview is, and why it matters

Screenshot of the video rating mode of MediaReview within the FactStream interface, available for testing by participating publishers and fact-checkers

At the Partnership on AI, with consultation from the Reporters’ Lab and partners in media, civil society, and technology on the AI and Media Integrity Steering Committee, we’ve been assessing the implications of developing and deploying the MediaReview schema as a case study for thinking about automating manipulated media evaluations more broadly. Rating highly contestable and context-specific media is a complex and subjective endeavor even for trained fact-checkers analyzing individual posts, and extending those ratings via automation has risks and benefits that warrant serious consideration prior to widespread adoption.

Specifically, MediaReview currently allows fact-checkers to tag video and image posts according to predefined categories such as “Missing Context,” “Edited Content,” “Transformed,” and “Authentic” in a format that can then be ingested and used by platforms to inform content moderation policies and audience-facing explanations. Google and Facebook are supporting the development of the schema, which is currently in testing with fact-checkers with an eye toward eventual adoption by platforms, who are rightly moving with urgency to address disinformation campaigns and misleading content ahead of the 2020 US Presidential election in November.
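For readers unfamiliar with structured fact-check markup, the snippet below sketches roughly what such a rating might look like when expressed as machine-readable data. It is illustrative only: MediaReview is still in testing, and the property names here are assumptions modeled on the existing schema.org ClaimReview markup rather than the final specification.

```python
# Illustrative only: MediaReview is still in testing, and these property names are
# assumptions modeled on schema.org's ClaimReview markup, not the final specification.
media_review_example = {
    "@context": "https://schema.org",
    "@type": "MediaReview",
    "datePublished": "2020-06-01",
    "author": {"@type": "Organization", "name": "Example Fact-Checking Org"},  # hypothetical
    # One of the draft categories discussed in this post:
    # "Missing Context", "Edited Content", "Transformed", "Authentic"
    "mediaAuthenticityCategory": "Missing Context",     # assumed property name
    "originalMediaContextDescription": (                # assumed property name
        "Clip shows a 2019 protest, not the event described in the post"
    ),
    "itemReviewed": {
        "@type": "VideoObject",                         # assumed item type
        "contentUrl": "https://example.com/video.mp4",  # placeholder URL
    },
}
```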

We describe urgent considerations for deploying a media rating schema across five areas: 1) Manipulated Media Categories, 2) Relationship of Ratings to Harm, 3) Archiving of Manipulated Media, 4) Automated Image and Video Matching Technology, and 5) Global User Experience Design:

1. Manipulated Media Categories: We need a set of robust categories for manipulated or misleading media.

1.1. Any set of categories should only be deployed once an intercoder reliability threshold has been hit.

1.2. As new tactics emerge, will the categories evolve?

2. Relationship of Ratings to Harm: We need to consider the ways in which potential harm intersects with manipulated media.

2.1. If platforms are using ‘harm’ to justify decisions as outlined in many policies (flagging, de-ranking, take-downs), definitions of harm need to be clear to those fact-checking the content as well as to audiences. They cannot be opaque or ad hoc.

2.2. We need clearer explanations of the ways in which harm is measured in specific local contexts, for example, the threat of immediate real-world harms, such as media inciting violence, versus longer-term harms, like anti-vaccine conspiracies.

3. Archiving of Manipulated Media: We need to consider who owns the archive of rated manipulated media, and how access can be responsibly provided to the public without unintended consequences.

3.1. Provide public archives of rated media to enable critical inquiry, analysis, and evidentiary use by third-parties.

3.2. Create archives with ownership, access controls, and oversight mechanisms as informed by civil society.

4. Automated Image and Video Matching Technology: We need to consider the impact of automated matching technologies — for example, if an authentic video of police brutality is shared by one user with a misleading caption, it risks being rated as “false” across all matching videos on a platform, even when shared with accurate context.

4.1. Harm-assessment should be made at the level of a post in context, rather than as an inherent property of a media asset that can always be extended across all instances.

4.2. Fact-checkers need more context on where matches appear across platforms and should be able to understand and control how their ratings will propagate across matched media.

5. Global User Experience Design: We need to consider how these ratings might feed into audience-facing user interface interventions, across platforms, and the effects of those interventions on the public sphere.

5.1. The schema is designed for fact-checkers, not end-users. Platforms need to work together to understand the effects of displaying ratings on audience beliefs and behaviors, globally.

5.2. Users need to have opportunities to appeal manipulated media ratings.

Alongside these concerns, we outline immediate recommendations for platforms, as well as a call for platforms to transparently communicate their plans for using the schema to ensure accountability from the public and civil society.

1. Manipulated Media Categories

1.1 Any set of categories should only be deployed once an intercoder reliability threshold has been hit.

For example, if a fact-checking organization in India tags a video of a protest as “Edited” while an organization in the United States tags the same video as “Transformed,” platforms cannot reliably depend on the schema to provide a systematic understanding of the rated media for their global users. Further, these ratings say nothing of the relative harm or misleading nature of the content: evaluations that may also differ from person to person and across geographies. Before deployment, platforms need to continue to support researchers, both internally and at institutions like the Reporters’ Lab, in order to ensure rigorous intercoder reliability testing with fact-checkers internationally on a range of mis/disinformation issues.
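To make the idea of an intercoder reliability threshold concrete, here is a minimal sketch, assuming two fact-checking organizations rate the same set of posts, that computes Cohen’s kappa with scikit-learn and checks it against a deployment gate. Both the ratings and the 0.8 cutoff are invented for illustration, not figures from MediaReview’s developers.

```python
from sklearn.metrics import cohen_kappa_score

# Ratings assigned to the same ten posts by two fact-checking organizations
# (both the categories used and the data are invented for illustration).
checker_a = ["Edited", "Missing Context", "Transformed", "Authentic", "Edited",
             "Transformed", "Missing Context", "Authentic", "Edited", "Transformed"]
checker_b = ["Transformed", "Missing Context", "Transformed", "Authentic", "Edited",
             "Edited", "Missing Context", "Authentic", "Edited", "Transformed"]

kappa = cohen_kappa_score(checker_a, checker_b)
print(f"Cohen's kappa: {kappa:.2f}")

# An example deployment gate (the 0.8 threshold is illustrative, not prescribed):
# rely on a category set only once agreement clears a pre-registered threshold
# across many raters, regions, and media types.
THRESHOLD = 0.8
print("Deploy" if kappa >= THRESHOLD else "Keep refining categories and rater training")
```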

1.2 As new tactics emerge, will the categories evolve?

These tactics should be defined not as static, top-down categories, but as categories that emerge from bottom-up patterns observed directly on the ground by fact-checkers within local contexts, given that the intents and types of manipulations may look very different in Europe, compared to the United States, compared to the Global South, and so on. Emerging tactics include:

  • “Lip-sync deepfakes” created for political purposes, as seen in a Belgian deepfake of Donald Trump, or for translation purposes, as seen with an Indian politician translating speeches
  • “Face-swapping,” as seen in a video where Jordan Peele used AI to make Barack Obama deliver a PSA about fake news
  • Audio-splicing satire: satire that could reasonably be seen as authentic, as seen in the Bloomberg campaign’s “crickets” post
  • Image quoting, or mis-contextualizing a quote through association with an image, as seen in numerous fake Trump quotes
  • Misleading subtitles, as seen in a video of a woman in Wuhan with misleading claims that she was a nurse

Currently, all of these manipulation types would be captured only through an “Original Media Context” field for describing the media in MediaReview, making structured analysis and interventions difficult.

In summary, insufficiently precise or consistent ratings become a poor “ground truth” or foundation in training data for use in automation, with the potential to lead to harmful downstream effects. These categories, in turn, should have transparent connections to proposed content moderation policies.

2. Relationship of Ratings to Harm

2.1 If platforms are using ‘harm’ to justify decisions as outlined in many policies (flagging, de-ranking, take-downs), definitions of harm need to be clear to those fact-checking the content as well as to audiences. They cannot be opaque or ad hoc.

Consider that one of the most damaging media manipulation tactics doesn’t even require opening a photo-editing tool, such as a recent “Missing Context” example which falsely claims that a screenshot from the TV show “Designated Survivor” is a photo of protests in D.C. (example 27): a post with the potential to incite real-world violence in support of an invented reality. On the other end of the spectrum, many “Transformed” examples of technically sophisticated deepfakes pose minimal harm, such as a user’s satirical face-swap of Jennifer Lawrence and Steve Buscemi. In other words, it is not safe to assume “the more edited, the more harmful.” However ratings play into moderation, it is clear that they require human review of the distinct intents and harms of the media, from satirical to malicious, within specific cultural contexts.

2.2 We need clearer explanations of the ways in which harm is measured in specific local contexts, for example, the threat of immediate real-world harms, such as media inciting violence, versus longer-term harms, like anti-vaccine conspiracies.

These examples point to a need for much deeper investment both in defining a theory of harm for instances of manipulated media and in human moderation and oversight of how those media are displayed (or not displayed) to end users, adapted to specific, local contexts. Currently, platforms lack consistent and clear theories of harm for any content on their platforms, and this inconsistency would extend to MediaReview ratings.

In formulating theories of harm, platforms can turn to civil society for inspiration, such as the Rabat Principles, drafted to inform law about content that might incite hatred. The Principles outline six thresholds to assess hate speech: (i) the social and political context, (ii) the speaker, for example his or her status and influence, (iii) the intent of the speech, (iv) the content or form of the speech, (v) the extent of the speech, and (vi) the likelihood and imminence of actually causing harm. This sentiment has been echoed by David Kaye, UN Special Rapporteur on Freedom of Expression, who called the principles “a valuable framework for examining when the specifically defined content — the posts or the words or images that comprise the post — merits a restriction.” Additional analyses from the human rights space include Article 19’s “hate speech pyramid” and the Dangerous Speech Project’s dangerous speech framework.

Article 19 hate speech pyramid outlining treatment of harmful content that is consistent with international human rights law

Any moderation decision about manipulated media should be considered alongside an analysis of the harms of the claims associated with the media, not based on the type of manipulation alone. As platforms consider how MediaReview ratings relate to harm, it may be helpful to reference the ClaimReview ratings nested within MediaReview, which provide fact-checks for specific claims associated with the media. How might content moderation take both the media manipulation and claim ratings into account toward responsible policies?
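As a purely illustrative answer to that question, the sketch below shows one way a moderation policy could weigh a nested ClaimReview rating and a human harm assessment together, rather than keying on the manipulation category alone. The field names, category values, and actions are assumptions for the sketch, not proposals endorsed by the schema’s developers or by platforms.

```python
from dataclasses import dataclass

@dataclass
class RatedPost:
    # All field names and values are illustrative; how MediaReview and the nested
    # ClaimReview ratings are exposed to moderation systems is still being worked out.
    manipulation_category: str  # e.g. "Missing Context", "Transformed"
    claim_rating: str           # from the nested ClaimReview, e.g. "False", "Satire"
    harm_assessment: str        # a human judgment made in local context:
                                # "imminent", "long-term", or "minimal"

def moderation_action(post: RatedPost) -> str:
    """Sketch of a policy that weighs claim falsity and assessed harm together;
    the manipulation category informs the explanation shown to users but does not,
    by itself, determine the action."""
    if post.claim_rating == "Satire" and post.harm_assessment == "minimal":
        return "label as satire, no down-ranking"
    if post.claim_rating == "False" and post.harm_assessment == "imminent":
        return "remove or heavily restrict, with an explanation and an appeal path"
    if post.claim_rating == "False":
        return f"label with fact-check ({post.manipulation_category}) and reduce distribution"
    return "no action"

print(moderation_action(RatedPost("Missing Context", "False", "imminent")))
```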

3. Archiving of Manipulated Media

3.1 Provide public archives of rated media to enable critical inquiry, analysis, and evidentiary use by third-parties.

Archiving this content is a complex but crucial challenge for MediaReview to address from a technical, ethical, and legal perspective, with many stakeholders to consider, including journalists, fact-checkers, researchers, regulators, and the general public.

Screenshot of URL fields in the video rating mode of MediaReview within the FactStream interface, available for testing by participating publishers and fact-checkers

Access to these archives is important for multiple use cases. First, lack of access affects fact-checkers, who often need to see if an item has already been checked, or need access to past fact-checked media in order to defend their own work. These access needs are often urgent and time-sensitive: for example, the need to view recycled imagery that continues to reappear and spread during recurring contexts like natural disasters, despite repeated debunking. In addition, there is a strong public interest argument for archiving the data: access to manipulated media and ratings could better equip legislators, regulators, and other third parties to respond to critical moments for information flow, such as during natural disasters, terrorist attacks, public health crises, and elections. Finally, these archives are invaluable for understanding hate speech, substantiating human rights claims, and supporting other investigations, even when the footage is no longer publicly visible due to removal by platforms or users.

3.2 Create archives with ownership, access controls, and oversight mechanisms as informed by civil society.

What is the content of the archive, and who stands to gain from access to that content? The archive will contain media with a range of harms and legality, from memes about The Cure to violent and/or sexually explicit imagery (a 2019 report by Deeptrace found that 96% of the deepfake videos it identified contained pornographic content). When and how to provide access to these media should ultimately be a question assessed in coordination with human rights, archival, and civil society groups, with nuanced consideration of archival practices, free speech protections, privacy, consent, and additional unintended consequences.

Although uncommon, archiving services have also been misused by malicious actors, who have used them as a way to get around flagging by platforms and further amplify harmful content. For example, some users who have had posts removed by platforms like Medium and YouTube for containing false and unsubstantiated claims about COVID-19 have found ways to re-share the content through other avenues, such as Google Drive and Internet Archive links. (For its part, the Internet Archive recently responded by displaying its own warning on an archived version of an article containing unsubstantiated claims.)

Internet Archive warning flag

Yet, with appropriate access restrictions in place, the data has powerful potential to support investigations in the public interest. Some options include an embargoed archive, in which archive contents become public after a certain period of time; an “evidence locker,” an archive that is not public but accessible under certain circumstances, for example to approved reviewers or academics; or public datasets with some amount of friction, such as email sign-up, similar to Twitter’s Transparency Report. Additionally, MediaReview can take inspiration from existing examples of how civil society organizations maintain repositories containing sensitive content that is in the public interest, such as Mnemonic’s Syrian Archive.
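As a small illustration of how two of these options could combine in practice, the sketch below gates archive access with a time embargo plus an “evidence locker” list of approved reviewers. The embargo length, reviewer identities, and roles are assumptions for the sketch, not a proposed governance model.

```python
from datetime import datetime, timedelta

# Illustrative access rules for a rated-media archive, combining two of the options
# above: a time embargo plus an "evidence locker" list of approved reviewers.
# The embargo length and reviewer identities are assumptions for the sketch.
EMBARGO = timedelta(days=365)
APPROVED_REVIEWERS = {"researcher@example.edu", "rights-group@example.org"}

def can_access(item_archived_at: datetime, requester: str, now: datetime) -> bool:
    if requester in APPROVED_REVIEWERS:           # "evidence locker" access path
        return True
    return now - item_archived_at >= EMBARGO      # public once the embargo lapses

# Example: an item archived in June 2020 is public to everyone a year later.
print(can_access(datetime(2020, 6, 1), "member-of-the-public", datetime(2021, 6, 2)))
```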

In terms of ownership, there are many questions around which parties might own the archive, and the governance rules for those parties. The archive could be owned by one or several partners; distributed, so that ownership is shared across multiple organizations; completely decentralized, as in the Tor network, where anyone can maintain and deploy a node; or serverless, as in the BitTorrent peer-to-peer file-sharing protocol. Regardless of the exact ownership model, it is crucial that no one party be able to monopolize and benefit from the archive’s data.

Ultimately, this is a complex question and demands considering all of the relevant stakeholders and threat models for providing access to media before making a decision. However, we believe that the data should not be monopolized by platforms, but rather either maintained in a decentralized way without a single owner, or in coordination with one or more third-parties who might serve as a responsible steward for data over the long-term.

4. Automated Image and Video Matching Technology

4.1 Harm-assessment should be made at the level of a post in context, rather than as an inherent property of a media asset that can always be extended across all instances.

“Life cycle of images as they get matched against a database of certified misinformation” from Facebook’s AI blog

Crucially, the matching approach is problematic because it assumes that ratings are an inherent property of media that can always be extended across all instances, rather than that ratings are applied for specific media shown in specific contexts. Even if images or videos themselves are identical, different user contexts can drastically change the meaning. As a result, when platforms automatically extend the application of a fact-check based only on a media asset, they often end up showing warnings on media shared in legitimate contexts.

This is an existing problem with the potential to be extended by MediaReview’s implementation if media ratings are used in matching processes. For example, in March 2020, a video depicting police brutality in Hong Kong in 2019 was shared and fact-checked with a false claim relating it to COVID-19. The video then rippled through the platform with a “false” label, so that even when others tried to share it with accurate context, it was still flagged. The “false positive” of flagging this legitimate media as false invites abuse and denial of actual events in Hong Kong.

In summary, neither a claim rating associated with media, nor the technical properties of a visual asset (e.g. pixel content) can be reliably used as a proxy for its harm across all media instances. As we’ve seen in 2.1, technical manipulation properties, such as whether a video employs face-swapping technologies vs. audio distortion, tell you little about that media’s potential to cause harm. Context detection must go hand-in-hand with duplicate media detection. Contextual, human judgments are integral to making responsible policy decisions that protect human rights and prevent abuse internationally.
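One way to express the recommendation in 4.1 in code is to key any automated propagation of a fact-check on the pair of media fingerprint and claim, rather than on the fingerprint alone, and to fall back to human review when only the media matches. The sketch below is a simplification under that assumption; the fingerprint and claim strings are placeholders, and real matching of claims is far harder than exact string comparison.

```python
from typing import Optional

# Sketch of the idea in 4.1: key automated propagation of a fact-check on the pair
# (media fingerprint, claim being made), not on the media fingerprint alone.
# "Fingerprint" stands in for whatever matching signal a platform uses (for example,
# a perceptual hash); all values below are invented placeholders.
fact_checks = {
    ("hk_video_2019_fp", "shows covid-19 lockdown enforcement"): "False",
    # Note: no entry for the same footage shared with its accurate 2019 context.
}

def label_for_post(fingerprint: str, claim_text: str) -> Optional[str]:
    """Return a label only when both the media AND the claim match an existing
    fact-check; otherwise return None so the post is routed to human review
    instead of being auto-flagged."""
    return fact_checks.get((fingerprint, claim_text.lower()))

# The recycled-with-false-claim post is caught...
assert label_for_post("hk_video_2019_fp", "Shows COVID-19 lockdown enforcement") == "False"
# ...while the original footage shared with accurate context gets no automatic label.
assert label_for_post("hk_video_2019_fp", "Footage of a 2019 Hong Kong protest") is None
```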

4.2 Fact-checkers need more context on where matches appear across platforms, and should be able to understand and control how their ratings will propagate across matches.

Why is localized judgment so important? Consider that a fact-checker in Brazil reviewing content may have very different needs from a fact-checker in India, given their unique understandings of the relevant actors, media frames, and threats of the posts they’re rating. Moderation policies should take into account the different media environments of countries, and active feedback from moderators and fact-checkers can help in this evaluation.

Fact-checkers are on the frontlines of social networks: their knowledge about the nature of information flows in their countries, and about the contexts in which the information they are tagging circulates, is often deeper than that of platforms’ policymakers. That means fact-checkers should have an understanding of the impacts of their ratings and be able to provide platforms with feedback on those impacts in a way that is beneficial to both parties. Human moderators and fact-checkers should be appropriately consulted in automation decisions, and acknowledged and compensated for the profound editorial value they provide in informing these systems.

5. Global User Experience Design

5.1 The schema is designed for fact-checkers, not end-users. Platforms need to work together to understand the effects of displaying ratings on audience beliefs and behaviors, globally.

Inappropriate design choices may have the opposite of the intended effects for clarity in the public sphere. For example, the Hong Kong police brutality video shared with a false claim related to COVID-19 was fact-checked and then displayed on Instagram with a “False” rating from the Indian fact-checker Boom Live. Comments on the post show confusion about what the rating means, who the fact-checkers were, and why the rating was applied to what commenters knew to be legitimate media. This treatment negatively affects both users and fact-checkers.

As a result, for platforms that adopt the MediaReview schema, we caution against opportunistic use of these categories as UI labels without first defining and testing against goals that expressly reference societal effects, and without designing interventions through a community-oriented lens. Additionally, platforms should commit to design processes that incorporate community-based design principles and meaningfully engage external stakeholders, especially at-risk and marginalized groups. Examples of such principles include MIT’s design justice principles and BSR’s 5 steps for stakeholder engagement. For example, how might platforms systematically commit to connecting with communities through these engagement steps in order to enact principles like number 10: “Before seeking new design solutions, we look for what is already working at the community level. We honor and uplift traditional, indigenous, and local knowledge and practices”? This means working directly with affected communities in order to build upon practices that they are already using to educate and protect each other from the tangible harms of mis/disinformation within specific contexts.

Principle 10 of MIT’s 10 design justice principles, centering people who are normally marginalized by design

5.2 Users need to have opportunities to appeal manipulated media ratings.

Specific Recommendations

All of this said, we recognize the extraordinary urgency of addressing the unprecedented amount of harmful manipulated media flooding platforms. Many of the issues presented here are complex — or even intractable — problems given the current scale and diversity of users participating on platforms. And while we don’t want perfect to be the enemy of the good, we also believe there are several specific, immediate steps platforms can prioritize in order to ensure responsible deployment of a rating schema that aims to balance the risks and benefits.

These recommendations include:

Manipulated Media Categories

  • Invest in rigorous testing across international fact-checking organizations to understand intercoder reliability and ensure media forensics and manipulation detection capacity for fact-checkers across a variety of examples
  • Provide options for fact-checkers to select and suggest modifications to the schema in order to add structured data on a variety of manipulation tactics, such as lip-sync deepfakes and face-swapping

Relationship of Ratings to Harm

  • Transparently communicate a theory of harm for manipulated media types based on international human rights guidelines

Archiving of Manipulated Media

  • Provide archives of rated media to enable critical inquiry, analysis, and evidentiary use by third-parties, with access controls, ownership, and oversight mechanisms as informed by civil society

Automated Image and Video Matching Technology

  • Provide fact-checkers with transparency and control over where and how exactly their ratings will be used by platforms across instances of media

Global User Experience Design

  • Before deploying interventions, test their effects on audience beliefs and behaviors, globally
  • Incorporate community-based design principles that meaningfully engage external stakeholders, especially at-risk and marginalized groups
  • Ensure that users have access to fast and reasonable appeal mechanisms to flag false positives for manipulated media

A Call for Transparent, Multi-Stakeholder Strategy

In terms of longer-term strategy, platforms need to share enough information to allow oversight from third parties to assess progress. In particular, platforms should commit to and share their plans for UX testing, moderation of categories, and any use of automated matching as it applies to MediaReview. These plans should include specifics, for example, a threshold for intercoder reliability of categories before deployment.

As the Partnership on AI continues assessing this space through its AI and Media Integrity area, we aim to help facilitate coordination between platforms to make progress on these issues as they evolve over time and to ensure the responsible deployment of a manipulated media rating schema.

With thanks to: Jonathan Stray (PAI), Marlena Wisniak (PAI), Dia Kayyali (WITNESS), Tommy Shane (First Draft), Victoria Kwan (First Draft), the AI & Media Integrity Steering Committee at PAI, The Duke Reporters’ Lab
