5 Urgent Considerations for the Automated Categorization of Manipulated Media

Published in AI&. on June 29, 2020

By Emily Saltz (PAI), Pedro Noel (First Draft), Claire Leibowicz (PAI), Claire Wardle (First Draft), Sam Gregory (WITNESS)

“The real question for our time is, how do we scale human judgment? And how do we keep human judgment local to human situations?”
–Maria Ressa, Filipino-American journalist and founder of Rappler, speaking on “Your Undivided Attention.” (Note: Ressa was found guilty of ‘cyberlibel’ in the Philippines on June 15, 2020 for Rappler’s investigative journalism in what is seen by many as a major blow to the free press.)

What is MediaReview, and why it matters

Digital platforms have a manipulated media problem: according to the 2019 Global Inventory of Organized Social Media Manipulation report from Oxford’s Computational Propaganda Research Project, mis/disinformation spread through misleading videos, memes, and photographs is the most common communication strategy political actors use to influence public opinion around the world. As platforms face the daunting challenge of addressing the onslaught of such media, they turn to human and technological solutions that can help them better categorize this content and automate their responses to it. In response, the Duke Reporters’ Lab has created MediaReview in collaboration with fact-checking partners and other constituencies, such as schema.org and The Washington Post. MediaReview is a schema, or tagging system, that will allow fact-checkers to alert partnering tech platforms about false videos and fake images.

*Screenshot of the video rating mode of MediaReview within the FactStream interface,* *available for testing by participating publishers and fact-checkers*

At the Partnership on AI, with consultation from the Reporters’ Lab and partners in media, civil society, and technology on the AI and Media Integrity Steering Committee, we’ve been assessing the implications of developing and deploying the MediaReview schema as a case study for thinking about automating manipulated media evaluations more broadly. Rating highly contestable and context-specific media is a complex and subjective endeavor even for trained fact-checkers analyzing individual posts, and extending those ratings via automation has risks and benefits that warrant serious consideration prior to widespread adoption.

Specifically, MediaReview currently allows fact-checkers to tag video and image posts according to predefined categories such as “Missing Context,” “Edited Content,” “Transformed,” and “Authentic” in a format that platforms can then ingest and use to inform content moderation policies and audience-facing explanations. Google and Facebook are supporting the development of the schema, which is currently being tested with fact-checkers with an eye toward eventual adoption by platforms. Platforms are rightly moving with urgency to address disinformation campaigns and misleading content ahead of the November 2020 US presidential election.
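To make the idea of a machine-readable rating concrete, here is a minimal Python sketch of the kind of record a fact-checker might produce. It is not MediaReview’s actual schema: the class, the field names, and the category strings are hypothetical, loosely modeled on the categories named above and on the fields (post, original media URL, fact-check article, nested claim ratings) described later in this piece.

```python
import json
from dataclasses import dataclass, field, asdict
from enum import Enum


class MediaCategory(str, Enum):
    # Category names from the article; the string values are illustrative
    # placeholders, not MediaReview's published vocabulary.
    MISSING_CONTEXT = "Missing Context"
    EDITED = "Edited Content"
    TRANSFORMED = "Transformed"
    AUTHENTIC = "Authentic"


@dataclass
class MediaRating:
    # Hypothetical field names modeled on the fields the article describes.
    post_url: str                # the post that was fact-checked
    original_media_url: str      # earliest known source of the media, if found
    category: MediaCategory
    original_media_context: str  # free-text description of the original context
    fact_check_url: str          # article with the detailed fact-check
    claim_ratings: list = field(default_factory=list)  # nested ClaimReview-style verdicts

    def to_json(self) -> str:
        """Serialize to a JSON string that a platform could ingest."""
        record = asdict(self)
        record["category"] = self.category.value
        return json.dumps(record, sort_keys=True)


rating = MediaRating(
    post_url="https://example.com/posts/123",
    original_media_url="https://example.com/original.mp4",
    category=MediaCategory.MISSING_CONTEXT,
    original_media_context="Scene from a TV drama, not a news photo.",
    fact_check_url="https://example.com/fact-checks/123",
    claim_ratings=[{"claim": "Photo shows protests in D.C.", "verdict": "False"}],
)
print(rating.to_json())
```

Note that the record stores only URLs, not the media itself, and that its category describes how the media was manipulated, not how harmful it is.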

We describe urgent considerations for deploying a media rating schema across five areas: 1) Manipulated Media Categories, 2) Relationship of Ratings to Harm, 3) Archiving of Manipulated Media, 4) Automated Image and Video Matching Technology, and 5) Global User Experience Design:

1. Manipulated Media Categories: We need a set of robust categories for manipulated or misleading media.

1.1. Any set of categories should only be deployed once an intercoder reliability threshold has been hit.

1.2. As new tactics emerge, will the categories evolve?

2. Relationship of Ratings to Harm: We need to consider the ways in which potential harm intersects with manipulated media.

2.1. If platforms are using ‘harm’ to justify decisions (flagging, de-ranking, take-downs), as outlined in many policies, definitions of harm need to be clear to those fact-checking the content as well as to audiences. They cannot be opaque or ad hoc.

2.2. We need clearer explanations of the ways in which harm is measured in specific local contexts, for example, the threat of immediate real-world harms, such as media inciting violence, versus longer-term harms, like anti-vaccine conspiracies.

3. Archiving of Manipulated Media: We need to consider who owns the archive of rated manipulated media, and how access can be responsibly provided to the public without unintended consequences.

3.1. Provide public archives of rated media to enable critical inquiry, analysis, and evidentiary use by third-parties.

3.2. Create archives with ownership, access controls, and oversight mechanisms as informed by civil society.

4. Automated Image and Video Matching Technology: We need to consider the impact of automated matching technologies — for example, if an authentic video of police brutality is shared by one user with a misleading caption, it risks being rated as “false” across all matching videos on a platform, even when shared with accurate context.

4.1. Harm-assessment should be made at the level of a post in context, rather than as an inherent property of a media asset that can always be extended across all instances.

4.2. Fact-checkers need more context on where matches appear across platforms and should be able to understand and control how their ratings will propagate across matched media.

5. Global User Experience Design: We need to consider how these ratings might feed into audience-facing user interface interventions, across platforms, and the effects of those interventions on the public sphere.

5.1. The schema is designed for fact-checkers, not end-users. Platforms need to work together to understand the effects of displaying ratings on audience beliefs and behaviors, globally.

5.2. Users need to have opportunities to appeal manipulated media ratings.

Alongside these concerns, we outline immediate recommendations for platforms, as well as a call for platforms to transparently communicate their plans for using the schema to ensure accountability from the public and civil society.

1. Manipulated Media Categories

1.1 Any set of categories should only be deployed once an intercoder reliability threshold has been hit.

Schema development is iterative. The Reporters’ Lab rightfully recognizes that defining manipulated media categories is a process that never truly ends; it only adapts to evolving media. Yet in the short term, if the MediaReview schema is to be used as input into content moderation decisions, fact-checkers assigning categories must have the training and input needed to apply those categories both meaningfully and consistently.

For example, if a fact-checking organization in India tags a protest video as “Edited” while one in the United States tags the same video as “Transformed,” platforms cannot rely on the schema for a systematic understanding of the rated media for their global users. Further, these ratings say nothing of the relative harm or misleading nature of the content, evaluations that may also differ from person to person and across geographies. Before deployment, platforms need to continue to support researchers, both internally and at institutions like the Reporters’ Lab, in conducting rigorous intercoder reliability testing with fact-checkers internationally on a range of mis/disinformation issues.
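Intercoder reliability is typically measured with a chance-corrected agreement statistic rather than raw percent agreement. The sketch below uses Cohen’s kappa for two organizations rating the same items; the ratings and the 0.8 deployment threshold are hypothetical, since this piece does not name a statistic or a value.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labeled independently at their own base rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_expected) / (1.0 - p_expected)


DEPLOYMENT_THRESHOLD = 0.8  # hypothetical; the article does not set a number

# Hypothetical ratings of the same six videos by two organizations.
org_india = ["Edited", "Edited", "Missing Context", "Authentic", "Transformed", "Missing Context"]
org_us = ["Transformed", "Edited", "Missing Context", "Authentic", "Transformed", "Edited"]

kappa = cohens_kappa(org_india, org_us)
print(round(kappa, 3), kappa >= DEPLOYMENT_THRESHOLD)  # 0.556 False
```

Here the two organizations agree on four of six videos (67%), but because some agreement would occur by chance, kappa is only about 0.56. Testing across many organizations would call for a multi-rater statistic such as Fleiss’ kappa or Krippendorff’s alpha.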

1.2 As new tactics emerge, will the categories evolve?

In addition, recognizing the shifting nature of these mis/disinformation categories, the schema would benefit from supporting the tagging of specific tactics beyond high-level categories like “Missing Context” and “Edited.” Many recent examples highlight the need for a schema to flexibly account for a range of evolving media manipulation techniques and their relative harms in various contexts. The Reporters’ Lab is currently accepting feedback on categories, and should build in mechanisms to continually assess categories over time in order to adapt to new tactics.

These tactics should be defined not as static, top-down categories, but as categories that emerge bottom-up from patterns fact-checkers observe directly on the ground in local contexts, given that the intents and types of manipulation may differ greatly between Europe, the United States, the Global South, and elsewhere. Emerging tactics include: “lip-sync deepfakes” created for political purposes, as seen in a Belgian deepfake of Donald Trump, or for translation purposes, as with an Indian politician whose speeches were translated this way; “face-swapping,” as seen in a video in which Jordan Peele uses AI to make Barack Obama deliver a PSA about fake news; audio-splicing satire, that is, satire that could reasonably be seen as authentic, as seen in the Bloomberg campaign’s “crickets” post; image quoting, or mis-contextualizing a quote by associating it with an image, as seen in numerous fake Trump quotes; and misleading subtitles, as seen in a video of a woman in Wuhan subtitled with the misleading claim that she was a nurse. Currently, MediaReview would capture all of these manipulation types only through its free-text “Original Media Context” field, making structured analysis and interventions difficult.
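Structured tactic tags make the kind of analysis a free-text field resists much easier. The minimal sketch below assumes a hypothetical controlled vocabulary drawn from the tactics just listed, counts tactics per region, and keeps tags that fact-checkers coin locally as proposals rather than discarding them, one possible mechanism for letting categories emerge bottom-up.

```python
from collections import Counter

# Hypothetical controlled vocabulary built from the tactics listed above.
TACTICS = frozenset({
    "lip_sync_deepfake",
    "face_swap",
    "audio_spliced_satire",
    "image_quoting",
    "misleading_subtitles",
})


def summarize(ratings):
    """Count known tactics per region and collect unrecognized tags as proposals."""
    known = Counter()
    proposed = Counter()
    for rating in ratings:
        for tag in rating["tactics"]:
            if tag in TACTICS:
                known[(rating["region"], tag)] += 1
            else:
                # Tags coined locally are kept, not discarded, so schema
                # maintainers can review them as candidate categories.
                proposed[tag] += 1
    return known, proposed


ratings = [
    {"region": "Europe", "tactics": ["lip_sync_deepfake"]},
    {"region": "South Asia", "tactics": ["lip_sync_deepfake"]},
    {"region": "United States", "tactics": ["image_quoting", "misleading_subtitles"]},
    {"region": "United States", "tactics": ["image_quoting"]},
    {"region": "Brazil", "tactics": ["voice_cloning"]},  # hypothetical new tactic
]
known, proposed = summarize(ratings)
print(known[("United States", "image_quoting")], proposed["voice_cloning"])  # 2 1
```

With only a free-text description, the same counts would require someone to read and interpret every entry.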

In summary, insufficiently precise or inconsistent ratings make a poor “ground truth” for the training data used in automation, with the potential to cause harmful downstream effects. These categories, in turn, should have transparent connections to proposed content moderation policies.

2. Relationship of Ratings to Harm

2.1 If platforms are using ‘harm’ to justify decisions (flagging, de-ranking, take-downs), as outlined in many policies, definitions of harm need to be clear to those fact-checking the content as well as to audiences. They cannot be opaque or ad hoc.

Manipulation type alone tells you little about the harm of an instance of manipulated media in a given context. Because MediaReview categories like “Edited” do not neatly lend themselves to explicit mappings of harm or the extent to which content misleads, they may carry a dangerous implicit assumption that they are all equally harmful — an assumption whose harms would then be further amplified when ratings are automatically extended to apply to all detected instances of matching images and videos.

Consider that one of the most damaging disinformation tactics, presenting authentic media out of context, doesn’t even require opening a photo-editing program. A recent “Missing Context” example falsely claimed that a screenshot from the TV show “Designated Survivor” was a photo of protests in D.C. (example 27): a post with the potential to incite real-world violence in support of an invented reality. At the other end of the spectrum, many “Transformed” examples of technically sophisticated deepfakes pose minimal harm, such as a user’s satirical face swap of Jennifer Lawrence and Steve Buscemi. In other words, it is not safe to assume “the more edited, the more harmful.” However ratings play into moderation, they clearly require human review of the distinct intents and harms of media, from satirical to malicious, within specific cultural contexts.

2.2 We need clearer explanations of the ways in which harm is measured in specific local contexts, for example, the threat of immediate real-world harms, such as media inciting violence, versus longer-term harms, like anti-vaccine conspiracies.

These examples point to a need for much deeper investment both in defining a theory of harm for instances of manipulated media and in human moderation and oversight of how those media are displayed, or not displayed, to end-users, adapted to specific local contexts. Currently, platforms lack consistent and clear theories of harm for any content on their platforms, and this inconsistency would extend to MediaReview ratings.

In formulating theories of harm, platforms can turn to civil society for inspiration, such as the Rabat Principles, drafted to inform law about content that might incite hatred. The Principles outline six thresholds for assessing hate speech: (i) the social and political context, (ii) the speaker, for example his or her status and influence, (iii) the intent of the speech, (iv) the content or form of the speech, (v) the extent of the speech, and (vi) the likelihood and imminence of actually causing harm. David Kaye, UN Special Rapporteur on Freedom of Expression, has echoed this sentiment, calling the principles “a valuable framework for examining when the specifically defined content — the posts or the words or images that comprise the post — merits a restriction.” Additional analyses from the human rights space include Article 19’s “hate speech pyramid” and the Dangerous Speech Project’s dangerous speech framework.

*Article 19 hate speech pyramid outlining treatment of harmful content that is consistent with international human rights law*

Any moderation decision around manipulated media should be considered alongside an analysis of the harms of the claims associated with the media, not based on the type of manipulation alone. How might content moderation take both media manipulation and claim ratings into account to arrive at responsible policies? As platforms consider how MediaReview ratings relate to harm, it may help to reference the ClaimReview ratings nested within MediaReview, which provide fact-checks for specific claims associated with the media.

3. Archiving of Manipulated Media

3.1 Provide public archives of rated media to enable critical inquiry, analysis, and evidentiary use by third-parties.

MediaReview raises additional questions: how should MediaReview data be stored, who holds the key to that data, and who gets to benefit from it? The schema currently has fields for the posts in question, original media URLs, and articles with detailed fact-checks. However, MediaReview does not yet provide a way to store the fact-checked images and videos themselves, only their URLs. The Reporters’ Lab team is in the process of assessing storage options.

Archiving this content is a complex but crucial challenge for MediaReview to address from technical, ethical, and legal perspectives, with many stakeholders to consider, including journalists, fact-checkers, researchers, regulators, and the general public.

*Screenshot of URL fields in the video rating mode of MediaReview within the FactStream interface,* *available for testing by participating publishers and fact-checkers*

Access to these archives is important for multiple use cases. First, lack of access affects fact-checkers, who often need to see whether an item has already been checked, or need access to past fact-checked media in order to defend their own work. These access needs are often urgent and time-sensitive, for example, the need to view recycled imagery that keeps reappearing and spreading during recurring events like natural disasters, despite repeated debunking. In addition, there is a strong public interest argument for archiving the data: access to manipulated media and ratings could better equip legislators, regulators, and other third parties to respond to critical moments for information flow, such as natural disasters, terrorist attacks, public health crises, and elections. Finally, these archives are invaluable for investigations into hate speech, human rights claims, and other matters, even when the footage is no longer publicly visible because platforms or users have removed it.

3.2 Create archives with ownership, access controls, and oversight mechanisms as informed by civil society.

There are two key elements to consider in assessing models for archives: ownership and access controls.

What is the content of the archive, and who stands to gain from access to it? The archive will contain media spanning a range of harm and legality, from memes about The Cure to violent and/or sexually explicit imagery (a 2019 Deeptrace report found that 96% of the deepfake videos it identified contained pornographic content). When and how to provide access to these media should ultimately be decided in coordination with human rights, archival, and civil society groups, with nuanced consideration of archival practices, free speech protections, privacy, consent, and other unintended consequences.

Although it is uncommon, malicious actors have also misused archiving services to get around platform flagging and further amplify harmful content. For example, some users whose posts were removed by platforms like Medium and YouTube for containing false and unsubstantiated claims about COVID-19 have re-shared that content through other avenues, such as Google Drive and Internet Archive links. (For its part, the Internet Archive recently responded by displaying its own warning on an archived version of one such article containing unsubstantiated claims.)

Yet, with appropriate access restrictions in place, the data has powerful potential to support investigations in the public interest. Options include an embargoed archive, whose contents become public after a certain period of time; an “evidence locker,” an archive that is not public but is accessible under certain circumstances, for example to approved reviewers or academics; or public datasets with some amount of friction, such as an email sign-up, similar to Twitter’s Transparency Report. Additionally, MediaReview can take inspiration from existing repositories in which civil society organizations maintain sensitive content in the public interest, such as Mnemonic’s Syrian Archive.
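The three access models above can be expressed as a single access rule. This is a minimal sketch under stated assumptions: the tier names, the one-year embargo, and the choice to let approved reviewers see every tier are hypothetical policy decisions that, as this section argues, should be made with civil society rather than hard-coded by any one party.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Tier(Enum):
    EMBARGOED = "embargoed"              # public once the embargo period ends
    EVIDENCE_LOCKER = "evidence_locker"  # never public; approved reviewers only
    PUBLIC_WITH_FRICTION = "friction"    # open to anyone who registers


@dataclass(frozen=True)
class Requester:
    registered: bool         # e.g. completed an email sign-up
    approved_reviewer: bool  # e.g. a vetted researcher or journalist


@dataclass(frozen=True)
class ArchivedItem:
    tier: Tier
    rated_on: date
    embargo: timedelta = timedelta(days=365)  # hypothetical default period


def may_access(item: ArchivedItem, who: Requester, today: date) -> bool:
    """Decide whether a requester may view an archived item (hypothetical policy)."""
    if who.approved_reviewer:
        return True
    if item.tier is Tier.EVIDENCE_LOCKER:
        return False
    if item.tier is Tier.EMBARGOED:
        return today >= item.rated_on + item.embargo
    return who.registered  # PUBLIC_WITH_FRICTION


anonymous = Requester(registered=False, approved_reviewer=False)
clip = ArchivedItem(Tier.EMBARGOED, rated_on=date(2020, 6, 1))
print(may_access(clip, anonymous, date(2020, 12, 1)))  # False
print(may_access(clip, anonymous, date(2021, 7, 1)))   # True
```

A real archive would also need audit logs and oversight of who approves reviewers; a check like this captures only the access tiers themselves.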

In terms of ownership, there are many questions about which parties might own the archive and what governance rules would apply to them. The archive could be owned by one or several partners; shared, so that ownership is distributed across multiple organizations; or fully decentralized, as in the Tor network, where anyone can maintain and deploy a “Tor node,” or serverless, as in the BitTorrent peer-to-peer file-sharing protocol. Regardless of the exact ownership model, it is crucial that no one party be able to monopolize and benefit from the archive’s data.

Ultimately, this is a complex question and demands considering all of the relevant stakeholders and threat models for providing access to media before making a decision. However, we believe that the data should not be monopolized by platforms, but rather either maintained in a decentralized way without a single owner, or in coordination with one or more third-parties who might serve as a responsible steward for data over the long-term.

4. Automated Image and Video Matching Technology

4.1 Harm-assessment should be made at the level of a post in context, rather than as an inherent property of a media asset that can always be extended across all instances.

Once there is a framework for the relative harms of posts containing manipulated media, it is also important to consider how those assessments might be extended through automated detection of duplicate media. Currently, platforms such as Facebook use “local feature-based instance matching” to detect instances of the same images and videos across the platform and apply the relevant interventions. While this technique offers the potential for effective triage of false and misleading media, applying interventions based on media detection alone carries profound risks.

*“Life cycle of images as they get matched against a database of certified misinformation” from Facebook’s AI blog*

Crucially, the matching approach is problematic because it assumes that ratings are an inherent property of media that can always be extended across all instances, rather than that ratings are applied for specific media shown in specific contexts. Even if images or videos themselves are identical, different user contexts can drastically change the meaning. As a result, when platforms automatically extend the application of a fact-check based only on a media asset, they often end up showing warnings on media shared in legitimate contexts.

This problem already exists and could be compounded by MediaReview’s implementation if media ratings are used in matching processes. For example, in March 2020, a 2019 video depicting police brutality in Hong Kong was shared with a false claim related to COVID-19 and fact-checked. The video then rippled through the platform with a “False” label, so that even when others shared it with accurate context, it was still flagged. This “false positive” of flagging legitimate media as false invites abuse and denial of actual events in Hong Kong.
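The difference between treating a rating as a property of the media asset and treating it as a property of a post in context can be shown in a few lines. This sketch swaps in a toy average hash for the local feature-based matching Facebook describes, and it assumes each post’s claim has already been reduced to a hypothetical claim ID; in practice, deciding whether two captions make the same claim is itself a contextual judgment.

```python
from dataclasses import dataclass


def average_hash(pixels: list[list[int]]) -> tuple[bool, ...]:
    """Toy perceptual hash: one bit per pixel, set when brighter than the frame mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(p > mean for p in flat)


def hamming(a: tuple[bool, ...], b: tuple[bool, ...]) -> int:
    return sum(x != y for x, y in zip(a, b))


@dataclass
class Post:
    post_id: str
    pixels: list[list[int]]  # grayscale thumbnail of the shared media
    claim_id: str            # hypothetical normalized ID for the claim in the caption


@dataclass
class Rating:
    media_hash: tuple[bool, ...]
    claim_id: str            # the claim the fact-checker actually rated
    verdict: str


def propagate(posts, rating, per_context, max_distance=2):
    """Return the label each post would receive under one propagation rule."""
    labels = {}
    for post in posts:
        same_media = hamming(average_hash(post.pixels), rating.media_hash) <= max_distance
        same_claim = post.claim_id == rating.claim_id
        # Asset-level rule: any media match inherits the verdict.
        # Context-level rule: the claim must match as well.
        matched = same_media and (same_claim or not per_context)
        labels[post.post_id] = rating.verdict if matched else None
    return labels


frame = [[10, 200, 10, 200], [200, 10, 200, 10], [10, 200, 10, 200], [200, 10, 200, 10]]
reencoded = [[12, 198, 11, 201], [199, 9, 202, 12], [10, 203, 8, 197], [201, 11, 199, 10]]
posts = [
    Post("misleading", frame, claim_id="covid-claim"),
    Post("accurate", reencoded, claim_id="hong-kong-2019"),
]
rating = Rating(average_hash(frame), claim_id="covid-claim", verdict="False")
print(propagate(posts, rating, per_context=False))  # {'misleading': 'False', 'accurate': 'False'}
print(propagate(posts, rating, per_context=True))   # {'misleading': 'False', 'accurate': None}
```

Under the asset-level rule, the accurately captioned re-upload inherits the “False” label, reproducing the Hong Kong failure; requiring the claim to match as well avoids that false positive.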

In summary, neither a claim rating associated with media, nor the technical properties of a visual asset (e.g. pixel content) can be reliably used as a proxy for its harm across all media instances. As we’ve seen in 2.1, technical manipulation properties, such as whether a video employs face-swapping technologies vs. audio distortion, tell you little about that media’s potential to cause harm. Context detection must go hand-in-hand with duplicate media detection. Contextual, human judgments are integral to making responsible policy decisions that protect human rights and prevent abuse internationally.

4.2 Fact-checkers need more context on where matches appear across platforms, and should be able to understand and control how their ratings will propagate across matches.

If ratings are to trigger interventions in concert with matching technologies, platforms need to invest more deeply, ethically, and systematically in human moderators and fact-checkers to prevent inappropriate interventions on duplicate media. Matching can first be used to show humans the matches found across platforms, helping them understand the context of the media they are rating. Further, it is crucial that these humans understand how platforms use their ratings, including the social impact of using them to label, downrank, or remove media, and that they can exercise localized editorial judgment over that use.
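One way to give fact-checkers that control is to let each rating carry a propagation setting and route matches accordingly. In this minimal sketch, the Scope options, the same_claim flag (assumed to come from some upstream context check), and the platform names in the example are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    # Hypothetical propagation settings a fact-checker could choose per rating.
    REVIEW_ALL = "review_all"            # every match goes to a human first
    SAME_CLAIM_AUTO = "same_claim_auto"  # auto-apply only when the claim also matches
    ALL_AUTO = "all_auto"                # auto-apply to every media match


@dataclass
class Match:
    post_id: str
    platform: str
    same_claim: bool  # hypothetical output of an upstream context check


@dataclass
class Routing:
    auto_labeled: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)

    def queue_by_platform(self) -> Counter:
        """Where pending matches appear, giving the reviewer cross-platform context."""
        return Counter(m.platform for m in self.review_queue)


def route(matches: list[Match], scope: Scope) -> Routing:
    routing = Routing()
    for match in matches:
        auto = scope is Scope.ALL_AUTO or (scope is Scope.SAME_CLAIM_AUTO and match.same_claim)
        (routing.auto_labeled if auto else routing.review_queue).append(match)
    return routing


matches = [
    Match("p1", "Instagram", same_claim=True),
    Match("p2", "Facebook", same_claim=False),
    Match("p3", "Facebook", same_claim=False),
]
result = route(matches, Scope.SAME_CLAIM_AUTO)
print([m.post_id for m in result.auto_labeled])  # ['p1']
print(result.queue_by_platform())                # Counter({'Facebook': 2})
```

Matches that are not auto-labeled land in a review queue summarized by platform, which is the kind of cross-platform context this section argues fact-checkers need.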

Why is localized judgement so important? Consider that a fact-checker in Brazil reviewing content may have very different needs from the needs of a fact-checker in India, given their unique understandings of the relevant actors, media frames, and threats of the posts they’re rating. Moderation policies should take into account the different media environments of countries, and active feedback from moderators and fact-checkers can help in this evaluation.

Fact-checkers are on the front lines of social networks: their knowledge of how information flows in their countries, and of the context in which the information they tag is embedded, is often deeper than that of platforms’ policymakers. That means fact-checkers should understand the impacts of their ratings and be able to give platforms feedback on them in a way that benefits both parties. Human moderators and fact-checkers should be appropriately consulted in automation decisions, and acknowledged and compensated for the profound editorial value they provide in informing these systems.

5. Global User Experience Design

5.1 The schema is designed for fact-checkers, not end-users. Platforms need to work together to understand the effects of displaying ratings on audience beliefs and behaviors, globally.

Finally, how can platforms better collaborate in conducting and sharing research on how they might display the schema to end-users in order to reduce the spread of false or misleading information across platforms? Without explicit consideration of how platforms intend to surface, or not surface, the schema to end-users, we risk building infrastructure that will be harder to adapt toward these ends once it is built and deployed.

Inappropriate design choices may have the opposite of the intended effect on clarity in the public sphere. For example, the Hong Kong police brutality video shared with a false claim related to COVID-19 was fact-checked and then displayed on Instagram with a “False” rating from Indian fact-checker Boom Live. Comments on the post show confusion about what the rating meant, who the fact-checkers were, and why it was applied to what commenters knew to be legitimate media. This treatment harms both users and fact-checkers.

As a result, for platforms that adopt the MediaReview schema, we caution against opportunistic use of these categories as UI labels without first defining and testing against goals that expressly reference societal effects, and without designing interventions through a community-oriented lens. Additionally, platforms should commit to design processes that incorporate community-based design principles and meaningfully engage external stakeholders, especially at-risk and marginalized groups. Examples of such principles include MIT’s design justice principles or BSR’s 5 steps for stakeholder engagement. For example, how might platforms systematically commit to connecting with communities through these engagement steps in order to enact principles like number 10: “Before seeking new design solutions, we look for what is already working at the community level. We honor and uplift traditional, indigenous, and local knowledge and practices”? This means working directly with affected communities to build upon practices they already use to educate and protect each other from the tangible harms of mis/disinformation within specific contexts.

*Principle 10 of MIT’s 10 design justice principles, centering people who are normally marginalized by design*

5.2 Users need to have opportunities to appeal manipulated media ratings.

Crucially, given the subjective nature of these judgments, as well as the known adversarial dynamics described above in 3.1, users should have opportunities to appeal ratings. While appeal systems exist, they are often impersonal, opaque, and too slow to meet the needs of end-users dealing with sensitive, time-critical media.

Specific Recommendations

All of this said, we recognize the extraordinary urgency of addressing the unprecedented amount of harmful manipulated media flooding platforms. Many of the issues presented here are complex — or even intractable — problems given the current scale and diversity of users participating on platforms. And while we don’t want perfect to be the enemy of the good, we also believe there are several specific, immediate steps platforms can prioritize in order to ensure responsible deployment of a rating schema that aims to balance the risks and benefits.

These recommendations include:

Manipulated Media Categories

Invest in rigorous testing across international fact-checking organizations to understand intercoder reliability and ensure media forensics and manipulation detection capacity for fact-checkers across a variety of examples
Provide options for fact-checkers to select and suggest modifications to the schema in order to add structured data on a variety of manipulation tactics, such as lip-sync deepfakes and face-swapping

Relationship of Ratings to Harm

Transparently communicate a theory of harm for manipulated media types based on international human rights guidelines

Archiving of Manipulated Media

Provide archives of rated media to enable critical inquiry, analysis, and evidentiary use by third-parties, with access controls, ownership, and oversight mechanisms as informed by civil society

Automated Image and Video Matching Technology

Provide fact-checkers with transparency and control over where and how exactly their ratings will be used by platforms across instances of media

Global User Experience Design

Before deploying interventions, test their effects on audience beliefs and behaviors, globally
Incorporate community-based design principles that meaningfully engage external stakeholders, especially at-risk and marginalized groups
Ensure that users have access to fast and reasonable appeal mechanisms to flag false positives for manipulated media

A Call for Transparent, Multi-Stakeholder Strategy

In terms of longer-term strategy, platforms need to share enough information to allow oversight from third parties to assess progress. In particular, platforms should commit to and share their plans for UX testing, moderation of categories, and any use of automated matching as it applies to MediaReview. These plans should include specifics, for example, a threshold for intercoder reliability of categories before deployment.

As the Partnership on AI continues assessing this space through its AI and Media Integrity area, we aim to help facilitate coordination between platforms so that they can make progress on these issues as they evolve and ensure the responsible deployment of a manipulated media rating schema.

With thanks to: Jonathan Stray (PAI), Marlena Wisniak (PAI), Dia Kayyali (WITNESS), Tommy Shane (First Draft), Victoria Kwan (First Draft), the AI & Media Integrity Steering Committee at PAI, The Duke Reporters’ Lab