The Effect of Warning Labels on the Perceptions of Manipulated Media

Jigsaw · Aug 31, 2021 · 6 min read

Since their emergence on the internet in 2017, deepfakes — hyper-realistic, computer-generated videos that depict people saying or doing things that never actually happened — have prompted discussion among policy makers, researchers, and the public about how platforms should respond. Sometimes removing the content is an easy decision, like in cases where deepfake technology is used to create non-consensual sexual imagery. However, when deepfake content involves humor, satire, or parody, the solution can be less clear.

One strategy platforms deploy to help their users identify potentially deceptive content is labeling. Prominently labeling content, the reasoning goes, aids recognition of various types of misinformation. While our research showed labels have limitations, we also found that when it comes to deepfakes, labels can make a difference. More research is needed, however, to determine what strategies are best at alerting users to deepfake misinformation.

Like many of our colleagues and collaborators, Jigsaw has been interested in exploring the ways in which deepfake technology could be abused by bad actors to harm people or disrupt open societies. In 2019, we teamed up with Google AI to create and share a dataset to help the research community build better technologies to detect deepfakes. Since then, we have continued researching misinformation intervention strategies, including interventions against deepfakes, to understand how best to alert users to their presence.

To advance research into deepfake interventions, we conducted a study to identify which labeling approach best alerted users to the presence of a deepfake. We recruited an online panel of 1,034 U.S. adults, carefully selected to represent U.S. internet users who are active consumers of online content. Participants were shown a generic, non-branded online search results page and asked to click on a web video, which played in a similarly generic video player. Each participant was assigned to one of three experimental conditions: a warning prompt preceding the video roll, an accuracy prompt reminding them to think about accuracy, or a control condition in which they were shown only the deepfake. Afterward, we asked participants whether they thought the video was fabricated and whether they intended to share it. Finally, participants completed a cognitive reflection test, which measures the extent to which a person is willing to engage in analytic thinking.

What users saw in the warning condition. The interstitial prompt warned users of a “fabricated video ahead.”
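To make the between-subjects design concrete, here is a minimal Python sketch of how participants might be randomly assigned to the three conditions and which outcome fields are collected afterward. The function name, field names, and assignment logic are illustrative assumptions, not the study’s actual instrumentation.

```python
import random

# The three experimental arms described above (illustrative labels).
CONDITIONS = ["warning", "accuracy_prompt", "control"]

def assign_panel(n_participants: int, seed: int = 0) -> list[dict]:
    """Randomly assign each participant to one condition and set up the
    outcome fields collected after they view the deepfake."""
    rng = random.Random(seed)
    panel = []
    for pid in range(n_participants):
        panel.append({
            "id": pid,
            "condition": rng.choice(CONDITIONS),
            "thinks_fabricated": None,   # asked after viewing the video
            "share_intent_1to6": None,   # 6-point Likert scale
            "crt_score": None,           # cognitive reflection test, completed last
        })
    return panel

panel = assign_panel(1034)  # panel size used in the study
```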

For the deepfake content, we created a composite of U.S. President Richard Nixon and comedian Walter Matthau. The video’s base footage consisted of a short clip from a Walter Matthau speech at the 1988 AFI Lifetime Achievement Awards, with Nixon’s face superimposed on Matthau’s using readily available deepfake techniques. We made these choices for several reasons. First, we wanted to feature public figures who are deceased, to protect people who could be unintentionally harmed by this experiment. Second, we chose these individuals because they look somewhat similar, providing a best-case scenario for creating a high-quality deepfake. Finally, using recognizable figures gave us the opportunity to create content that could be plausibly interesting to users, yet also surprising (in this case, Nixon telling jokes at an awards ceremony).

What users saw in the accuracy prompt condition, seen by approximately a third of respondents.

We found a strong, significant effect (t = 7.786, p < .0001) among participants who were shown a warning that the video was fabricated. Nearly 60% of these users responded that the video was indeed fabricated, while roughly the same proportion of respondents reported the video was real in both the control (62%) and accuracy prompt (60%) conditions. These findings suggest that labels can give users helpful context; however, at least in the format tested here, they may not provide enough information to prevent all users from being deceived. Accuracy prompts are unlikely to be a good fit for deepfakes, as they depend on a user’s innate ability to discern real content from false. This is likely especially true when ultra-realistic deepfakes are presented to users who lack any prior context for judging the veracity of the content.

Reported accuracy perceptions by intervention type
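As a rough illustration of the comparison behind these numbers, the sketch below runs a two-sample t-test on simulated binary “this video is fabricated” judgments for the warning and control arms. The group sizes, the simulated response rates, and the use of scipy’s ttest_ind are assumptions for illustration; the study’s actual analysis may have differed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated judgments: 1 = "the video is fabricated", 0 = "the video is real".
# Rates loosely mirror the reported results (~60% correct with a warning,
# ~38% correct in the control condition); arm sizes are made up.
warning = rng.binomial(1, 0.60, size=345)
control = rng.binomial(1, 0.38, size=345)

t_stat, p_value = stats.ttest_ind(warning, control)
print(f"t = {t_stat:.3f}, p = {p_value:.2g}")
print(f"warning arm: {warning.mean():.1%} judged fabricated")
print(f"control arm: {control.mean():.1%} judged fabricated")
```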

Participants reported no significant difference in their willingness to share the video across the three conditions, with approximately 60% of respondents reporting they were unlikely to share it (on a 6-point Likert scale, where 1 = extremely unlikely and 6 = extremely likely). This may reflect interest in this particular video as much as perceptions of its accuracy, which is in line with prior research suggesting that people rely on a variety of cues, not just accuracy, when deciding whether to share content online.

We found that scores on the Cognitive Reflection Test (CRT) were positively related both to correctly discerning that the video was fabricated and to a lower reported likelihood of sharing it. The relationship was stronger for sharing intentions (t = 6.685, p < .0001) than for reported belief (t = 3.682, p < .0001). This is in line with prior research suggesting that a willingness to engage in analytic thinking predicts better discernment of online misinformation. However, the treatment effect (warning label or accuracy prompt) was not significantly moderated by CRT in any of the CRT-by-treatment interactions, suggesting that while individual differences in CRT may account for some of the variance in accuracy perceptions, the effects seen here are better explained by the interventions themselves.
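For readers curious what a moderation check like this can look like in practice, below is a sketch using simulated data and an ordinary least squares model with a treatment-by-CRT interaction (via statsmodels). The data generation, variable names, and choice of a linear probability model are assumptions for illustration, not the study’s actual data or analysis code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 1034

# Simulated participants: condition assignment and a 0-3 CRT score.
df = pd.DataFrame({
    "condition": rng.choice(["control", "accuracy_prompt", "warning"], size=n),
    "crt": rng.integers(0, 4, size=n),
})

# Simulated outcome loosely following the pattern reported above:
# warnings and higher CRT scores both make a correct judgment more likely,
# with no built-in interaction between them.
logit = -0.5 + 1.0 * (df["condition"] == "warning") + 0.3 * df["crt"]
df["judged_fabricated"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Linear probability model with treatment x CRT interaction terms; a
# non-significant interaction coefficient is what "not moderated by CRT"
# would look like in this framing.
model = smf.ols("judged_fabricated ~ C(condition) * crt", data=df).fit()
print(model.summary())
```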

This study has a number of limitations. First, these interventions were tested with only a single stimulus, the deepfake, so we cannot generalize across different types of content, to which each user brings their own background, knowledge, and preference biases. Second, we used a single plausible, well-produced deepfake. While this could represent a worst-case scenario for manipulated media discernment, the deepfakes circulating online today almost certainly vary more in quality, even among high-quality examples. This may also be why we failed to see an effect in the accuracy prompt condition, despite the robust effects this intervention has shown for other forms of misinformation. For instance, accuracy prompts may only work when users have prior experience discerning deepfakes and other forms of misinformation. In a situation where content is equally likely to be true or false, users may have no reasonable way to make better judgments without additional signals, such as their own knowledge of historical context or the ability to visually inspect the video for graphical anomalies. Additionally, in this instance some participants may have benefited from greater familiarity with Richard Nixon’s or Walter Matthau’s public appearances.

Because our experiment was limited to one video, we were unable to test the potential for “backfire” effects, where users disengage or distrust accurate content in addition to misinformation, and whether these backfire effects can wear off over time. Additionally, while we did not ask users about their preferences in this study, it is important for any UX intervention to carefully consider the usability of any feature before launch.

Overall, this research presents reasons for optimism as well as challenges for platforms implementing misinformation interventions. Lab studies like this provide strong evidence that warnings can work in the presence of deepfakes. However, their effects may still be modest, and it is not clear yet that they would apply to all types of deepfakes or across all platforms. Despite the notification, nearly 40% of users still did not believe — or did not see — the platform’s warning. More research is needed to understand this effect, to identify opportunities to better communicate about the existence of manipulated media, and to provide users with the tools to analyze manipulated media themselves.

By Andrew Gully, Technical Research Manager, Jigsaw

Jigsaw is a unit within Google that explores threats to open societies, and builds technology that inspires scalable solutions.