Identifying ‘data deficits’ can pre-empt the spread of disinformation

First Draft
First Draft Footnotes
6 min read · Dec 15, 2020

First Draft’s research analyst, Seb Cubbon, explores how data deficits get exploited by disinformation actors, and how we can get ahead of them.

Until very recently, mRNA (messenger ribonucleic acid) vaccine technology and vaccine-derived poliovirus (VDPV) were still considered highly specialized, niche topics. But following the recent Pfizer and Moderna announcements, as well as the subsequent VDPV outbreaks in central Africa, these topics have suddenly risen to prominence. And their rise has been accompanied by worryingly high levels of mis- and disinformation.

We call these situations “data deficits”: where high levels of demand for information about a specific topic are not adequately matched by a supply of credible information. Unlike data voids, where search engine queries turn up few or no results, deficits are situations in which plenty of information exists but it is misleading, confusing, false or even harmful.

These deficits are not the result of deliberate actions from bad actors. In fact, they typically occur when quality information providers are unaware of the demand for information on a given topic or are unable to provide the information in an effective, compelling manner.

However, bad actors can step in and exploit these deficits, filling them with content that is meant to deceive or that fits their agenda.

So how do certain data deficits get exploited by disinformation actors? How can reporters, policymakers and civil society spot them before that happens?

How data deficits can be exploited

Last summer, First Draft revealed the presence of multiple vaccine-related data deficits through our analysis of the online vaccine information ecosystem, including mRNA- and VDPV-related ones.

Since then, we found that several online messages published by sources identified as “key players in [foreign actors’] disinformation and propaganda ecosystem” exploited these deficits by incorporating them into wider disinformation and conspiratorial narratives. Their apparent aim was to undermine trust in people and institutions connected to the vaccines.

How deficits emerge, and are then exploited. Image: First Draft

These messages were then disseminated throughout the online information space thanks to a combination of laundering techniques, which are outlined below. Such techniques are frequently employed as part of disinformation campaigns to influence public discourse while obscuring the intentions and identities of the actors involved. Articles mentioned mRNA and VDPV to amplify the narratives that a) US and Western Covid-19 vaccines more broadly are unsafe “experiments” and b) Bill Gates and the institutions connected to him and the vaccines they produce are untrustworthy. These messages were subsequently:

  1. Duplicated and translated in multiple languages by a loose and/or concentrated network of purported news websites and blogs that regularly syndicate each other’s content. The resulting multiplicity of reports artificially enhances the noteworthiness (and, by extension, the credibility) of the messages, thereby exploiting the bandwagon fallacy. The overall audience reached is also maximized as a result.
  2. Slightly modified to obfuscate the source and reduce traceability. Small changes are made to the headline and text content, and different pictures are used as the articles’ preview images. Some images can also be added into these messages, many of which tend to be more graphic and emotionally charged. The sources quoted at the end of the article may also be changed.
  3. Spread across multiple platforms, including through simultaneous sharing “spurts” on Facebook Groups to reach a wide range of target communities within an extremely short amount of time. Some links were posted to as many as six Groups in under 40 seconds.
  4. Artificially amplified by accounts that exhibit a high number of indicators of inauthenticity.
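
The sharing “spurts” described in point 3 lend themselves to a simple automated check. The following is a minimal sketch rather than First Draft’s method: the function name `find_sharing_spurts`, the `(url, group_id, timestamp)` input shape and the default thresholds (six Groups, 40 seconds, taken from the example above) are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_sharing_spurts(shares, min_groups=6, window_seconds=40):
    """Return the URLs posted to at least `min_groups` distinct Groups
    within any span of `window_seconds` seconds or less.

    `shares` is an iterable of (url, group_id, timestamp) tuples; this
    input format is hypothetical, not a platform export format.
    """
    window = timedelta(seconds=window_seconds)

    # Collect every share event per URL.
    by_url = defaultdict(list)
    for url, group_id, ts in shares:
        by_url[url].append((ts, group_id))

    flagged = set()
    for url, events in by_url.items():
        events.sort()  # chronological order
        start = 0
        for end in range(len(events)):
            # Slide the left edge forward until the span fits the window.
            while events[end][0] - events[start][0] > window:
                start += 1
            groups_in_window = {g for _, g in events[start:end + 1]}
            if len(groups_in_window) >= min_groups:
                flagged.add(url)
                break
    return flagged
```

A flagged URL is only a prompt for manual review: rapid cross-posting is one indicator among several, and the thresholds would need tuning against real data.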

How to spot data deficits

So how do we identify data deficits before they’re exploited? Here we offer a set of qualitative indicators that can help inform which deficits need to be addressed first. These indicators build on the quantitative indicators and research methods that First Draft has previously used to identify data deficits. If addressed proactively and with quality, accessible information, these deficits may be less likely to be exploited by malicious actors.


Novelty

Is the subject new or previously unknown to a wider audience? This may mean quality information is less likely to exist or to have been disseminated in a compelling and accessible manner. Conversely, the production and distribution of poor-quality but equally compelling information is a far more expedient process and may therefore benefit from a first-mover advantage.

Technical complexity

Is the topic characterized by highly specific information whose comprehension may only come naturally to experts in the field? In this case, easily accessible information may be particularly difficult to produce. On the other hand, messages that simplify the topic in a misleading manner and incorporate it within already-popular narratives are likely to resonate with receptive audiences.

Alignment with pre-existing narratives

Does the subject demonstrate a clear potential to fit into pre-existing, long-standing disinformation narratives? If so, it may be easy to instrumentalize these topics as part of wider misleading messages aimed at exploiting fears, eroding trust and increasing polarization. For example, novel vaccine technologies (such as mRNA ones) could be used to stoke up fears over the safety of vaccines, and thereby bolster narratives portraying all vaccines as untrustworthy.

Political saliency and emotive dimension

Does the data deficit clearly fall within an emotionally charged issue or wider topic with high political or geopolitical stakes? If so, the incentive for bad actors to exploit the data deficit with disinformation narratives may be high. The vast body of literature on information operations suggests that opportunities to sow discord, undermine democratic processes and amplify emotive tensions are more likely to be exploited by malicious actors.

Legitimate questioning

Is the topic the subject of high levels of legitimate questioning? If so, misleading explanations that address natural concerns may appeal to mainstream communities and thus reach a larger audience. Of course, this heightens incentives for malicious actors, as their window of opportunity to exert greater influence widens. While it may be difficult to distinguish legitimate questioning from illegitimate questioning that is based on debunked misinformation tropes, one tangible indicator of high levels of questioning can be found using certain pre-emptive research techniques. By searching for social media posts containing interrogative phrases such as “is this true?”, “really?” and “what?” and then clustering these posts based on the similarity of the rest of their content, topics subjected to questioning and rumoring can be identified early on.
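
As a rough illustration of that search-and-cluster step, here is a minimal sketch using only the Python standard library. It is not First Draft’s tooling: the function names, the word-overlap (Jaccard) similarity measure, the greedy clustering and the 0.3 threshold are all assumptions chosen for brevity. Only the three interrogative phrases come from the text above.

```python
import re

# Interrogative phrases quoted in the article; a real list would be longer.
QUESTION_PHRASES = ("is this true?", "really?", "what?")


def questioning_posts(posts):
    """Keep only posts that contain at least one interrogative phrase."""
    return [p for p in posts if any(q in p.lower() for q in QUESTION_PHRASES)]


def content_tokens(post):
    """Return the set of words left once the interrogative phrases are removed."""
    text = post.lower()
    for phrase in QUESTION_PHRASES:
        text = text.replace(phrase, " ")
    return set(re.findall(r"[a-z0-9]+", text))


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_posts(posts, threshold=0.3):
    """Greedy clustering: each questioning post joins the first cluster whose
    seed post is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_tokens, member_posts)
    for post in questioning_posts(posts):
        tokens = content_tokens(post)
        for seed_tokens, members in clusters:
            if jaccard(tokens, seed_tokens) >= threshold:
                members.append(post)
                break
        else:
            clusters.append((tokens, [post]))
    return [members for _, members in clusters]
```

Clusters that grow quickly point to topics attracting a lot of questioning. On real data, simple word overlap would likely need replacing with a more robust text-similarity method, and the phrase list would need adapting per language.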

We can identify threats ahead of time

We must continuously undertake pre-emptive research that can inform proactive messaging aimed at competing with wider narratives, as opposed to just with individual pieces of content. Emerging data deficits can be identified early on by collecting and analyzing social media data from the “middle” of social web activity (the space between verified media outlets or influencer accounts on one side and 4chan message boards, private Facebook Groups or other semi-closed anonymous online spaces on the other), with a particular focus on identifying influential narratives. Qualitative indicators of data deficits can be used to prioritize responses and in turn maximize impact. The cases of mRNA technology and VDPV suggest that the pre-emptive identification of key vulnerabilities is possible.


The upwards trend in Covid-19 vaccine misinformation and state-linked disinformation is poised to persist. Now is the time to forge greater collaboration mechanisms involving research and monitoring organizations, media outlets, subject-matter experts, platforms and policymakers to ensure data deficits can be identified early and filled with accessible, evidence-based information. Doing so can prevent the successful spread of harmful disinformation narratives.



