The covid infodemic and its control: the role of AI systems

June 18, 2020

Alistair Knott and Colin Gavaghan

Centre for AI and Public Policy, University of Otago, New Zealand

As the covid epidemic progresses through the world, a second epidemic is following hot on its heels. Rather than physical disease, this one involves the transmission of dubious, confusing or harmful information. While covid is passed from person to person through coughs and physical contact, the medium for the information epidemic is the Internet. Increasingly, when searching online, chatting on social media or checking the daily feed, people are encountering opinions that are odd, or provocative, or downright bizarre. The coronavirus is a hoax; it originated as a bioweapon in Canada; it was released by Bill Gates, as cover for his plan to implant people with microchips. If someone coming across such a claim is drawn to investigate further, they will readily find a whole Internet culture that substantiates and reinforces such beliefs. Exploring these unsuspected bubbles of opinion can often strike people with the force of a revelation, creating converts and evangelists, who share the opinion further, reporting it in their own way, with their own elaborations. The most surprising or extreme elaborations are passed on more readily, and extend the bubble, or create new bubbles.

Information has always travelled around the world this way. Neighbours gossip with one another, news articles trigger other news articles, and these processes have always created pockets of bizarre or extreme opinion. What’s different about the Internet is how fast and efficiently it allows these bubbles of opinion to form. On the Internet, one person can communicate with a much wider group of people. And mass communication methods are subject to far fewer constraints than traditional media. A newspaper has an editorial board, and TV programmes must comply with broadcasting standards; the analogues of these checks for tech companies are still very much on the drawing board. In epidemiological terms that everyone now readily understands, when it comes to the transmission of odd, false and extreme information, the Internet has a vastly higher R-number than traditional media institutions. And this is why we see an ‘infodemic’ on the Internet, in response to the covid crisis.

In this post, we will outline the shape of the covid infodemic that’s playing out on the Internet, and review options for how to respond. Our focus will be on technological responses: on what companies like Facebook and Google are doing and can do, and on how society should oversee their response.

The shape of the covid infodemic

The information of most direct relevance to the covid epidemic is health information: information about how the virus works, how to protect ourselves from it, and how to prevent it from spreading. A range of perspectives on these topics can be found online, running from mainstream international science (the WHO, established health journals), through official government advice and mainstream media sources, to alternative health providers, purveyors of worthless health products, and, at the far end, those disseminating positively harmful suggestions. There are still extensive debates about the best way to treat or prevent covid, but we can be relatively confident about the extremes of this spectrum. We want credible, trustworthy advice to propagate on the Internet, not useless or harmful pseudoscience. At the same time, we want to allow space for authoritative scientific opinions to be questioned. In recent weeks, several of the world’s most reputable medical journals have retracted important studies that had significant impacts on the covid response, as a result of independent scrutiny: this process is an essential part of normal science.

Dealing with covid is also a political issue. The key question here is how severe the social isolation mandated by government should be, and how long it should last. Again, different opinions on these questions proliferate on the Internet. Emotions run high here, especially for people suffering financially from lockdown, or those who strongly value the rights of the individual over those of the collective or vice versa. The high emotions brought on by covid, both online and offline, create a tinderbox for social unrest. In the US, the covid crisis lent extra force to the protests about racial discrimination triggered by the murder of George Floyd. These protests are exerting their own influence on the dynamics of online opinion. In the clusters of opinion that are emerging, there are some odd tensions. Those who protested against a strong, centralised government approach to covid lockdown often also advocate a strong, centralised government response to the George Floyd demonstrations. Many of those who advocated a strict lockdown are now calling people out onto the streets.

Politics, science and medicine are also blending online in new ways during covid, in a strange new ecosystem of conspiracy theories. In the US, conspiracy theories about the origin of covid are being used by supporters of the government, to deflect attention away from its failure to control the disease. The president himself is front and centre in this process, speaking directly to an audience of millions, with messages that are inflammatory, divisive and factually incorrect. To sell a conspiracy theory, it is helpful to question the credibility of traditional scientific and medical institutions, and of the traditional media. In the covid crisis, the WHO has been a particular focus for Trump, and for other authoritarian and populist leaders too. But conspiracy theories aren’t just for authoritarians. To take another example, covid is also pushing the alternative health community towards some strange places. This community was hit hard by covid: alternative health practitioners had to stop work, while mainstream medicine went into overdrive, and traditional media outlets have been distinguishing sharply between mainstream and alternative medical advice. The resulting feeling of persecution has created fertile ground for conspiracy theories. We now hear people in the ‘wellness community’ accusing Bill Gates of killing children in India with polio vaccines, or blaming 5G for the spread of coronavirus. Many in the normally liberal, well-meaning wellness community are starting to voice opinions made popular by far-right authoritarians. Again, the Internet has had a pivotal role in this strange shift.

In short, opinions on the Internet are in an unprecedented state of flux, and creating unusual kinds of social instability: ‘a churning mess’, as memorably described by Carl Bergstrom, a professor at the University of Washington.

What can be done? Options for the big tech companies

We would like reliable medical advice to spread online, not snake oil; we would like true information to spread, not lies; we would like reasoned political opinions to spread, not opinions that exist only to polarise and incite. How can we move in this direction?

The way information flows on the Internet is shaped by the algorithms run by social media sites like Facebook, and Internet search sites like Google. Many options for responding to the infodemic take the form of changes or extensions to these algorithms, to adjust the sort of content that pops up on our searches and social media feeds. We’ll briefly review some of these technical options, and summarise what is already being done.

For social media companies, one option is to curate content. The most obvious, and toughest, measure is to take content down altogether. In some of the worst cases — threats of violence, serious privacy breaches (including so-called ‘revenge porn’), or of course, live-streaming of terrorist atrocities — this seems like the only adequate response. For less obviously toxic content, though, there is a range of other possible responses that stop short of outright redaction. Content that is factually incorrect or highly dubious can be flagged as such, perhaps with a message directing the reader to a fact-checking site. It can also be left in place, but not recommended to other users. (The ‘recommender systems’ that decide what items users see in their daily feeds are a key mechanism for transmitting information in social networks.) That last measure wouldn’t involve the company banning or blocking the content, but it wouldn’t be actively promoting or disseminating it either.
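
To make this range of options concrete, here is a minimal sketch of how a platform might map a classifier’s verdict onto one of these graduated responses. The labels, thresholds and action names are our own illustrative assumptions, not any company’s actual policy.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    REMOVE = auto()           # take the content down altogether
    FLAG = auto()             # leave it up, with a fact-checking notice attached
    DONT_RECOMMEND = auto()   # leave it up, but exclude it from recommender feeds
    NO_ACTION = auto()


@dataclass
class Verdict:
    label: str    # e.g. "violent_threat", "disputed_claim", "benign" (hypothetical labels)
    score: float  # classifier confidence, between 0.0 and 1.0


def choose_action(verdict: Verdict) -> Action:
    """Map a classifier verdict onto a graduated moderation response."""
    if verdict.label == "violent_threat" and verdict.score > 0.9:
        return Action.REMOVE
    if verdict.label == "disputed_claim":
        # Confident cases get a warning label; borderline cases are merely
        # withheld from the recommender systems that populate users' feeds.
        return Action.FLAG if verdict.score > 0.8 else Action.DONT_RECOMMEND
    return Action.NO_ACTION
```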

For posts from high-profile users like Trump, these decisions are very much like the decisions editors make in traditional media: they are made by individuals at the top of the organisation, after extensive deliberation. (In this sense, Zuckerberg is very like a powerful newspaper editor from bygone days, deciding what people do and don’t see.) But for ordinary users, the role of an editor has to be played by an algorithm. To perform curation on a large scale, the company has to define classes of social media posts that require some form of curation, and then build classifier systems that can automatically identify these classes, so that the appropriate curation can happen. Such systems work through machine learning: they are trained on many examples of the defined classes, and after training, they can recognise new, unseen examples. Classifiers are the core technology in modern AI — so essentially any process of mass curation on social media sites is delegated to AI systems.
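
As a toy illustration of this training process (and emphatically not any platform’s actual system), the sketch below fits a simple scikit-learn text classifier on a handful of hand-labelled posts. Real platforms train far larger neural models on millions of examples, but the principle of learning categories from labelled data, then applying them to unseen posts, is the same.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A tiny, made-up training set of posts labelled by human annotators.
posts = [
    "5G towers are spreading the coronavirus",
    "Wash your hands regularly and keep your distance from others",
    "Bill Gates created the virus to implant microchips",
    "Official guidance recommends staying home if you have symptoms",
]
labels = ["potentially_false", "benign", "potentially_false", "benign"]

# Bag-of-words features plus a linear classifier: a minimal stand-in for
# the much larger models used in production.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(posts, labels)

# After training, the classifier can score new, unseen posts.
print(classifier.predict(["the virus is a hoax spread by 5G masts"]))
```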

The big social media companies all perform mass curation using AI classifiers. For instance, Facebook has the ability to tag posts as false, or potentially false, or partly false. It uses classifier systems to identify potentially false posts. Posts classed as potentially false can be sent to independent fact-checking partners (often traditional news companies), who use human experts to provide a firmer verdict. To improve the classifiers, they are further trained on the verdicts of these fact checkers, as well as on reports from users. This fact-checking system still has major flaws, as we’ll discuss below — but it does exist. Facebook also uses classifiers to identify and remove content from ‘terrorists’ and ‘hate organisations’. (The human annotators who create training sets for these classifiers are called ‘moderators’: again, this job is outsourced to external companies, whose employees have a psychologically harrowing task.) The category of ‘hate organisations’ was added after last year’s Christchurch attacks, and prominently features white supremacist groups. (Facebook signed up to the Christchurch Call initiated by Jacinda Ardern and Emmanuel Macron after the attacks.) Trump’s posts play well to white supremacists, but Facebook has not yet curated any of them. However, Zuckerberg is currently under considerable pressure to curate Trump’s threats of violence against George Floyd protesters, from his own employees and from the scientists whose research he funds.
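
The interplay between classifiers, fact checkers and retraining can be pictured as a feedback loop. The function below is our own schematic reconstruction of that workflow, not Facebook’s actual system; in particular, the fact_checkers.review call and the label names are hypothetical.

```python
def moderation_cycle(classifier, incoming_posts, fact_checkers, training_set):
    """One round of the human-in-the-loop workflow sketched above."""
    # 1. The classifier screens the stream of new posts for likely falsehoods.
    flagged = [p for p in incoming_posts
               if classifier.predict([p])[0] == "potentially_false"]

    # 2. Independent fact-checking partners give a firmer human verdict
    #    (fact_checkers.review is a hypothetical stand-in for that step).
    verdicts = {post: fact_checkers.review(post) for post in flagged}

    # 3. The human verdicts become new training data, so the classifier
    #    improves on the next round. User reports could be added here too.
    training_set.extend(verdicts.items())
    texts, labels = zip(*training_set)
    classifier.fit(list(texts), list(labels))

    return classifier, verdicts
```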

Twitter also uses AI classifiers and partner companies to detect tweets with disputed or misleading information, including covid-related information. Two weeks ago, Twitter took a dramatic step by labelling one of Trump’s tweets about the George Floyd protesters as ‘glorifying violence’. This was of course a human editorial decision: Twitter’s human curators are essentially entering the political debate. But its automated curation processes are still very far from transparent. YouTube also recently introduced policies removing videos with covid advice that contradicts WHO guidelines, and videos linking covid with 5G networks. Again, these policies rely heavily on AI classifiers: classifying videos is harder than classifying text, so they are particularly hard to implement accurately.

Another option for social media companies is to demonetise content in certain categories, by withholding advertisements. Social media ads create a brand new mechanism for extending and amplifying bubbles of online opinion: controversial or extreme opinions often attract many viewers, so content creators are drawn financially towards these opinions. This trend is particularly visible on YouTube, where content creation can be as easy as pointing a camera at yourself and saying what’s on your mind. YouTube advertising has created a new category of professional opinion disseminators, quite distinct from traditional journalists. YouTube has a longstanding policy of withholding advertising from videos presenting ‘controversial issues’ (such as anorexia) and ‘sensitive events’ (such as wars and terrorism). Covid was until recently deemed a ‘sensitive event’, prompting a wholesale advertising ban for covid-related videos. More recently, this ban has been lifted for a small (apparently hand-picked) set of providers.

For web search companies like Google, responding to the infodemic is a matter of regulating access to content, rather than curating or monetising it. Finding content on the web often involves a search, using Google or one of its competitors (as Google has over 90% of the search engine market, it’s questionable whether it really has any ‘competitors’, but a few alternatives do exist). When a user sends a search query to Google, the key issue is how to rank the many web pages that have some relevance to this query. Users rarely look beyond the first few results, so Google’s ranking algorithm essentially controls access to web content. The general principle for Google’s algorithm is to favour pages that are linked to by many other pages. But this algorithm is overridden for certain categories of query. Sometimes this involves promoting certain chosen sites. At present, for instance, the results page for covid-related searches features a prominent ‘covid-19 alert’ bar before the regular search results, with pointers to various government information sites, and links to WHO sites with further information. These sites are manually selected, but an AI system must first classify the user’s query as covid-related to bring up these responses. Sometimes the algorithm is overridden by omitting certain sites. This is mostly done to comply with the law: for instance, in Germany, a law requires Google to block a range of extremist sites from search results, and European Union law requires it to suppress search results that violate the privacy rights of EU citizens (at least for searches within the EU). Google’s general ranking algorithm probably also includes specific provisions that penalise sites from certain providers, or containing certain classes of content — but we don’t know the details of this algorithm, so it is hard to be sure. One place where Google’s editorial policies are more visible is in search query autocompletions. Google has an explicit policy to withhold autocompletions that are ‘violent’, ‘hateful’ or ‘dangerous’, among other things. The latter category appears to include covid advice that is deemed dangerous.
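
Schematically, these interventions amount to a thin layer of overrides sitting on top of the generic link-based ranking. The sketch below illustrates that idea under our own assumptions: rank_by_links and classify_query stand in for proprietary systems we have no access to, and the listed URLs and blocked domain are merely placeholders.

```python
# Hand-picked official sources pinned above organic results for covid queries.
PROMOTED_COVID_SOURCES = [
    "https://www.who.int/emergencies/diseases/novel-coronavirus-2019",
    "https://covid19.govt.nz/",
]

# Sites omitted from results, e.g. to comply with national law.
BLOCKED_DOMAINS = {"blocked-extremist-site.example"}


def search(query, rank_by_links, classify_query):
    """Generic link-based ranking, overridden by omission and promotion rules."""
    # Start from the ordinary link-based ranking.
    results = rank_by_links(query)

    # Omission: drop any result from a blocked domain.
    results = [url for url in results
               if not any(domain in url for domain in BLOCKED_DOMAINS)]

    # Promotion: if a classifier judges the query covid-related, pin the
    # hand-picked official sources above the organic results.
    if classify_query(query) == "covid_related":
        return PROMOTED_COVID_SOURCES + results
    return results
```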

Social media platforms also provide search tools, for local searches within their platform. Here again, the general ranking algorithm is overridden for certain classes of content. For example, in the US and Australia, Facebook users searching for white supremacist content are redirected to sites that encourage people to step back from white supremacist beliefs. (These policies were both introduced after the Christchurch shooting, and Facebook presents them as further responses to the Christchurch Call, alongside the curation policies discussed earlier.) YouTube search results are sometimes now tagged with fact-checking labels, supplementing the more heavy-handed video blocking policies described above.

To sum up: big tech companies have a range of automated tools for curtailing the spread of false or harmful information online — and they are making increasing use of these tools as the covid crisis progresses. These measures involve a mixture of human oversight and automated processing. In the typical case, human editors high up in the big firms decide on certain categories of information to be censored, in one form or another. After this, human annotators in fact-checking companies, or in the general public, create datasets of web documents to train AI classifiers to recognise documents in these categories. The classifiers then learn to perform this task, with more or less accuracy depending on the subtlety of the classification task and the form of content to be classified. These general systems are supplemented with hand-crafted editorial policies about particular high-profile documents or authors.

While these processes are becoming somewhat more standardised, the Internet is of course still alive with a vast amount of false, harmful and downright odd information. This is partly because new categories of censorable information are constantly emerging: editors must identify each of these, and classifiers must be built and evaluated, which all takes time. It is also partly because the purveyors of dodgy information fight back, altering their content to evade the relevant classifiers. Classifiers must therefore be constantly retuned: we end up with a technological ‘arms race’ between companies and purveyors, just like the race between spammers and spam detectors in the world of email.

What should tech companies and governments do to control information pandemics?

So much for what can be done, and is being done, to combat infodemics. What should be done? For one thing, should the big tech companies be using AI classifiers at all, to moderate and censor content? We believe there is no alternative to using AI classifiers: there is simply too much content for human moderators to deal with unaided. (To take one small example, 6000 tweets are produced every second on Twitter.) Automated moderation is already very widely used, and it’s practically impossible to imagine an internet where this doesn’t happen. The harder question is what categories of content are selected for automated moderation, and how classifiers are trained to recognise these categories. In short, how should the big tech companies decide what types of content get automatically promoted, suppressed or redacted?

Obviously this is a vastly difficult question. The tension between free expression and harm prevention has never been an easy one, and the Internet has amplified it to unprecedented levels, as readily demonstrated in the current covid crisis. However, we can step back and consider what processes should inform the big tech companies’ decisions, especially in relation to their development of AI classifiers that implement censorship. What is the process for deciding on categories of content that require moderation, and on the form moderation should take in each case? What is the process for training the classifiers that identify these categories? Are these processes internal matters for the tech companies, or should governments be involved — and if so, which governments? Should user communities or other civil society groups be involved? If so, how? We’ll conclude by discussing these questions of process.

Let’s take governments first. It is very likely some collaboration between governments and big tech firms will be needed. (Last year, Facebook actually asked for government regulation on ‘harmful content’ and ‘election integrity’ — a move very likely motivated by self-interest, but nonetheless in the right direction.) The Christchurch Call initiated by Ardern and Macron was a significant landmark in developing responses for ‘terrorist and violent extremist’ content: unprecedentedly, signatories to this document included sovereign heads of state, and also CEOs of big tech companies, sitting side by side. Of course, the discussions between countries and companies will be full of disagreements. Most glaringly in the current covid crisis, the US government has just flatly withdrawn from the WHO — the very institution whose advice on covid is used by tech companies as the gold standard in their curation of information on this topic. And it hardly needs saying that giving some governments a veto over what gets shared or censored on the internet could have pretty sinister consequences.

Who else should social media and other tech giants be speaking to, then? We believe the discussion about content should also include citizens, and their representatives in a variety of groups, both local and international. We think it’s essential that governments and companies hear citizens’ voices, whether in defining policies, in choosing categories of content, or in the nitty-gritty of identifying instances of those categories.

Given the difficulty and importance of training classifiers to identify classes of content, we also believe big tech companies should devote far more resources to this essential process. At present, the ‘moderators’ who manually identify objectionable and dangerous content of different kinds to create training sets for the classifiers are employed by contractor companies. Very often their employment is precarious, and their work is unpleasant. There are strong calls for tech companies to perform this work in-house; given its importance, they need to value it much more than they currently do. The same goes for ‘fact checkers’, whose work is also outsourced: while these people do something like journalism, it is a hollowed-out form of journalism that automates one component of the traditional journalist’s job. It is vital for the human workers who contribute to classifier training to have fulfilling, meaningful jobs, and for their work to be valued appropriately within the tech companies.

Tech companies also need to be accountable for the automated moderation and censorship policies they implement. While they may lack the enforcement powers of governments, their power and influence over the global marketplace of ideas exceeds that of all but a few states. Google and YouTube control around 90% of their respective markets. Social media sites (mainly Facebook and Twitter) now serve not only as means of connecting with friends but, for a growing number of people, as a primary source of news, and their advertising market is thought to be worth close to $100 billion. For many of us, these companies have greater control over what we read, see and think than our governments. Public accountability is arguably as important for the big tech companies as it is for governments.

To enable accountability, there needs to be more transparency around the moderation processes implemented by tech companies. For instance, we saw how citizens can contribute to the development of classifiers for fake news and other categories by reporting posts and other items. But these contributions can’t yet be tracked: we have no way of knowing how much attention a tech company pays to any reported item. From the perspective of accountability, it might be useful to make more information about classifier training sets publicly available — without conceding ground in the arms race with fake news purveyors, naturally. We suggest companies should perhaps also be accountable for the performance of their classifiers. How accurate is each classifier? This is a question that can readily be answered technically: there are well-defined standards for evaluating AI classifiers.
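
For instance, a regulator or independent auditor could ask for standard metrics, such as precision and recall, computed on a held-out, human-labelled test set. The snippet below is a minimal sketch of such an audit, using scikit-learn’s metric functions and made-up labels.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Made-up audit data: 1 = misinformation, 0 = benign.
true_labels      = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]  # human fact-checkers' verdicts
predicted_labels = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]  # the classifier's decisions

# Precision: of the posts the classifier flagged, how many really were
# misinformation? Low precision means over-blocking legitimate speech.
print("precision:", precision_score(true_labels, predicted_labels))

# Recall: of the posts that really were misinformation, how many did the
# classifier catch? Low recall means harmful content slips through.
print("recall:   ", recall_score(true_labels, predicted_labels))

# F1 combines the two into a single headline figure that companies could
# be asked to report publicly for each deployed classifier.
print("F1 score: ", f1_score(true_labels, predicted_labels))
```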

A final important question is how to approach the moderation of content from individual Internet users. Traditional media has always been subject to some (admittedly variable) degree of quality control from editors, and to restrictions imposed by law. Certainly, some media outlets have questionable standards, but all but the worst of them would be reluctant to publish content that is blatantly defamatory, threatening, abusive or false. But these restrictions were imposed on content produced by journalists — a small, though influential, group of people in society, for whom editorial restrictions were part of the job. In contrast, on the Internet everyone is a content creator, and any moderation or censorship will apply to content produced by ordinary citizens.

This is a very different prospect, with quite different implications for freedom of speech. It may be justifiable to prevent an opinion from being expressed to a mass audience — but we might hesitate to censor the same opinion when one person expresses it online to a small group of friends. Not only does the extent of harm to be avoided seem very different; so does the impact on the people being censored. During covid lockdowns, many found that social media took on some of the role usually played by friendly discussions in cafes, bars or family barbecues. Should that sort of exchange really be subject to the same restrictions as a newspaper column?

Of course, on social media, an opinion originally shared with just a few people can quickly be disseminated to a large group. Perhaps the level of scrutiny and degree of censorship for an online item should be a function of the size of its audience, as much as of its content. On this model, as an item becomes more popular on a social media site, the site would assume an increasing amount of editorial responsibility, and would be required to implement greater levels of fact checking and content control. Paying dynamic attention to the size of an item’s audience may offer some prospect of resolving the tricky legal question of whether a social media company is a ‘platform’ for individuals to express opinions, or a ‘publisher’ of these opinions. It could be a ‘platform’ for items with small audiences, and a ‘publisher’ for items with large audiences, or for items which it actively recommends to large numbers of users.
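
As a sketch of what this sliding scale might look like, the function below maps an item’s audience size onto a level of editorial responsibility; the thresholds and tier names are invented purely for illustration.

```python
def required_scrutiny(audience_size: int) -> str:
    """How much editorial responsibility might a platform assume for an
    item, as a function of how many users have seen or been shown it?
    The thresholds and tier names are illustrative assumptions only.
    """
    if audience_size < 100:
        return "platform"       # private chatter: minimal intervention
    if audience_size < 100_000:
        return "light_review"   # automated classification, warning labels
    return "publisher"          # human fact-checking, full editorial responsibility
```

On this picture, the same item would be re-assessed each time its audience crossed a threshold, with stricter checks applied at each step.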

Ultimately, what we are asking for are new institutions that govern the flow of information online, to supplement the institutions that functioned (more or less) for traditional media. In particular, we need new institutions overseeing the development and deployment of automated classifiers that help moderate and curate content online. The classifiers already exist, and are being widely used by tech companies: what is missing are the institutions that oversee their use. It’s time for tech companies’ content moderation processes to be made more transparent, so they can be informed by a much wider and more inclusive conversation. As we are seeing right now, the stakes for society could scarcely be higher.


Ali Knott has worked in AI and Cognitive Science for over 30 years. He’s based at Otago University, and also works for the AI company Soul Machines.