The War on Facts

Gideon Mann
Aug 8, 2017

These are written notes to accompany an invited talk at https://sites.google.com/site/nlpandcss/nlp-css-at-acl-2017

Before I start, let me say thank you for letting me speak with you today. I’m really excited to be talking about these topics, but I should also say that the views expressed in this talk are my private opinions, not those of Bloomberg the company.

Most people know about Bloomberg as a news organization — and for the purpose of this talk that’s true. But I should say that the business of Bloomberg is not news, but the software product that we produce — the Bloomberg Terminal. The Bloomberg Terminal is what you’d have on your desk if you were on Wall Street to give you the data, analytic tools, and access to the community that you need for your job. In part, this is why the news part of Bloomberg has a single-minded focus on accuracy. People rely on us for the truth because they’re going to be putting a lot of money down on the information and analysis they are getting from us. This focus is what makes epistemological nihilism, the erosion of truth, such a danger to our organization.

In this talk, I’m going to discuss how I see the changing threats to our factual world, predominantly from fake news and fake voices, and then focus on some of the responses that most encourage me. This is like a cover album of a talk, and it is also a talk that focuses more on problems than solutions. My hope is that by sharing this work with you, you’ll have a sense of the state of play and be encouraged to respond as well.

The Changing Role of Journalism

Journalism has changed a lot over the past two decades. In 1997, the morning paper delivered an authoritative perspective on local and world events. Journalists at reputable news organizations rigorously fact-checked stories, waiting to publish until they could confirm something from two or more independent sources and giving subjects the opportunity to correct the claims the article was going to make. Certainly this is still how a lot of news organizations operate, and it’s how we operate.

Of course, before you make a phone call, you want to know who you’re going to call. One thing that comes out of this is that source identification and determining credibility are crucial steps in assessing whether to believe a piece of information or not. In some sense, you can imagine this as pre-fact checking. This is something Bloomberg has always done for news sources. As we aggregate 125,000 sources of news, we go through a strict process to vet these sources.

Increasingly, in addition to traditional news wires, we also incorporate relevant tweets. The manual vetting process we use for traditional news wires struggles to scale to social media, so we manually review Twitter sources for legitimacy, and we can then additionally screen particular tweets before we explicitly promote them.

This focus on Twitter in addition to traditional news wires echoes the change that’s happening in the industry overall. Social media platforms enable citizen journalists to create and disseminate news stories — without the training, supervision, or standards of traditional journalistic practice.

At the same time, Facebook and Google are becoming the predominant destination for ad dollars, driving growth in media away from traditional sources (see https://www.baekdal.com/blog/what-killed-the-newspapers-google-or-facebook-or/ and http://www.businessinsider.com/google-and-facebook-dominate-digital-advertising-2016-12?r=UK&IR=T).

Part of this growth is fueled by a feedback process that explicitly optimizes for engagement. Machine learning as a mechanism for filtering and directing traffic is amazingly effective. Perhaps it’s not immoral, but it is almost by definition amoral.
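To make that concrete, here is a toy sketch, purely illustrative and not any platform’s actual ranking code, of a feed whose only objective is predicted engagement:

```python
# Toy illustration only (not any platform's actual system): a feed ranked purely
# by predicted engagement. Nothing in the objective rewards accuracy, which is
# the sense in which the loop is amoral rather than immoral.
from dataclasses import dataclass
from typing import List

@dataclass
class Story:
    headline: str
    predicted_click_rate: float  # output of some hypothetical engagement model
    is_accurate: bool            # unknown to, and unused by, the ranker

def rank_feed(candidates: List[Story]) -> List[Story]:
    # The only signal consulted is expected engagement.
    return sorted(candidates, key=lambda s: s.predicted_click_rate, reverse=True)

feed = rank_feed([
    Story("Pope endorses candidate (fabricated)", 0.31, False),
    Story("City council passes budget", 0.04, True),
])
print([s.headline for s in feed])  # the fabricated story ranks first
```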

Another change in the dynamic is the blurred line between journalists’ private lives and their social lives. As Mark Dredze notes:

“While everyone now has a presence both in the real world and online, reporters increasingly cultivate an online following that is significant. Several political journalists have well over a million followers (e.g. @andersoncooper, @jaketapper), and some journalists eclipse their own news organization. As one example, @ezraklein has three times as many followers as @voxdotcom.”

In this talk, however, I’m going to focus on Twitter as a mechanism for news: in part because it’s been easy to study, since it is more open than Facebook and Google; in part because it’s a locus for emerging news; and finally, in part because I am a Twitter addict.

Fake News

This list shows the top fake news items of 2016. It’s somewhat spectacular that a story like “Pope Francis shocks the world, endorses Donald Trump for President, releases statement” got 916,000 shares across Facebook. But unfortunately, stories like these aren’t just a cruel joke; they often have consequences.

When money is at stake, there are always dishonest actors who will try to squeeze out money here and there. I don’t know how many of you remember the bombing of the White House a few years back that the AP reported in a tweet. Hopefully none of you, because it didn’t happen. The AP got hacked and this tweet was sent out, causing a rapid crash in the stock market that wiped out $136B in market value over the course of a few minutes. When people realized it was a fake, the market quickly recovered. Last year there was another significant piece of faked content — a fake press release from Vinci that caused a roughly 20% drop in that company’s stock price.

Of course, these are just the financial consequences of fake news. More grim are the consequences that came out of PizzaGate — the online conspiracy theory that there was a child sex ring orchestrated by Hillary Clinton out of the basement of a pizza parlor in D.C., a ludicrous story made desperately poignant by the fact that the pizza parlor didn’t even have a basement. However absurd it was, Edgar Welch believed it enough to drive from North Carolina to Comet Ping Pong with his AR-15, discharge his weapon into a closet, and then go promptly to jail for four and a half years.

What is truly astounding about these stories is that they have only the most fragile pieces of evidence to back them up. “Pope Francis endorses Donald Trump” was held up by nothing but words. What is more troubling is the emergence of ways to generate synthetic events that are increasingly difficult to distinguish from the real thing.

This is a synthesized audio clip, and it’s clearly not as convincing as the real thing. You can tell that there’s something off with the voices.

Where Lyrebird shows off audio synthesis, this video shows the capabilities for video reconstruction (see also this paper).

The video quality isn’t great, and it’s easy to tell that what the picture shows isn’t a real video. However, as we all know, every year we collect more data and can build more sophisticated models, so the quality should get better and better. It’s not at all clear how long it will take until these video and audio synthesis models exceed human performance, but it seems likely that it will happen.

In this video you can barely see Mitt Romney’s face, certainly not clearly, and there is a significant amount of static noise in the background. Yet when this video came out, there was little dispute that it was real, and this is the low bar that synthetic video and audio have to clear. According to a senior IARPA official that I was a panelist with, this technology has already been observed in the wild creating fake video for the purpose of driving a political outcome.

So, the picture for fake news is pretty bad. People believe written stories with no backing evidence, perhaps just because they like what the stories say. What is going to happen when fully synthetic content is available, and it’s not just clunky reproductions?

Perhaps even more perniciously, Jun et al. 2017 performed a set of experiments suggesting that on social media, as opposed to in more individual contexts, people are less likely to fact check.

Fake Voices

So far we’ve been talking about fake news, but just as insidious are fake voices. I’m going to present some results and studies around Twitter, but I want to be clear that the problem isn’t confined to Twitter — Twitter is just more transparent and accessible than Facebook. As a recent example, during the French election, Reuters reported that Russia attempted to spy on the Macron campaign via Facebook, and the article reported more than 70,000 suspended accounts related to propaganda and spam.

But focusing on Twitter: Twitter reports slightly more than 300M user accounts, but how many are actually real people? In 2014, Twitter reported a fake account rate of 5% to 8.5% — which would correspond to 15M to 25M fake accounts. More recently, Varol et al. 2017 built a machine learning model based on hand-labeled fake accounts and fake accounts attracted to a honeypot, and estimated a rate of 9.5% to 15%, which is 28.5M to 45M fake accounts.
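A quick back-of-envelope check of those ranges, assuming roughly 300M total accounts (the rounding differs slightly from the figures above):

```python
# Rough sanity check of the fake-account counts quoted above,
# assuming roughly 300M total Twitter accounts.
total_accounts = 300_000_000

estimates = [
    ("Twitter 2014 estimate", 0.05, 0.085),
    ("Varol et al. 2017 estimate", 0.095, 0.15),
]
for label, low, high in estimates:
    print(f"{label}: {low * total_accounts / 1e6:.1f}M to "
          f"{high * total_accounts / 1e6:.1f}M fake accounts")
```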

In pursuit of automated bot accounts, Echeverría and Zhou uncovered a massive botnet that tweeted quotes exclusively from Star Wars and that shared some particular properties. The accounts were all created in 2013 and had geolocated tweets that placed them uniformly at random inside geographical regions centered on the US and Western Europe — not distributed according to population densities. They tweeted infrequently and had few friends and followers.
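Purely to illustrate how distinctive those properties are, here is a crude filter in that spirit; the field names and thresholds are hypothetical, and this is not the detector Echeverría and Zhou actually built.

```python
# Illustrative only: a crude filter for the account properties described above
# (created in 2013, few friends and followers, very low tweet volume).
# Field names and thresholds are hypothetical.
from datetime import datetime

def looks_like_star_wars_bot(account: dict) -> bool:
    created = datetime.fromisoformat(account["created_at"])
    return (created.year == 2013
            and account["followers_count"] < 10
            and account["friends_count"] < 25
            and account["statuses_count"] < 20)

print(looks_like_star_wars_bot({
    "created_at": "2013-07-01T12:00:00",
    "followers_count": 2,
    "friends_count": 11,
    "statuses_count": 9,
}))  # True
```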

Presumably, these bots weren’t created by someone simply trying to get the word out about Star Wars. If you look at these accounts, you can also see that their tweeting behavior includes retweeting dubious links — here, one for online payday lending. Another possible use for these bots is to serve as paid followers, to fluff up someone’s statistics.

Work by Freitas et al. in 2014 created fake Twitter bots and measured how successfully they were able to evade suspension and build followers. They built 120 bots with various biographic specifications and varied numbers of tweets and retweets, and in many cases the bots were able to build significant follower groups. Surprisingly, they showed that most of their bots were able to evade detection and reporting.

Beyond direct advertising, another reason to create these fake social accounts is for use in click fraud. Click fraud is the use of automated bots to generate fraudulent ad traffic and drive revenue. It works like this: an unscrupulous publisher creates a site and gets ads placed on its site. The publisher then pays bots to click on the ads located on that site, and collects money from the advertisers.

As the industry matures, ad networks have taken increasing measures to protect against these fraudulent clicks, and one way they do so is by paying more for clicks coming from logged-in users, that is, users with social media accounts. Fake Twitter accounts used in this context increase the value of these clicks.

Twitter is not a safe place for dialogue. The ability of anonymous actors to attack without consequences means that hatred can often drive out particular voices. The alt-right has been notorious for doing this — one example was the targeted attack on Leslie Jones, after her work in Ghostbusters, which led her to leave the platform. But these attacks also sometimes come from the left, targeting people who are caught on camera being racist in public, who are sometimes harassed to the point of losing their jobs. A good overview of this is in Marwick and Lewis 2017.

These attacks are typically targeted against particular individuals and coordinated by humans. This is in contrast to programmatic intervention by significant numbers of bots at large scale. When bots are used to coordinate messaging, the effect can be more subtle and diffuse, though perhaps just as damaging.

The term “astroturfing” has been used to describe the process in which the appearance of a grassroots movement is created without the underlying support. When bots are employed, they can be used to create artificial trends, or Twitter bombs, that can alter search ranking results or manipulate the trending topics categories. When used in concert with fake followers, they presumably could be used to give fake likes or retweets to candidates. The sheer number of fake political followers raises the question of how to evaluate the number of likes or retweets a tweet has received.

While media can’t change people’s minds directly, scholarship indicates that one of the primary roles of media is to focus attention, that is, to set the agenda of what topics are covered, and often how. When bots are used, they can amplify a small minority voice and make it appear significantly larger. As a consequence, an event or viewpoint that might otherwise be ignored can get media attention and coverage. It’s easy enough to determine that one particular account is a bot; it’s much more difficult to detect that attention for a particular tweet or thread is getting juiced by an army of fake voices.

Perhaps not surprisingly, at this point the bot ecosystem is complicated and there are many actors. @drflab had a very nice analysis of how bots interacted with one particular piece of content — an article by Charles Blow entitled “Trump is his own worst enemy”.

After this article came out, the first set of bots that responded were newsbots that simply retweeted the headline; they presumably are set up to retweet New York Times headlines, and possibly other news sources as well. Next, a botnet that predominantly retweets anti-Trump messages was activated and caused a spike in retweet activity around this article. Subsequently, a group of bots that partially send out custom messages and partially retweet traditional sources got into action after a high-profile Trump defender tweeted in response. Finally, a group of bots that promote particular products and also climb onto trending topics got into the game, retweeting follow-up messages.

Of course, one use of astroturfing is for promotional reasons. Like spam, social media campaigns can use either automated messaging or so-called “organic” messaging to promote products. Recent work by Clark et al. shows that the tobacco industry has been a heavy user of Twitter to promote e-cigarettes, and that its messaging overwhelms the organic chatter around e-cigarettes. Clearly, this effect is significant for research purposes, but its effect on the general population is also likely to be significant, perhaps even more than traditional advertising, since it may be difficult to distinguish the bots from the authentic voices.

It has been well known that China engages in state-sponsored social media activity, but it hasn’t been entirely clear how it engages or what it says. Typically, people suspect that the state explicitly pays people 50 cents (renminbi) per post to attack foreign countries or to argue for the state’s position on social media. Recent work by Gary King and his group at Harvard suggests that the truth of the situation may be different.

Using a leaked archive of emails from 2015 from the Internet Propaganda Office of Zhanggong, the team finds that more than 480 million posts per year are created by the state, that the likely posters are government officials, and that the purpose of these posts is not to defend the state’s position. Rather, the goal of the posters appears to be to cheerlead and thereby attract attention away from unpleasant or undesirable topics.

“Early May Burst: 3,500 posts about a variety of topics, such as mass line, two meetings, people’s livelihood, and good governance. Immediately followed the Urumqi railway explosion”

For all of this work, the literature still seems inconclusive about the effects of bots. Though some work, such as Abokhodair et al. 2015, is able to generate retweets and attention for a bot network, other work, like Murthy et al. 2016, shows relatively little effect from their botnet. Beyond that, it seems like the field is getting better at detecting bots and detecting bot activity.

One reason, perhaps, that this research hasn’t gotten as far as it could is that the experiments are expensive to conduct. They require the development of a large software apparatus and a constant race against Twitter, which would turn the bots off. While commercial entities can spend the effort to do this, the research community struggles more.

In sum, the use of algorithmic bots on Twitter is fairly vast, from both non-state and state actors. In many cases it’s not even apparent to people that it’s happening; in others, it’s so overwhelming that it’s hard to ignore. And right now, bots are fairly straightforward and easy to detect.

Worryingly, as conversational agents get better, it’s going to be increasingly difficult for humans to detect them.

Research Directions

Ok, so there is a lot of bad news, and some of it is truly scary. But I think what is encouraging, and one of the reasons I was so excited to come here to talk, is that this community has a chance to really make a difference. There is a lot the community is already doing, and I want to point out some examples that make me really encouraged. Then I’ll point out a few directions that feel very important to me and that I think could use more work. I’m sure if I’ve forgotten some pieces of work here, you will be kind enough to follow up with references later.

There are a bunch of things that the field is working through, especially in the area of detecting fake news text.

One of the encouraging projects in this area has been the Fake News Challenge. It looks at the problem of stance detection — trying to figure out the relationship between two pieces of text, a headline and a body text.

This challenge addresses the issue of click-bait — when a news headline is at odds with the story that supports it. In general, the best systems performed at around 80% on the task. But clearly, this is just one aspect of fake news.

fakenewschallenge.org : Stance Detection

Input

– A headline and a body text, either from the same news article or from two different articles.

Output

– Classify the stance of the body text relative to the claim made in the headline into one of four categories:

Agrees: The body text agrees with the headline.

Disagrees: The body text disagrees with the headline.

Discusses: The body text discusses the same topic as the headline, but does not take a position.

Unrelated: The body text discusses a different topic than the headline.
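For a sense of what a bare-bones approach to this task looks like, here is a minimal baseline sketch: shared TF-IDF features plus a headline/body cosine similarity feeding a four-way classifier. It is nowhere near the challenge’s best systems, and the examples are made up purely to show the shapes involved.

```python
# A minimal stance-detection baseline sketch, not one of the challenge's actual
# systems: represent headline and body with shared TF-IDF features, add their
# cosine similarity as an extra feature, and train a four-way classifier.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

STANCES = ["agrees", "disagrees", "discusses", "unrelated"]  # the task's label set

def featurize(headlines, bodies, vectorizer):
    H = vectorizer.transform(headlines)
    B = vectorizer.transform(bodies)
    sim = cosine_similarity(H, B).diagonal().reshape(-1, 1)  # per-pair similarity
    return np.hstack([H.toarray(), B.toarray(), sim])

# Tiny made-up examples, purely to show the shapes involved.
headlines = ["Pope endorses candidate", "Pope endorses candidate"]
bodies = ["The Vatican denied the report entirely.",
          "The local team won the championship game."]
labels = ["disagrees", "unrelated"]

vectorizer = TfidfVectorizer().fit(headlines + bodies)
clf = LogisticRegression(max_iter=1000).fit(featurize(headlines, bodies, vectorizer), labels)
print(clf.predict(featurize(["Pope endorses candidate"],
                            ["The Vatican denied the report entirely."], vectorizer)))
```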

The problem of bots has been known for a while, and certainly there has been a ton of work on bot detection. In 2015, DARPA sponsored a four-week challenge to detect bots on Twitter. Six teams competed, and in general they achieved significantly positive results, able to detect bots using semi-automated means after a few days. No team was able to deploy a fully automated system.

The challenge tried to detect bots that were spreading influence on a particular topic, using:

– Tweet syntax

– Tweet semantics

– Temporal behavior features

– User profile features

– Network features

All systems were able to find existing bots, with few wrong guesses, over a short period of time.
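As a sketch of what some of these feature families might look like in practice: the field names below are hypothetical, the text features are omitted, and the DARPA teams’ actual feature sets were far richer.

```python
# A sketch of some of the feature families listed above, computed from a
# hypothetical per-account record; tweet syntax and semantics features are
# omitted, and real systems use far richer feature sets.
import numpy as np

def bot_features(account: dict) -> dict:
    times = np.sort(np.array(account["tweet_timestamps_sec"], dtype=float))
    gaps = np.diff(times)
    return {
        # Temporal behavior: bots often post with unnaturally regular timing.
        "median_gap_sec": float(np.median(gaps)) if gaps.size else 0.0,
        "gap_std_sec": float(np.std(gaps)) if gaps.size else 0.0,
        # User profile: default-looking profiles are weak evidence of automation.
        "has_default_image": 1.0 if account["default_profile_image"] else 0.0,
        "description_length": float(len(account["description"])),
        # Network: following far more accounts than follow back is suspicious.
        "follower_friend_ratio": account["followers_count"] / max(1, account["friends_count"]),
    }

print(bot_features({
    "tweet_timestamps_sec": [0, 600, 1200, 1800],   # one tweet exactly every 10 minutes
    "default_profile_image": True,
    "description": "",
    "followers_count": 3,
    "friends_count": 900,
}))
```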

Most of this talk has been concerned with Twitter, but of course the effect isn’t confined there. Facebook is a significant source of news and a major way that people are misinformed. There, fake news gets introduced in two ways — first by injection through paid advertisements, and then by sharing. Facebook sought to address fake news by allowing users to report it. Reported stories are then reviewed by fact-checking organizations, and are still permitted to be shared, but marked as disputed.

According to recent reports in the Guardian, however, these interventions are not working. It is not clear how many fake news stories ever get the disputed tag, how much the tag reduces sharing behavior, or how long it takes for the disputed tag to be added to a story. Unfortunately, Facebook is such a closed system that answering any of these important scientific and societal questions isn’t possible for a neutral third party.

While the approach that Facebook took doesn’t seem to be working, there have been more successful results from work by Kevin Munger out of NYU. He has two pieces of work which explore the same idea: explicitly sanctioning (scolding) tweeters who tweet objectionable things.

It’s a pretty simple idea that he explored in two directions: first with racist speech, looking at tweets that used a racial slur, and second with uncivil political discourse. In both set-ups, he created fake accounts (“bots”) that responded to these tweets with a mild scolding. For racist speech, the sanction was

“@[subject] Hey man, just remember that there are real people who are hurt when you harass them with that kind of language”

and in the case of political speech he applied two sanctions:

“@[subject] You shouldn’t use language like that. [Republicans/Democrats] need to remember that our opponents are real people, with real feelings.”

and

“@[subject] You shouldn’t use language like that. [Republicans/Democrats] need to behave according to the proper rules of political civility.”

In these experiments, he varied properties of the bots in the racist-speech case and of the people he responded to in the political-speech case, and the interventions were able to decrease the incivility of those targeted during a month-long observation. For racist speech, he further found that social alignment increased the effectiveness of the intervention, and in the case of political speech he found that Republicans were more affected by sanctions than Democrats.

Now, in these cases, these weren’t actual autonomous bots but accounts that he directly controlled, and they only issued a simple sanction, nothing more complicated. But honestly, for me the more important result was that these interventions worked at all.

The radicalization of people often happens online, often by watching videos. A coalition led by Jigsaw at Google has been working on a method to decrease the radicalization of potential ISIS recruits: redirectmethod.org. Their strategy is to target people watching these videos with advertisements for videos that promote de-radicalization. In order to figure out which videos to promote, they talked with former ISIS members and tried to understand the specific viewpoints that were most convincing. Instead of creating new content, they sourced content that already existed but made the specific points they were trying to promote. Overall, the videos reached 300k people over 8 weeks, who watched 500 hours of video.

In a sense, you could imagine that what they are doing is not that different from bots — bringing attention algorithmically to a viewpoint they want to gain prominence.

Earlier, I talked about assigning credibility to traditional news sources and the infeasibility of extending this process to all citizen journalists on Twitter. One complication beyond the issues highlighted earlier with the DARPA bot challenge is that it isn’t as simple as a source being a bot or not. To really be confident in a particular source, you also have to know the domain of topics over which that source is credible.

As an example, oil refinery fires can be incredibly significant financial events, disrupting oil supply chains and causing material impact. Often the first person to notice the fire is a person driving by, who notices a fire in a building, snaps a picture, and then tweets about it. These cases of citizen journalism are infrequent, but when they happen they are incredibly significant. Another notable example is the person who noticed the black helicopters and firefight at Osama bin Laden’s house and tweeted about it.

Part of assessing credibility for these citizen journalists is knowing exactly where the person actually is. Are they legitimately passing by the location of an oil refinery fire, or could they merely be reusing a picture that has existed for a long time?

Geolocation of Twitter users is thus a crucial part of this puzzle. Clearly, lots of work has been done on this, including much presented at ACL. There was a paper last year done at Bloomberg by Mark Dredze, Anju Kambadur, and Miles Osborne (2016) that picked at a piece of this puzzle. In their paper, they show that dynamically updating the imputed geolocation for a particular person, as opposed to sticking with a fixed location per person, yields improved accuracy. Perhaps not a surprising result, but when it comes to pre-fact checking, an important next step.
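To illustrate the idea of a dynamic rather than fixed imputed location, here is a toy with decayed evidence counts; it is not the model from the paper.

```python
# A toy illustration of dynamic (rather than fixed) user geolocation; this is
# not the model from the paper, just decayed evidence counts over candidate
# locations so the imputed location can follow a user who moves.
from collections import defaultdict

DECAY = 0.9  # how much weight older location evidence keeps at each update

def update_evidence(evidence: dict, observed_city: str) -> dict:
    decayed = defaultdict(float, {city: w * DECAY for city, w in evidence.items()})
    decayed[observed_city] += 1.0  # fresh evidence gets full weight
    return decayed

def imputed_location(evidence: dict) -> str:
    return max(evidence, key=evidence.get)

evidence = {}
for city in ["New York", "New York", "Houston", "Houston", "Houston"]:
    evidence = update_evidence(evidence, city)
print(imputed_location(evidence))  # "Houston": the estimate has moved with the user
```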

So far, I’ve talked about what’s already been done, predominantly by people outside of Bloomberg and other than me. Now I want to talk about work that I haven’t seen. Maybe it’s happening, and again I’m sure you’ll point me to it if it is, but I haven’t seen it.

The first is around synthetic content. I think it’s becoming increasingly urgent to figure out ways to detect synthetic audio and video. I don’t think it’s going to be long until we have a significant scandal where it’s alleged that the content was synthetic and faked. And soon after that, it’s going to become a pervasive response. In the short term, forensic analysis is likely to be a crucial stop-gap, especially before synthetic methods are perfected. I imagine that this kind of work might come out of the machine learning community, just as the synthesis work does.

But in the long term, we must work on trust methods that create trust from the ground up, as content is created. It’s not entirely clear what this trust would look like, but it should be tamper-proof and protected from state agents. It feels very much like a crypto and security project to me, though perhaps it also involves watermarking in-content information as well.
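As one possible ingredient, and only a sketch: content could be signed at capture time so that anyone holding the device’s public key can verify it hasn’t been altered. The example below uses the Python cryptography package; a real system would also need key distribution, revocation, and tamper-resistant hardware, none of which is addressed here.

```python
# One possible ingredient of "trust from the ground up", sketched with the
# Python `cryptography` package: a capture device signs content the moment it
# is created, and anyone holding the device's public key can verify it later.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

device_key = Ed25519PrivateKey.generate()       # would live in secure hardware
public_key = device_key.public_key()            # published for verifiers

video_bytes = b"...raw captured frames..."      # stand-in for real content
signature = device_key.sign(video_bytes)        # attached at capture time

try:
    public_key.verify(signature, video_bytes)   # raises if content was altered
    print("content verifies against the device key")
except InvalidSignature:
    print("content or signature has been tampered with")
```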

The work by Kevin Munger, I have to say, leaves me very inspired. Not just his smaller point that social identity guides the acceptability of a social censor, but simply the fact that a more or less automated intervention on social media can have a pro-social effect.

It’s a tricky question, because it’s not at all clear how heavy-handed this kind of intervention should be. Creating a nanny state of fun-hating scolds doesn’t seem like a good outcome. But I could also imagine a pro-social intervention that would be a middle ground between censoring and total laissez-faire. In some way, this kind of intervention could create a social cost to incivility where there is little of one right now.

Another major question that I’m trying to follow is precisely how social media shapes attention. Are there certain kinds of messages that are more or less effective at shifting focus? Are there particular frames that are easier or harder to make salient? Are there particular strategies that affect how quickly attention is focused?

One of the things that machine learning is especially good at is giving people what they want. In the context of news, this has shifted us from gatekeepers who set normative ideas of what people should want, and provide it, to a system where models iteratively learn human behavior and increasingly aim to satisfy human hungers.

It’s not clear that these hungers are healthy. Tristan Harris has been at the forefront of thinking about what ethical principles we should adhere to when we create new software, by thinking carefully about whether the software can lead to unhealthy patterns of behavior. He predominantly talks about how technology attracts our attention, for example through notifications.

But, these same questions should be raised about the way that we get our information. I believe that fake news attracts attention because it somehow provides something that real news doesn’t. Perhaps this comes from an emotional payoff, or a sense of validation, or relief from boredom. But I think it would be worthwhile to understand carefully if there are qualities that fake news provides that conventional news doesn’t.

One of the additional consequences I worry about is the encroaching distrust of all facts, of even the existence of facts: the idea that who knows what is even true, so why try. I suspect this attitude is already spreading, and it is only going to increase with the creep of synthetic content, with increasingly unhealthy information diets, and with the disappearance of gatekeeper media that constructs a shared consensus reality.

Frankly, I have no idea how to approach this one, but I think it’s increasingly an urgent enterprise.

I’m hoping to moderate a panel at SXSW 2018 that delves deeper into this topic. I’ve assembled a panel of NYC Media Lab’s Justin Hendrix, the Wikimedia Foundation’s Katherine Maher, and Nicholas Diakopoulos, a professor at Northwestern’s School of Communication, to discuss how technology has changed how people interact with information; the threats to truth, including fake news, synthetic video and voices, and disinformation; and — most importantly — what needs to happen in the near and long term to improve the situation and defend democracy. Check it out and vote in the PanelPicker before August 25: http://panelpicker.sxsw.com/vote/70358

I want to specifically thank the following people for their help with this: Mark Dredze, Clay Elzroth, Kang Sun, Guido Zarella, Anju Kambadur, Miles Osborne, Justin Hendrix, Chris Wiggins, Arnaud Sahuguet.
