Time to Challenge ‘Fake News’ with AI

It seems that not a day goes by without the problem of ‘fake news’ making headlines in the United States. If anything, the problem is growing into an international crisis, affecting countries such as South Africa, India, Australia and the Philippines, and especially Europe, where concern is high in both Germany and France due to their upcoming elections. Like spam, fake news is a global problem.

Something needs to be done about the fake news problem, and a group of technologists, including this author, is taking up the challenge with a competition we are calling the Fake News Challenge (FNC).

In a nutshell, the Fake News Challenge is a competition in the spirit of the Netflix Prize designed to foster the development of AI technology to help solve the fake news problem. Starting today (Feb 1st, 2017), teams are being given a concrete fake news-related task for their systems to perform, along with training data on which to build and test their solutions. At the end of the competition (in early June) teams will be given a new, never-before-seen set of test data. They will be asked to run their algorithms on this new test data, and the three teams that score highest will be awarded cash prizes. The details of the prizes will be announced at a later date.

The remainder of this post provides important highlights about the FNC itself, the objectives we are hoping to achieve, and answers to a few important questions. For full details and the latest news about the competition, as well as to sign up to compete, please visit our website: FakeNewsChallenge.org.

Fake news, defined by the New York Times as “a made-up story with an intention to deceive” [1], often for a secondary gain, is arguably one of the most serious challenges facing the news industry today. In a December Pew Research poll, 64% of US adults said that “made-up news” has caused a “great deal of confusion” about the facts of current events [2].

The goal of the FNC is to explore how artificial intelligence technologies, particularly machine learning and natural language processing, might be leveraged to combat the fake news problem. We believe that these AI technologies hold enormous promise for significantly automating parts of the procedure human fact checkers use today to determine if a story is real or a hoax.

Assessing the veracity of a news story is a complex and cumbersome task, even for trained experts [3]. Fortunately, the process can be broken down into steps or stages. A helpful first step towards identifying fake news is to understand what other news organizations are saying about the topic. We believe automating this process, called stance detection, could serve as a useful building block in an AI-assisted fact-checking pipeline. So stage #1 of the Fake News Challenge (FNC-1) focuses on the task of stance detection.

Stance detection involves estimating the relative perspective (or stance) of two pieces of text relative to a topic, claim or issue. The version of stance detection we have selected for FNC-1 extends the work of Ferreira & Vlachos [4]. For FNC-1 we have chosen the specific task of estimating the stance of a body text from a news article relative to a headline. The headline and the body text may come from the same news article or from two different articles, and the body text may discuss the same topic as the headline or an entirely different topic. More specifically, the body text may agree with, disagree with, discuss or be unrelated to the headline.
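To make the four labels concrete, here is a minimal Python sketch of how a team might represent one labeled headline and body text pair. The names Stance, StanceExample and is_related are hypothetical, and the string values are only illustrative; the actual dataset format is described on the Fake News Challenge website.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    """The four stance labels named in this post (string values are illustrative)."""
    AGREES = "agrees"
    DISAGREES = "disagrees"
    DISCUSSES = "discusses"
    UNRELATED = "unrelated"


@dataclass(frozen=True)
class StanceExample:
    """One headline paired with one body text and its stance label (hypothetical type)."""
    headline: str
    body: str
    stance: Stance


def is_related(stance: Stance) -> bool:
    """A body text is 'related' if it agrees with, disagrees with or discusses the headline."""
    return stance is not Stance.UNRELATED


example = StanceExample(
    headline="Robert Plant Ripped up $800M Led Zeppelin Reunion Contract",
    body="No, Robert Plant did not rip up an $800 million deal to get Led Zeppelin back together.",
    stance=Stance.DISAGREES,
)
print(example.stance.value, is_related(example.stance))
```

Splitting the labels into "related" and "unrelated" is just one convenient way to think about the task; it is an assumption of this sketch, not a rule of the competition.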

Please refer to the Fake News Challenge website for a full explanation of the FNC-1 stance detection task, scoring criteria, other rules and the details of the dataset we’re providing to teams. Here is a brief example of what we are asking teams to do:

Suppose the team is given the headline:

  • “Robert Plant Ripped up $800M Led Zeppelin Reunion Contract”

If that headline is paired in the training (or testing) set with a body text containing somewhere within it the snippet:

  • “… Led Zeppelin’s Robert Plant turned down a £500 MILLION to reform his supergroup. …”

then the correct label for this particular [headline : body text] pair would be Agrees, since the idea expressed in the headline aligns with the ideas expressed in the body text. Obviously an exact match is not required, and almost never happens, as illustrated by this example.

In contrast, if the body text paired with the headline above contains the snippet:

  • “… No, Robert Plant did not rip up an $800 million deal to get Led Zeppelin back together. …”

then the correct label the team’s system should produce for this example is Disagrees. Alternatively, if the body text simply talks about the same claim as the headline, without taking a stance about its veracity, as in:

  • “… Robert Plant reportedly tore up an $800 million Led Zeppelin reunion deal. …”

then the correct label would be Discusses, for obvious reasons. Finally, if the body text is concerned with an entirely different topic from the headline, as in:

  • “… Richard Branson’s Virgin Galactic is set to launch SpaceShipTwo today. …”

then the correct label for this [headline : body text] pair would be Unrelated, again for self-evident reasons.

In summary, the stance detection task for FNC-1 involves determining how the perspective (or stance) of a body text relates to a headline. To make it interesting, the headline and body text will often (but not always) come from different news articles.
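As a rough illustration of the easiest part of the task, telling Unrelated pairs apart from the rest, here is a toy Python sketch based on word overlap between headline and body. It is not the official FNC-1 baseline or scoring method; the function names, the small stopword list and the 0.3 threshold are all assumptions made for this example. Separating Agrees, Disagrees and Discusses would need far more than word overlap.

```python
import re

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "the", "to", "of", "is", "up", "his", "no", "not", "did"}


def tokens(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric words, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def headline_overlap(headline: str, body: str) -> float:
    """Fraction of headline words that also appear somewhere in the body."""
    head = tokens(headline)
    if not head:
        return 0.0
    return len(head & tokens(body)) / len(head)


def looks_related(headline: str, body: str, threshold: float = 0.3) -> bool:
    """Guess 'related' when enough headline words recur in the body (threshold is arbitrary)."""
    return headline_overlap(headline, body) >= threshold


headline = "Robert Plant Ripped up $800M Led Zeppelin Reunion Contract"
agree_body = "Led Zeppelin's Robert Plant turned down a £500 MILLION to reform his supergroup."
unrelated_body = "Richard Branson's Virgin Galactic is set to launch SpaceShipTwo today."

print(looks_related(headline, agree_body))      # True: 4 of 8 headline words recur
print(looks_related(headline, unrelated_body))  # False: no headline words recur
```

Note that the sketch treats "$800M" and "$800 million" as different words, which hints at why an exact match cannot be relied on.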

You might be asking yourself, “How does this stance detection task relate to fighting fake news?” We’ve got a FAQ for that! But in short, the answer is twofold:

  1. Stance detection has been suggested to us as a useful tool to assist human fact checkers in doing their job, by expert human fact checkers themselves. Since our goal is to help those beleaguered heroes, we’ve chosen to focus FNC-1 on a task that we think has the potential to help fact checkers who are “in the trenches” today.
  2. We strongly believe solutions to the stance detection task can and will form important building blocks in the quest to develop an automated fact checking pipeline. See the FAQ for more details on how this might be accomplished.

We aren’t expecting to solve the fake news problem with technology alone. In fact, the fake news problem is unlikely to ever be solved completely. Still, machine learning methods have virtually defeated spam email. Similarly, we are convinced that the combination of expert human fact checkers, bolstered by the AI technology we hope to foster through our competition, can make a difference and keep fake news in check.

Please join us and the 72 teams who have already signed up to take the Fake News Challenge and help tackle this critical problem.