Introducing Factmata — Artificial intelligence for automated fact-checking
Dhruv Ghulati, Co-Founder of Factmata
Last week, the Factmata Project won official backing from Google and its Digital News Initiative. We are extremely proud of this win, as it is a reward for our many years of research into the Natural Language Processing (NLP) problem of statistical fact checking. We are also happy that major tech companies like Google are taking interest in working on the problem of automated fact checking, given all the media content they distribute and channel to us every day. More details here:
Google has given €150,000 to three UK organisations working on fact-checking projects to help journalists and the…www.theguardian.com
Over the course of the next few months, we will be launching a prototype of the research already completed in statistical fact checking and claim detection. So far, our work has been in identifying claims in text by the named entities they contain, what economic statistics those claims are about, and verifying if they are “fact-checkable”. At the moment, we can only check claims that can be validated by known statistical databases. We built our system on Freebase (a structured fact database that drew heavily on Wikipedia and later fed into Google’s Knowledge Graph), and will be migrating it to new databases such as EUROSTAT and the World Bank Databank.
For example, we can, to a degree of accuracy, correctly identify and fact check claims like “The UK is not a prime suspect; greenhouse gas emissions have fallen by 6.2% this year alone”, “The United States cares about its energy sector— this administration has invested $4.5bn this year alone”, or “Our aid work in Somalia is paying dividends — only 0.2% of the population is severely malnourished”. As we know, these types of statistics are often misused or misquoted by politicians in order to promote a particular worldview. We want to surface our research (and improve it through the course of the year) as a tool for anyone to check facts on any piece of textual content they come across on the web (articles, speech transcripts, quotes etc).
In future posts, we will write in detail about the artificial intelligence (AI) challenges of computational fact checking, the history of statistical fact checking, the approaches others have taken and why we are different, the technical methodologies of the AI research we are working on, and share our designs for our first prototype to get your feedback. It is important to us that what we build is user-driven, from the ground up. Like any great product, we want to build something that people love!
Who are we?
We are a group of developers and researchers, with collectively more than 20 years of published research in the fields of NLP and Machine Learning. Most importantly, we all have a passion for holding politicians accountable for what they say and promise.
What sets us apart in this field is our pioneering published work on relation extraction where there is no labelled text. This is the ability to teach machines to understand that a sentence containing “the percentage of people out of work” is talking about the unemployment rate. Our Scientific Advisor, Sebastian Riedel, has developed better ways to noisily label training text using distant supervision, and to learn from unlabelled data. For the task of fact checking media content, where this content is growing at an alarming pace, it is crucial that any algorithm can scale without needing training labels of claims and corresponding fact checks.
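To make the idea of distant supervision more concrete, here is a minimal sketch in Python. It is not Factmata's or Sebastian's actual method, only an illustration of the general technique: a sentence is labelled with a statistical property when it mentions a known entity together with a number close to the value stored in a knowledge base. The knowledge base, the entity "Freedonia", the figures, the function name `distant_label` and the 10% tolerance are all hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical toy knowledge base: (entity, property) -> known value.
# The entity and figures are invented for illustration only.
KNOWLEDGE_BASE: Dict[Tuple[str, str], float] = {
    ("Freedonia", "unemployment_rate"): 5.0,
    ("Freedonia", "population"): 2_000_000.0,
}

# Matches integers and decimals such as "5" or "5.1".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def distant_label(sentence: str, tolerance: float = 0.1) -> Optional[str]:
    """Return the property the sentence is assumed to express, or None.

    A sentence receives a label when it mentions a known entity and contains
    a number within `tolerance` (relative) of the stored value. No human
    annotates anything, so the labels are noisy by design.
    """
    # Drop thousands separators so "2,000,000" parses as one number.
    numbers = [float(n) for n in NUMBER.findall(sentence.replace(",", ""))]
    for (entity, prop), value in KNOWLEDGE_BASE.items():
        if entity not in sentence:
            continue
        for number in numbers:
            if abs(number - value) <= tolerance * abs(value):
                return prop
    return None


sentences = [
    "In Freedonia, the percentage of people out of work is 5.1%.",
    "Freedonia is home to 2,000,000 people.",
    "Freedonia won 3 medals.",
]
# Each pair is (sentence, noisy label); the last sentence gets no label.
labelled = [(s, distant_label(s)) for s in sentences]
```

The labelled sentences could then serve as training data for a classifier that learns which phrasings ("people out of work") signal which property, without anyone hand-labelling claims.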
In addition, we have a razor-sharp focus on the algorithmic challenges of statistical fact checking, numerical relation extraction, and claim detection. Andreas Vlachos, Factmata’s Chief Research Scientist, has published work in statistical fact checking and stance detection. He has also pioneered the use of imitation learning for translating natural language into machine-interpretable meaning representations. This is crucial for claim inference within automated fact checking. He is in the process of launching the University of Sheffield’s dedicated Fact Checking Lab, one of the few artificial intelligence research labs dedicated to fact checking. Full Fact, a UK-based organisation, produced a report on the state of automated fact checking this year, which mentioned Andreas and Sebastian’s first automated fact checking algorithm, the Simple Numerical Fact Checker, as one of the known open source tools out there.
My interest in fact-checking came from my work at import.io and its quest to allow anyone to build APIs to any website, freely access its data and use it for the common good. I have since built upon this passion, thinking about automated fact checking and promise tracking for several years, which led to my master’s thesis at UCL on statistical claim detection using cost-sensitive classification and distant supervision.
Automated Fact Checking
100% automated fact checking is a long way away. There are a whole bunch of issues, from the need to generalise to the many possible paraphrases of a claim, to acquiring and verifying sources of truth in real time before they get outdated. A few years ago, our team published a paper on the NLP task that set out its challenges and baseline approaches, and discussed what can and cannot be done. We will be writing about these challenges in future posts.
But working towards it is very important. Given the proliferation of the internet and the ease with which information is disseminated, writers and academics have long been talking about a “post-fact” world or “post-truth era”. This is a world where claims or factual statements about entities, events, statistics, policies and much more are factually incorrect, and propagate dangerously through media sources at an alarming rate.
Arguably, the problems these statements cause go far beyond being intellectually questionable. Politicians often state factually incorrect information in order to promote a particular agenda, propaganda or viewpoint. Donald Trump, in his election campaign, spouted huge statistical lies, for example claiming that 81 percent of murdered white people are killed by black people, when in fact 84 percent of murdered white people are murdered by other white people. Arguably, these types of lies may have indirectly or directly led to one of the most polarizing political campaigns and upsets in history, and could totally transform the socio-economic status quo of the US. As Tim Harford and Paul Dughi have similarly written:
“Social media has swallowed the news — threatening the funding of public-interest reporting and ushering in an era when everyone has their own facts.”
As Matt Johnson writes in Quartz, “digital media platforms…revenue models incentivize clicks over truth”. As he says:
The more outrageous the statement, the more clicks it generates …the price we pay for a profit-driven media marketplace, it seems, is national ignorance. Convenient untruths benefit their producers, no matter which side consumes or leverages them for fundraising. Everyone in the political information industry profits from the resulting suspicion, cynicism, and outrage.
There are many who feel that automated fact checking is a pointless endeavour. Observers like Anne Applebaum and Katharine Viner at the Washington Post and the Guardian lament that real objective facts do not matter in a “post-fact” world, because people are more likely to believe in “facts” that confirm their preexisting opinions and feel right from personal experience, and to dismiss those that don’t. People against the fact-checking effort say that people don’t even care about the truth: even if they read an article that provably contained nonsense, they would still believe it due to emotional and political bias. Alex Parsons wrote a good piece on this for an alternative viewpoint.
Why is it important?
Digital journalism is an incredibly important channel for the world to gain access to information: any article of any merit is instantly shared across social media channels and often summarised or misquoted on Twitter. People learn new things, especially facts, from reading articles and content, and seldom from “factual” sources such as statistical reports or global economic statistics. When the facts in these articles are wrong, people’s opinions and views on the world are also based on imperfect information.
Some examples of this would be articles that claim that Muslims comprise 25% of the Belgian population, when they were only 5%, or that the number of illegal immigrants could be 30 million when they were only 11 million. Throughout the course of his campaign, Donald Trump claimed 7 different unemployment rates, as high as 42% when the official reported one was around 4.9%.
A world where false claims are continuously made and seen as gospel has negative implications and knock-on effects for how democracy functions. It creates factions based on incorrect data that thrive on the internet. The biggest problem this can cause is a misinformed electorate that can descend into bigotry, racism and intolerance.
Why should fact checking be automated?
There are a few main reasons why we feel that the status quo of fact checking needs to be disrupted with technology, and why AI can meet some of these needs.
Human fact checking is onerous, costs money, and is liable to error. Currently, fact checking is done manually, taking each claim as it comes, using the expertise of the human fact-checker to know where to look. Fact checking websites like PolitiFact, Full Fact, and FactCheck.org have existed for years, and employ paid volunteers to scour articles and political speeches after they have been written, and fact check them with lengthy explanations. Most summarize their findings using a “liar” scale. This comes at a large financial and time cost.
Even as fact-checking sites have proliferated, jobs for internal fact-checkers and copy editors have largely been eliminated, leaving reporters and editors on their own to guarantee the accuracy of their own content. Journalists sometimes lack the time or resources to fact check every claim they make in an article, and resort to approximations or guesses. Fact checkers often have routines which are “highly individualized and idiosyncratic”, which leads to different standards of accuracy.
Human fact checking isn’t fast enough for the internet. Not only is manual human fact checking painful and a burden on human resources; it is also ineffective at stemming the tide of global misinformation, given the pace of the internet. Human fact checking just cannot keep up. Researchers at the University of Warwick and Indiana University found it takes more than 12 hours for a false claim to be debunked online, on average. By the time fact-checkers have toiled to sift through and analyze a statistical claim, the debunking usually gets less visibility than the fake. In a famous example from the rumor detection site Emergent.info, a totally fake article got shared 60,000 times, while its debunking was shared fewer than 2,000 times.
Human fact checking naturally tends to interpretation and analysis. When a human debunks, there is an incentive to be politically correct and dilute an argument with a subjectivist caveat like “however, opponents dispute this”. A Pew Research study showed that 59% of US adults prefer using facts to either verify a piece of information or correct a piece of misinformation — rather than as analysis or commentary. Any computational system will also include its sources, suggestions, and caveats, as well as expose the reasoning of any fact-check to a user, rather than claiming to be 100% accurate. However, because an AI system used by non-journalistic organisations is less bound by political correctness and the need to generate analysis and viewpoints, it has more freedom in objectively fact checking things rather than explaining things.
Human fact checks are tedious, and not easily digestible. Human fact check explanations are naturally long, argumentative prose pieces. This is because some debunks do need explanations. However, they also mean that the reader has to switch context and read another meta-piece on the original article. People lack time to even read the original articles, let alone any sub-content and comments. Many admit that reading fact checks is dull, and popular only amongst well-read political boffins rather than ordinary, less intellectually minded readers. By only focusing on certain claims which are quickly fact checked as true or false, we can try to avoid this problem. Factmata aims to build a tool that is quick and instant.
Fact checks by media organisations, social media platforms, fact checking groups, or affiliated press will have natural bias. When fact checking is done within the walls of organisations, there will always be claims of bias, partisanship or untrustworthiness. If there is to be an automated tool, the reasoning of the AI must be visible to all parties, and not offered to the world by one institution with no external information as to how it works. This is why we plan to open source our code for everyone to work on and improve on, and are an independent organisation with no affiliations. An algorithm that is openly available for all to see should have no claims of maliciousness.
Fact checking and driving attention to it has benefits regardless. Whatever the limitations of existing approaches to fact checking, we need to march forward and continue research in this field. As long as progress is made, however slowly, the work itself helps. As Alex Parsons writes in his piece:
The underlying idea is that bad information, whether maliciously or innocently entered into the debate, can be corrected with good information. In an active and vigorous political culture, lies will be punished and truth will rise to the top. In the political marketplace the voters are savvy shoppers.
As Jane Elizabeth describes in another brilliant post, actually working on the task itself is valuable because fact checking makes people smarter and educates people. An American Press Institute-sponsored study showed that reading fact-checking articles increased knowledge on a subject by 11%, and even more among people who already have a high level of political knowledge. Secondly, fact checking is growing in readership and interest, which gives the right impetus to launch a product on it. The American Press Institute also showed that the number of fact-check stories in the U.S. news media increased by more than 300% from 2008 to 2012.
Factmata: why focusing on statistical fact checking is tractable
We don’t claim Factmata will automate all fact checking for good. Computational fact-checking is a very difficult task to solve within natural language processing, veering into the same territory as general AI. Conroy found that humans are currently only 4% better than chance at detecting lies in text.
Firstly, one has to have a ready set of real-time facts to check against, which match the exact nature and content of the claim in question. Secondly, claims can contain many meta-claims, and it is difficult to specify what each one of them is about. Finally, many claims cannot be checked by a binary “yes” or “no” answer immediately. As this Guardian article shows, some claims will always require detailed analysis to debunk due to their inherently convoluted nature; some supposedly erroneous claims could be seen as true in certain circumstances or using pre-selected data. There are even some claims which professional fact checkers find impossible to check, as in these examples from Google Research. Tim Harford describes another issue, that “truth is usually a lot more complicated than statistical bullshit”.
However, what Factmata will focus on for our first prototype is statistical claims. That is, sentences which necessarily contain an entity (a country, person, a place of interest), a statistical property (population, unemployment rate, height, average daily visitors), and a numerical value for that property, at a specific time in the past. For example, the claims we want to fact check might be “The number of people out of work in this country stands at 93 million (unemployed workers)”, or “90% of all products brought into our country are Chinese (China as a % of imports)”. These types of claims are fact-checkable from databases such as the OECD or World Bank, and are often used to support political statements.
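The structure of such a claim (entity, statistical property, numerical value, time) can be sketched as a small data type with a lookup against a reference table. This is a minimal illustration under stated assumptions, not Factmata's implementation: the `StatisticalClaim` class, the `check_claim` function, the 5% relative tolerance, and the placeholder entity and figures in `REFERENCE` are all hypothetical, standing in for values that would come from a source such as the OECD or World Bank.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class StatisticalClaim:
    """One statistical claim: an entity, a property, a value, and a year."""
    entity: str
    prop: str
    value: float
    year: int


# Hypothetical reference table keyed by (entity, property, year).
# The entity and figure are placeholders, not real statistics.
REFERENCE: Dict[Tuple[str, str, int], float] = {
    ("Freedonia", "unemployment_rate", 2015): 5.0,
}


def check_claim(
    claim: StatisticalClaim,
    reference: Dict[Tuple[str, str, int], float] = REFERENCE,
    tolerance: float = 0.05,
) -> Optional[bool]:
    """Return True or False if the claim can be checked, None otherwise.

    A claim counts as supported when its value lies within `tolerance`
    (relative error) of the reference value. None means the reference
    table has no matching entry, so the claim is not fact-checkable here.
    """
    key = (claim.entity, claim.prop, claim.year)
    if key not in reference:
        return None
    truth = reference[key]
    if truth == 0:
        # Relative error is undefined at zero; require an exact match.
        return claim.value == 0
    return abs(claim.value - truth) / abs(truth) <= tolerance


close = StatisticalClaim("Freedonia", "unemployment_rate", 5.1, 2015)
inflated = StatisticalClaim("Freedonia", "unemployment_rate", 42.0, 2015)
unknown = StatisticalClaim("Freedonia", "average_height", 170.0, 2015)

results = [check_claim(c) for c in (close, inflated, unknown)]
```

Returning `None` for claims with no matching reference entry keeps "cannot be checked" separate from "checked and false", which matters because only some claims are fact-checkable against a statistical database.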
Our aim for Factmata is to turn our research into a product that everyone can use to automatically check certain claims in media articles, to lessen the burden of manual fact checking and dramatically reduce its cost. We want to change the thinking that the facts presented in news, or statements made by pundits and politicians, should be taken for granted. Full automation isn’t our main goal; Factmata’s goal is to connect the claims made in digital and social media to the accepted ground sources of truth, and to spare readers from having to trawl through explanations, links and Google searches to find the answers, which costs precious time and effort people do not have these days. Assisted fact-checking is an intermediate step to more automation.
We want to make the fact-checking experience fun, engaging and powerful, and build an environment where readers love identifying false claims, correcting news articles, and ensuring more accurate content for the rest of the world. To do this, UX and behavioural design are crucial. Tools like Politifact or Full Fact are naturally more likely to be used by political nerds who love debunking and analysing news, and are typically well read. At Factmata, we want to make fact checking an article an ordinary behaviour. That means building a hook-based user experience that is rewarding, addictive and fun. If every person reading an article spent 5–10 seconds checking that article’s claims in addition to reading it, and did that every time, we would have succeeded.
Finally, we need to be as seamlessly integrated into the reading experience as possible. You should never have to switch context to do a fact check with Factmata, or examine any painful meta-analysis. We want our fact checking to be quick and instant, given how little time ordinary people really have in their regular reading experience.
Our hope is to build a community of researchers and technologists working on the fact checking problem and publish our research for people to use, collaborate, and build on, from across the world. This means opening the dialogue up, being transparent about our approaches, taking criticism, and iterating fast.
The fact checking of statistical information is hugely important for digital journalism. At Factmata, our mission is to build technology that moves us towards creating a correctly informed electorate with data-driven opinions. We strive for a world where news information contains statistical claims that are true and up to date, and hope that technology will help track what politicians say and hold them accountable for their statements. As Jeremy Evans puts it in this great article:
In a world now drastically in need of clarity, journalism needs to evolve dramatically in order to serve our readers. Like it or not, most people get the vast majority of their information about the world from the news. We have a duty not only to report but to be bold enough to clarify confusion, correct errors, and present the true facts unambiguously.
- Please sign up for early access at http://factmata.com/.
- Subscribe to the Factmata Project on Medium.com for upcoming announcements and updates, and follow us on Twitter.
- We are also hiring a full stack NLP engineer who is passionate about using AI to automate political fact checking — apply here.
- We would love to hear from you — email email@example.com for any questions or add comments. Stay tuned!