What’s next for Factmata
On a mission to scale content verification
What a journey! We’re almost there!
It’s been quite an interesting journey for Factmata since we started in January, and we’re now about to launch a tool that puts factual context in the hands of the people. This will happen around the UK general election, and marks the completion of our Google Digital News Initiative (DNI) project. For 5 months, we’ve been working around the clock with a distributed team of NLP researchers, PhDs and scientists from around the world to build this, and we are now putting on the finishing touches.
As we prepare for launch, we wanted to tell the world about what’s next and where we want to take Factmata in the future.
Given our team’s previous research in automated fact-checking, we are uniquely placed to build AI to solve the problem of online misinformation. However, this is a very tough and complex problem that will take a long time to solve. There are many people working on this (Full Fact, the Ferret, ClaimBuster, TwitterTrails), and tons of initiatives have been started in the past year. We want all of them to succeed and want to support them as far as we can.
How it will work
Using Factmata, you should be able to verify statements on social media, article comments, articles and any text on the internet about important economic issues. If someone makes a statement about the number of teenagers without jobs going up, you should be able to view the latest ONS youth unemployment data; if it’s a claim that economic growth is collapsing and the economy is a mess, we will link you to the official GDP growth figures. As individual users use the tool to help them verify statements (by receiving relevant data and stats), the system will be able to provide a “truth score” on statements, by aggregating users’ votes.
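To make the idea of aggregating votes concrete, here is a minimal sketch of the simplest possible version: the truth score is just the share of users who voted a statement true. The function name and the choice of a plain, unweighted share are assumptions for illustration only, not a description of how Factmata’s system actually computes its score.

```python
from typing import Iterable, Optional


def truth_score(votes: Iterable[bool]) -> Optional[float]:
    """Return the share of 'true' votes, between 0.0 and 1.0.

    Hypothetical helper: each vote is True if the user judged the
    statement to be supported by the data, False otherwise.
    Returns None when there are no votes, so that "no evidence yet"
    is not confused with "half of users disagree".
    """
    votes = list(votes)
    if not votes:
        return None
    return sum(votes) / len(votes)


# Example: three of four users judged the claim to be true.
score = truth_score([True, True, False, True])
print(score)  # 0.75
```

A plain share like this treats every vote equally; a real system would probably need to account for who is voting and how many votes have been cast.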
This tool is the first step in assisting citizens in fact-checking statements on the web, helping the world build a layer of truth over the internet. Factmata is as important for a CEO making an investment decision as it is for a college student who wants to make an informed decision at the polling booth. It should be a universal, transparent tool to counterbalance against the rise of misinformation and help retain trust in the internet as a source of truth.
From fact-checking to content verification
Factmata’s initial prototype and Google DNI-funded project is about fact-checking, which is a subcomponent of fake news verification. Fake news refers mainly to the purposeful generation of news intended to mislead and misrepresent the truth, often totally making up events that happened or things that people said. The motive behind fake news is often commercial, as in the case of Macedonian teenagers generating fake articles on the fly, or political, as in the case of Russian bot-networks. Here is a great explanation of what fake news is and what its various categories are. Note, fake news is normally about purposeful disinformation, as opposed to unintentional misinformation. But misleading content (when it comes to text) is not just about fake news. It is also false rumours, false claims, misleading adverts, false promises. Sometimes, it is just about calling out bullshit, or finding poorly referenced information.
Factmata’s mission, and the technology we want to build, is about solving online misinformation and disinformation. This means we strive to build semi-automated, AI-assisted tools and services that help people and businesses detect and verify content in as close to real time as possible, more cheaply, and with less manual effort. Verification is not just about saying whether something is false or true, but to what extent it is credible.
Our first prototype will help people call out false claims about economic statistics like unemployment, immigration, climate change; but claim verification is a key subset of content verification as a whole.
From manual to semi-automated — AI-assisted human fact-checks at scale
Fact-checking is a super complex task. To reach a level of acceptable automation, humans are going to have to help. (Facebook has hired an additional 3000 “content moderators”, but fact-checking is a lot more nuanced than filtering content for hate speech or terrorism, hence there is still work to do.) But the great thing is, humans have been natural fact-checkers for years on the internet. The role of Factmata’s AI is to augment humans tackling misinformation themselves. By combining technology and augmenting human expertise, Factmata has the potential to gradually make the life of the human fact-checker easier. We are all investigative journalists, we just need to be empowered and given the right tools to do so.
The extension we are releasing is a first experiment in learning how experts and other people construct arguments and validate information: understanding how claims are verified for specific domains, asking the right series of questions to check information, and then providing a logical, reasoned explanation for why something is right or wrong.
Krishna Bharat captures pretty much what we are building in this piece on how to detect fake news in real time.
Countering bias and being fair
Even though newspapers have editorial policies in place, fact-checking organisations try hard to have equally balanced donors and codes of conduct, and Wikipedia has an edit correction mechanism, no one can be truly unbiased. Rules-based systems can have bias, as can machine-learning models. Even trust in something is a particular form of bias!
Our viewpoint at Factmata is that a totally accountable, explainable algorithm and methodology for validation, which anyone can interrogate and criticise, is the best way of combating bias in fact-checking. We do not want to operate a black box, but rather offer full transparency into the process followed by our fact-checking system and platform. And validation of our methodology is one way we can prevent the system from being corrupted by unintentional bias.
If we are going to be providing a truth score on the internet, everyone should know the mechanics of how we calculate it — what is weighted how and by what %, which verifiers existed for a particular piece of content and what their reputation scores were, what calculation was made against what database for a fact check.
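As a rough illustration of what “knowing the mechanics” could look like, the sketch below computes a reputation-weighted score and returns, alongside it, a breakdown of which verifier contributed what percentage of the weight and which data source each one checked against. Everything here is hypothetical: the `Verdict` fields, the `explain_truth_score` name, the example verifiers and the simple linear weighting are assumptions made for the example, not our actual algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Verdict:
    verifier: str      # who checked the claim (hypothetical identifier)
    reputation: float  # hypothetical non-negative reputation score
    says_true: bool    # the verifier's judgement on the claim
    source: str        # database or dataset the check was made against


def explain_truth_score(verdicts: List[Verdict]) -> dict:
    """Return a reputation-weighted score plus the full breakdown behind it."""
    total = sum(v.reputation for v in verdicts)
    if total <= 0:
        # No usable weight: report that openly instead of inventing a score.
        return {"score": None, "breakdown": []}

    breakdown = [
        {
            "verifier": v.verifier,
            "source": v.source,
            "says_true": v.says_true,
            # Share of the total weight this verifier carried, in percent.
            "weight_pct": round(100 * v.reputation / total, 1),
        }
        for v in verdicts
    ]
    score = sum(v.reputation for v in verdicts if v.says_true) / total
    return {"score": score, "breakdown": breakdown}


result = explain_truth_score([
    Verdict("alice", 3.0, True, "ONS youth unemployment data"),
    Verdict("bob", 1.0, False, "ONS youth unemployment data"),
])
print(result["score"])  # 0.75
for row in result["breakdown"]:
    print(row)
```

Returning the breakdown together with the score is the point of the design: anyone who disputes the number can see exactly whose judgement was weighted how, and against which data.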
If people have doubts over our neutrality, they can look into our algorithm and suggest changes to it, and there will be a fair mechanism for deciding on each change. Perhaps for a particular claim, the algorithm did a fact check on a sub-optimal dataset. Maybe there were different data sources saying different things, in which case the algorithm should display both and clearly explain how a claim is wrong in one case but right in another. Note, being transparent is not the same as being unbiased; it will just help expose our biases.
Our system will endeavour to provide as much reasoning as possible for concluding that something is factual or not, and to ensure the network trusts that judgement. We are openly exploring technologies that ensure trust, such as the blockchain, to give each fact-checker a fair ability to work on the best fact-check for his/her expertise, one that the network trusts and agrees with.
Finding a sustainable model for fact checking
It is pretty clear to me that ensuring the quality of information dissemination is a major task with a lot of responsibility attached to it. We need resources (time, money, brains, and focused effort) to achieve this. But this problem needs to be worked on now, before it’s too late.
Factmata will always be totally open about how we do things. But in order to fund our operations, pay for our AI-assisted fact-checking network, and scale fully to achieve our vision of a more accurate web, we will find revenue streams and use cases for our underlying technology. However, anything provided to the public will be totally open and free for use.
Why is this important right now?
As you might have read in our introductory post back in January, Factmata started off in the realm of political fact checking. This task is enormous, and we will continue to make our technology available for fact-checkers like Politifact, Snopes and Full Fact to use. However, the problem we are solving is inherently larger and not limited to politics at all: misinformation.
Misinformation isn’t just about politics and politicians
Misinformation is everywhere. As we know, governments spread information as propaganda, advertisers sell false information for commercial profits, reviews have hidden incentives, or even friends express a badly researched viewpoint because that is what they believe in. These biased viewpoints are fed to us to suit other people’s needs, not ours.
Misrepresenting the truth is ingrained in how the internet and media work.
Social media platforms like Facebook and Twitter incentivise fast production of posts and Tweets, and they are increasingly our source of news. The key thing here is that there is no official sub-editorial process or professional fact checking team on these platforms (of course, there are fake ads and fake account teams, and in-house content moderators for hate speech). Technically, anyone can say anything, without any filter or fact verification process. People share content that they think is exciting, racy and fun, which reflects a particular viewpoint. The difficulty is that these systems also create clear incentives for filter bubbles.
On Facebook, Twitter, or any micro-blogging platform for that matter, you only see the content that your friends or people you follow share. If your friend says “eating full fat leads to weight loss”, most people would take that as given without checking the latest scientific research. If another friend says “Muslims are flooding into the country and taking our jobs”, it is difficult to instantly check this against the official immigration figures and call someone out. Due to the speed and self-fulfilling nature of information bubbles, viewpoints are formed quickly and easily and are difficult to alter; what we really need is immediate factual context.
Providing additional viewpoints and context is not in the interest of the news aggregator or newsfeed model. There are a few reasons:
- The focus on personalisation means an algorithm you don’t have a say in will push you content it thinks you will like, rather than what will inform you about the world.
- The focus is on surfacing information your friends share, rather than on getting you out of your filter bubble and getting you to think about what the rest of the world or people different to you might be saying.
Note, this personalisation is incentivised, because it is in the interest of an ad-driven platform to put you in a bubble (e.g., likely to vote Republican, likes football, has an outgoing personality). Perhaps you are not that specific person, all the time. But the platform can’t sell you as some fuzzy profile, because the advertiser wouldn’t be able to micro-target you.
Finally, professional news organisations have the same issue as online platforms. The only difference between a Macedonian teenager writing a story to generate maximum clicks and an established newspaper increasing readership is that the newspaper has an editorial policy, public scrutiny, publishing standards, laws, and (mostly) good taste.
Magazines often obscure the real things that people say in order to create stories that seem more farfetched, more aggressive, more cynical, and nastier to drive a certain narrative in readers. The way that words are written is important. This should not be underestimated.
Our long term goal — an accreditation layer on the internet for bias and misrepresentation of the facts
We hope that one day Factmata will build a “trust score” for claims, articles, sources, and websites, to empower people to take charge of the online information they digest, and to differentiate information they can trust from information they can’t. Just like you have scores for your home’s energy efficiency, a financial bond, your computer’s likelihood of catching a virus, or your credit history, at Factmata we believe that the time has come to have an accreditation score on information that has clear protocols and trust schema in place. This has to be embedded in content itself, and provide a reasoned, open explanation for why it provides its “B+ rating” on a piece of content. An Xiao Mina of Meedan picks this up well in this piece about a “credibility schema”.
We are past the internet honeymoon period of information being readily available at our fingertips. Now, that same information can have a hidden agenda, misrepresent the truth, and be detrimental to how we view society. How are we meant to know what is accurately representing the facts?
It is hard to thoroughly fact-check what you read. Of course, Google was built to help you answer questions about the world if you are proactive enough to do so. But it is increasingly difficult to do this in context and immediately. We have been pushed into exceedingly tight filter bubbles, such that searching for additional context from your newsfeed is near impossible.
We built Factmata out of a desire to strip that away. We want to allow people to enjoy the freedom of information that the internet allows, once again, enabling them to know content they can trust, and what they should be skeptical of. We want to build an independent platform that builds a real-time factual layer over information people read, and gives people the tools to get additional context and facts around content. Until content production incentives on the web change from being ad-driven, we believe this is one step to solving misinformation.
Factmata will solve this one day. We don’t know exactly how, yet, but we will figure it out along the way.
Thanks to my team members and my friends for reviewing this draft. Please sign up for the private beta launch of Factmata on factmata.com.
And register to vote — you have till the 22nd May!