Combating Fake News: A Machine Learning Approach

In recent months, fake news has been a constant source of debate and divisiveness in America. Political pundits, politicians, and everyday citizens have been confronted with a barrage of claims that fake news isn’t just actively misinforming people, but that it also played a non-negligible role in the election for the highest office in this country: the presidency. After a hard-fought primary and election season, President Donald Trump shocked the world when he defeated Hillary Clinton on November 8th of last year. In the days following the enormous upset (almost every political poll projected a Clinton victory), protests, jubilation, political rallies, and victory parties dotted the nation, marking a fitting end to one of the most divisive elections in recent American history.

How could so many smart people get it so incredibly wrong? This is the question that most political pundits concerned themselves with in the fallout of the election. Many large media organizations had to find a way to explain the failure of their foregone conclusions — some projections even gave Hillary a 99% chance of winning. Big Data is many things — an overused buzzword, a resume booster, a company bottom line — but it is rarely ever wrong. By definition, data concerns itself with the facts. Data Scientists create mathematical models that look at the data, and output predictions. This is an oversimplification of the process, but for the purposes of this post, it suffices to say that Big Data is revolutionary in many ways — which is what makes the result of this election all the more fascinating.

This has been the year of the statistical outlier. Leicester City shocked the world when they won the English Premier League (overcoming the 5000–1 odds that most bookmakers placed on them). The Golden State Warriors blew a 3–1 lead to the Cleveland Cavaliers in the NBA Finals (the Warriors were given a 92% chance of winning the title after taking the first 2 games). The Cleveland Indians pulled the reverse against the Chicago Cubs, dropping a 3–1 lead in the World Series for the city that had overcome one just several months earlier. The incredible list goes on, but nothing on it is as significant as the election of President Donald Trump.

The Problem

Interestingly enough, in the fallout of the November 8th election, Facebook found itself having to defend against claims that its massive social network was actively perpetuating and propagating fake news. Facebook’s bottom line has always been about bringing the most relevant content to your news feed. Relevant is a relative term in this context: what’s relevant to me may not be what’s relevant to you (and that could be the result of differences in our age, interests, habits, and, among many other things, our political ideologies). Facebook has been described as an echo chamber, since users typically like pages, follow sources, and interact with people that share the same views and ideas. For this reason, it is concerning that 62% of adults in the U.S. get their news from social media. In the Web 2.0 era, Facebook is increasingly becoming the face of digital journalism, and its firm grip on ad revenue dollars only makes the platform more valuable. But with great power comes great responsibility. Facebook delivers news on its network, but that news hasn’t always been based on facts, since the integrity of sources wasn’t being thoroughly vetted. Recently, however, Facebook has taken some commendable steps to confront the spread of fake news.

We are drowning in information, but starving for knowledge. — John Naisbitt

The issue of fake news, however, isn’t an issue that started with Facebook, nor is it an issue that will end with Facebook. There have been claims that part of Trump’s political momentum was the result of fake news, but the issue extends beyond him. Now, more than ever before, every citizen must be wary of the information that they consume. Oftentimes it is too difficult or too time-consuming for us to manually fact-check every statement made by a politician, forcing us to take many things on faith. Skepticism is healthy, and being an active citizen requires you to question everything; the truth is not complacent. Remaining vigilant about the information we are fed is a tremendously important task, because a popular lie is more powerful than an infallible truth.

Our Solution

Fake news is a bipartisan issue that requires all hands on deck. President Trump has called several major news outlets fake news — coming from the highest office in the land, these are serious accusations. While Facebook has taken important steps to confront misinformation, it’s important to note that people can get their news from any number of places online — not just Facebook.

Two very close friends, John Bowllan and Jay Silverstein, and I are developing a Machine Learning approach to fact-checking. Thorough fact-checking is an arduous process and often takes professionals extended periods of time. In that time, however, fake news can spread like wildfire before the truth sees the light of day. As technologists, we have the unique ability to tackle issues at scale, and we thought this was an important issue to confront.

We plan on developing a Chrome extension that can automatically rate news stories for you. This project poses some difficult challenges that scientists on the frontier of AI research have been struggling with, such as Natural Language Processing, a notoriously difficult problem for computers. Our solution will build on existing open source work, APIs, and data science techniques in order to engineer a browser extension that can tell you whether a news headline is reliable or not, without you having to manually research the topic.
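As a rough illustration of the kind of text classification a headline-rating tool could build on, here is a minimal sketch of a Naive Bayes classifier in plain Python. It is not the project's actual design, which is still to be worked out. The training headlines, the "reliable"/"unreliable" labels, and the function names are all hypothetical, invented purely for this example; a real system would need a large labeled dataset and far more careful language processing.

```python
import math
from collections import Counter

# Hypothetical toy training data, invented purely for illustration.
TRAINING = [
    ("shocking miracle cure doctors hate", "unreliable"),
    ("you won't believe this secret trick", "unreliable"),
    ("senate passes budget bill after debate", "reliable"),
    ("city council approves new transit budget", "reliable"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training headlines
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def classify(headline, model):
    """Return the label with the highest log-probability for the headline."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the log prior of the label.
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in tokenize(headline):
            # Laplace (add-one) smoothing so unseen words never give log(0).
            score += math.log(
                (word_counts[label][word] + 1) / (n_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("secret miracle trick", model))
print(classify("council approves budget", model))
```

With only four training headlines this is a toy, but it shows the basic shape of the problem: turn text into word counts, learn which words tend to go with which label, and score new headlines against what was learned.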

We are undergraduate students enthusiastically undertaking this important work, so we know that we probably won’t crack what some have called the holy grail of AI. However, we do believe that this project is worth our time, since we will undoubtedly learn a great deal in the process. This is the first in a series of blog posts that will track our progress. I will talk about what we learned, how we confronted difficult technical challenges, our highs, our lows, our successes, and our failures.

My selfish desire for this project is that we will not only develop a working product (starting with an MVP, and building up from that), but that we will also inspire other young engineers to tackle big problems in a similar fashion. Technology has always been a means to an end, not the end itself. Our goal should be to develop a smarter, healthier, kinder world — and technology is just one bridge that will get us there. This will be an interesting journey, and I look forward to seeing where it takes us.