D4D Project Highlight: Are You Fake News?

Astrid Willis Countee
Published in Data for Democracy
4 min read · May 17, 2018


#p-areyoufakenews is a Data for Democracy project that was started by @zach estela to help determine if a news article is fake, real, or some mix in between. Below, Zach outlines the history of this project, and where he hopes to take it in the future.

Are You Fake News?


by Zach Estela

Origins

Areyoufakenews.com started as my capstone project at Galvanize in October 2017, spurred by the rise of fake news designed to disrupt the democratic process and to exploit readers' emotions for ad revenue.

At its least harmful, fake news reinforces people’s biases; at its worst, it pushes online discourse to the extremes, promotes divisiveness, and strains personal relationships.

The extent to which engineered viral content had shifted the dialog shocked me. I started the project intending to use my data science skills to build a service that lets users leverage AI to contextualize the content in their news feeds. The service is not meant to be a sole source of truth; rather, it shifts the difficult task of parsing an overwhelming stream of new content to a beneficial AI.


The core of the project is a live web-scraping engine that, for each user request, runs hundreds of news articles through a neural network. The network itself is trained on over 100,000 examples of biased articles, with labels provided by academic and not-for-profit organizations. When a user enters a news website into the search box, hundreds of its articles are retrieved and analyzed for 17 types of bias. The predictions are then displayed visually to the user.
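The article does not say how per-article predictions are combined into the per-site view shown to users. The following is a minimal sketch assuming a simple average: each scraped article gets one probability per bias label, and the site profile is the mean of each label across articles. The label names, `analyze_site`, and `toy_classifier` are hypothetical, and only 4 of the 17 labels are shown.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical subset of bias labels; the article mentions 17 types but does not name them.
LABELS = ("clickbait", "conspiracy", "junk_science", "satire")

# A classifier maps article text to one probability per label (same order as LABELS).
Classifier = Callable[[str], Sequence[float]]


def analyze_site(articles: List[str], classify: Classifier) -> Dict[str, float]:
    """Score every scraped article, then average each label into a site-level profile."""
    if not articles:
        raise ValueError("no articles retrieved for this site")
    per_article = [classify(text) for text in articles]
    for scores in per_article:
        if len(scores) != len(LABELS):
            raise ValueError("classifier must return one score per label")
    # zip(*per_article) transposes article-major rows into one column per label.
    return {label: mean(column) for label, column in zip(LABELS, zip(*per_article))}


def toy_classifier(text: str) -> List[float]:
    """Stand-in for the trained network: flags 'clickbait' whenever the text contains '!'."""
    return [1.0 if "!" in text else 0.0, 0.0, 0.0, 0.0]


if __name__ == "__main__":
    scraped = ["You won't believe this!", "Council approves budget.", "Shocking!", "Rain expected."]
    print(analyze_site(scraped, toy_classifier))
```

Averaging is only one option; a median or a share-of-articles-above-threshold would be less sensitive to a handful of extreme articles.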

Since the original hacked-together service was built in three frenzied weeks in the fall of 2017, many changes have been made. Hosting live neural networks and web scraping in the cloud is a unique challenge, and the following areas have improved:

The Model


AI: The production model has evolved from a cosine-distance approach to a deep neural network, and finally to a convolutional neural network with custom word embeddings.
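To illustrate what "a convolutional neural network with custom word embeddings" computes, here is a minimal forward-pass sketch in plain Python. It is not the production model: the vocabulary, dimensions, and weights are hypothetical and random (untrained), and a real system would use a deep learning framework with learned parameters. It shows the usual text-CNN shape: look up an embedding for each word, slide filters over windows of consecutive words, apply ReLU, max-pool over the whole article, and emit one independent sigmoid probability per label so several biases can be flagged at once.

```python
import math
import random
from typing import List

# Toy sizes; the article does not give the production model's dimensions.
EMBED_DIM = 4
FILTER_WIDTH = 3  # each filter looks at 3 consecutive words
NUM_FILTERS = 2
NUM_LABELS = 3

_rng = random.Random(0)


def _random_matrix(rows: int, cols: int) -> List[List[float]]:
    return [[_rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical vocabulary; "custom" embeddings would be learned from the training corpus.
VOCAB = {"<pad>": 0, "<unk>": 1, "shocking": 2, "truth": 3, "they": 4, "hide": 5, "council": 6, "budget": 7}
EMBEDDINGS = _random_matrix(len(VOCAB), EMBED_DIM)
FILTERS = _random_matrix(NUM_FILTERS, FILTER_WIDTH * EMBED_DIM)  # flattened word windows
FILTER_BIAS = [0.0] * NUM_FILTERS
OUT_WEIGHTS = _random_matrix(NUM_LABELS, NUM_FILTERS)
OUT_BIAS = [0.0] * NUM_LABELS


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forward(tokens: List[str]) -> List[float]:
    """Embed tokens, convolve, max-pool over time, and return one probability per label."""
    ids = [VOCAB.get(tok.lower(), VOCAB["<unk>"]) for tok in tokens]
    ids += [VOCAB["<pad>"]] * max(0, FILTER_WIDTH - len(ids))  # pad inputs shorter than a window
    vectors = [EMBEDDINGS[i] for i in ids]

    pooled = []
    for weights, bias in zip(FILTERS, FILTER_BIAS):
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(vectors) - FILTER_WIDTH + 1):
            window = [x for vec in vectors[start:start + FILTER_WIDTH] for x in vec]
            activation = max(0.0, sum(w * x for w, x in zip(weights, window)) + bias)  # ReLU
            best = max(best, activation)
        pooled.append(best)

    # Independent sigmoids (not softmax): labels are not mutually exclusive.
    return [_sigmoid(sum(w * p for w, p in zip(row, pooled)) + b) for row, b in zip(OUT_WEIGHTS, OUT_BIAS)]


if __name__ == "__main__":
    print(forward("Shocking truth they hide".split()))
```

Max-over-time pooling is what lets articles of any length map to a fixed-size vector before the output layer.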

DevOps: The site is now more resilient to traffic spikes. Built around a ‘serverless’ microservices architecture and asynchronous web workers, it can accommodate the traffic that comes with increased publicity.
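Scraping hundreds of articles per request is mostly waiting on the network, which is why asynchronous workers help. A minimal sketch of that idea with Python's `asyncio`, assuming a cap on concurrent requests so a single news site is not flooded: `fetch_article` is a hypothetical stand-in that sleeps instead of making a real HTTP call, and the limit of 10 is an arbitrary choice. This does not model the project's serverless deployment, only the concurrency pattern inside one worker.

```python
import asyncio
from typing import Dict, List, Tuple

MAX_CONCURRENT_FETCHES = 10  # hypothetical per-request cap


async def fetch_article(url: str) -> str:
    """Hypothetical stand-in for a network fetch; a real worker would use an async HTTP client."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"<html>{url}</html>"


async def fetch_all(urls: List[str], limit: int = MAX_CONCURRENT_FETCHES) -> Dict[str, str]:
    """Fetch every URL concurrently while keeping at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded_fetch(url: str) -> Tuple[str, str]:
        async with semaphore:  # waits here if `limit` fetches are already running
            return url, await fetch_article(url)

    pairs = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return dict(pairs)


if __name__ == "__main__":
    article_urls = [f"https://example.com/article/{i}" for i in range(25)]
    pages = asyncio.run(fetch_all(article_urls))
    print(f"fetched {len(pages)} pages")
```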

Data persistence: Results from previous queries are now cached, which allows for a faster service with more complete information. The stored data provides an opportunity to develop more compelling data visualizations.
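The article does not describe how results are cached. One plausible shape, sketched here under stated assumptions, is a cache keyed by a normalized domain (so `https://www.Example.com/news` and `example.com` hit the same entry) with a time-to-live so stale analyses are eventually re-scraped. `ResultCache`, `analyze_with_cache`, and the one-hour TTL are hypothetical; in a serverless setup, where function instances do not share memory, the dictionary would typically be replaced by a shared store such as a database.

```python
import time
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urlparse


class ResultCache:
    """In-memory cache of site analyses with a time-to-live (hypothetical design)."""

    def __init__(self, ttl_seconds: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, dict]] = {}

    @staticmethod
    def key_for(site: str) -> str:
        """Normalize user input to a bare, lowercase domain without a leading 'www.'."""
        netloc = urlparse(site if "://" in site else "//" + site).netloc.lower()
        return netloc[4:] if netloc.startswith("www.") else netloc

    def get(self, site: str) -> Optional[dict]:
        key = self.key_for(site)
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at >= self.ttl_seconds:
            del self._store[key]  # stale: force a fresh scrape
            return None
        return result

    def put(self, site: str, result: dict) -> None:
        self._store[self.key_for(site)] = (self._clock(), result)


def analyze_with_cache(site: str, cache: ResultCache, analyze: Callable[[str], dict]) -> dict:
    """Return a cached analysis when fresh; otherwise run `analyze` and store the result."""
    cached = cache.get(site)
    if cached is not None:
        return cached
    result = analyze(site)
    cache.put(site, result)
    return result
```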

Bugs: Stability and latency have improved dramatically in the past few months, and many web-related bugs have been patched.

What’s Next?

The path ahead looks bright. In collaboration with volunteers at Data for Democracy, new features are being drafted and built.
Among the most transformative are:

  • A browser extension that provides users with a live dashboard of sites they visit for instant awareness of potential biases.
  • New labels are being added to the machine learning model. These meta-categories will group multiple existing labels under one umbrella to produce easily understandable metrics in the form of a “trustability index” and a “bias index.”
  • Support for analyzing public Facebook pages and YouTube videos is in early testing.
  • Interactive data visualizations where users may explore bias between sources, over time, and by geography.
  • A public API for not-for-profit developers to integrate into their websites or services.
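The planned meta-categories could work roughly as follows. This sketch assumes a hypothetical grouping of labels into two umbrellas (the article does not say which labels feed which index) and per-label probabilities between 0 and 1. The bias index takes the strongest signal among bias labels, and the trustability index is one minus the strongest signal among untrustworthiness labels.

```python
from typing import Dict, List

# Hypothetical label names and groupings; not the project's actual mapping.
META_CATEGORIES: Dict[str, List[str]] = {
    "bias": ["left_bias", "right_bias", "political"],
    "untrustworthy": ["conspiracy", "fake", "junk_science", "clickbait"],
}


def meta_indices(label_scores: Dict[str, float]) -> Dict[str, float]:
    """Collapse per-label probabilities (0..1) into a bias index and a trustability index."""
    needed = {label for labels in META_CATEGORIES.values() for label in labels}
    missing = needed - set(label_scores)
    if missing:
        raise KeyError(f"missing label scores: {sorted(missing)}")
    # max() keeps one strong warning sign from being averaged away by several weak ones.
    bias = max(label_scores[label] for label in META_CATEGORIES["bias"])
    untrust = max(label_scores[label] for label in META_CATEGORIES["untrustworthy"])
    return {"bias_index": bias, "trustability_index": 1.0 - untrust}


if __name__ == "__main__":
    scores = {"left_bias": 0.8, "right_bias": 0.1, "political": 0.5,
              "conspiracy": 0.2, "fake": 0.25, "junk_science": 0.0, "clickbait": 0.1}
    print(meta_indices(scores))
```

Using `max` is a deliberately conservative choice; a mean or weighted mean would give smoother scores at the cost of diluting a single strong red flag.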

We look forward to growing as a project and supporting the values that can contribute to a more democratic society: integrity and accountability in journalism, ethics in data science, and open source collaboration.

By providing free, publicly accessible protection against disinformation, we can fight the spread of fake news and empower social media users to share high-quality information.


If you are interested in this project or others like it, join us and get involved in the #p-areyoufakenews channel.
