A new machine learning app for reporting on hate in America

Published in

Google News Lab

3 min read · Aug 21, 2017

Hate crimes in America have historically been difficult to track since there is very little official data collected. What data does exist is incomplete and not very useful for reporters keen to learn more. This led ProPublica — with the support of the Google News Lab — to form Documenting Hate earlier this year, a collaborative reporting project that aims to create a national database for hate crimes by collecting and categorizing news stories related to hate crime attacks and abuses from across the country.

Now, with ProPublica, we are launching a new machine learning tool to help journalists covering hate news leverage this data in their reporting.

The Documenting Hate News Index (built by the Google News Lab, data visualization studio Pitch Interactive and ProPublica) takes a raw feed of Google News articles from the past six months and uses the Google Cloud Natural Language API to create a visual tool to help reporters find news happening across the country. It’s a constantly updating snapshot of data from this year, one that is valuable as a starting point for reporting on this area of news.

The Documenting Hate project launched in response to the lack of national data on hate crimes. While the FBI is required by law to collect data about hate crimes, the data is incomplete because local jurisdictions aren’t required to report incidents up to the federal government.

All of which underlines the value of the Documenting Hate Project, which is powered by a number of different news organisations and journalists who collect and verify reports of hate crimes and events. Documenting Hate is informed by both reports from members of the public and raw Google News data of stories from across the nation.

The new Index will help make this data easier to understand and visualize. It is one of the first visualisations to generate its content with machine learning: it uses the Google Cloud Natural Language API, which analyses text and extracts information about people, places, and events. In this case, it helps reporters by digging out locations, names and other useful data from the 3,000-plus news reports. The feed is updated every day, and goes back to February 2017.
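To make the entity-extraction step concrete, here is a minimal sketch in Python. It assumes the REST form of the Cloud Natural Language API's `documents:analyzeEntities` method. The helper names (`build_request`, `group_entities`) and the sample response are hypothetical illustrations, not the Index's actual code, and the sketch makes no network call.

```python
import json
from collections import defaultdict

# REST endpoint for entity analysis; calling it requires an API key or OAuth token.
ANALYZE_ENTITIES_URL = "https://language.googleapis.com/v1/documents:analyzeEntities"


def build_request(text):
    """Build the JSON body for an analyzeEntities call on plain text."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def group_entities(response, wanted=("PERSON", "LOCATION", "ORGANIZATION", "EVENT")):
    """Group entity names from a response by type, most salient first."""
    grouped = defaultdict(list)
    entities = sorted(
        response.get("entities", []),
        key=lambda e: e.get("salience", 0.0),
        reverse=True,
    )
    for entity in entities:
        kind = entity.get("type")
        # Skip unwanted types and avoid listing the same name twice.
        if kind in wanted and entity["name"] not in grouped[kind]:
            grouped[kind].append(entity["name"])
    return dict(grouped)


# Hypothetical response, hand-written in the shape the API returns.
sample_response = {
    "entities": [
        {"name": "Columbus", "type": "LOCATION", "salience": 0.42},
        {"name": "police", "type": "PERSON", "salience": 0.18},
        {"name": "Ohio", "type": "LOCATION", "salience": 0.11},
        {"name": "concert", "type": "EVENT", "salience": 0.09},
    ],
    "language": "en",
}

body = build_request("Police are investigating an incident at a concert in Columbus, Ohio.")
print(json.dumps(body, indent=2))
print(group_entities(sample_response))
```

Sorting by salience before grouping means the most prominent place or name in each story comes first, which is one plausible way to surface useful keywords for reporters.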

The feed is generated from news articles that cover events suggestive of hate crime, bias or abuse, such as anti-semitic graffiti or local court reports about incidents. We’re also monitoring the feed to ensure that errant stories don’t slip in, for example stories matched only because they include the word ‘hate’. (This hasn’t happened yet but we will continue to pay close attention.)

The Documenting Hate coalition of reporters has already covered a number of stories in this area, including an examination of white supremacy in Charlottesville, racist graffiti, aggression at a concert in Columbus, Ohio, and the disturbing rise of hate incidents in schools.

Users of the app can filter the reports by searching for a keyword in the search box or by clicking on algorithmically-generated keywords. They can also see reports by date by clicking ‘calendar’.
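The filtering described above could be sketched roughly as follows. This is an illustration only: the `Report` class, its field names and the `filter_reports` function are assumptions made for the example, the sample reports are invented placeholders, and none of it is taken from the app's real code.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Report:
    """Hypothetical record for one news report in the feed."""
    title: str
    published: date
    keywords: set = field(default_factory=set)


def filter_reports(reports, keyword=None, on_date=None):
    """Return reports matching an optional keyword and an optional date."""
    results = []
    for report in reports:
        if keyword is not None:
            needle = keyword.lower()
            in_title = needle in report.title.lower()
            in_keywords = needle in {k.lower() for k in report.keywords}
            if not (in_title or in_keywords):
                continue
        if on_date is not None and report.published != on_date:
            continue
        results.append(report)
    return results


# Invented placeholder reports, used only to exercise the filter.
reports = [
    Report("Graffiti found on school wall", date(2017, 8, 1), {"graffiti", "school"}),
    Report("Court hears assault case", date(2017, 8, 1), {"court"}),
    Report("Vandalism reported downtown", date(2017, 8, 2), {"Graffiti"}),
]

by_keyword = filter_reports(reports, keyword="graffiti")
by_date = filter_reports(reports, on_date=date(2017, 8, 1))
both = filter_reports(reports, keyword="graffiti", on_date=date(2017, 8, 1))
print(len(by_keyword), len(by_date), len(both))
```

Treating the search box and the generated keywords as the same case-insensitive match keeps the sketch short. A real interface might well handle the two differently.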

The Hate News Index is available now, and we will develop it further over the next few months as we see how journalists use it day to day to unearth these stories of hate and help build a national database for monitoring them.

The ProPublica-led coalition includes the Google News Lab, Univision News, the New York Times, WNYC, BuzzFeed News, First Draft, Meedan, New America Media, The Root, Latino USA, The Advocate, 100 Days in Appalachia and Ushahidi. The coalition is also working with civil-rights groups such as the Southern Poverty Law Center, and schools such as the University of Miami School of Communications.

As part of our mission to create new resources for the journalism community, we are also open-sourcing the data on our GitHub page — let us know what you do with it by emailing newslabtrends@google.com.

Written by Simon Rogers