From newspapers to NLP: recording fatality data

At CDS’ text-as-data seminar, Brendan O’Connor discusses how to use NLP to record civilian fatality and police brutality data from newspapers

Because approximately 30% of all homicides by law enforcement go unreported, we lack reliable data on how often civilians are killed and how often police use force. The alternative is to scour news reports: crowdsourced efforts like Fatal Encounters have manually read through two million articles, and the Bureau of Justice Statistics pays people to do the same.

But Brendan O’Connor from the University of Massachusetts Amherst may have found a way to make this process easier. Speaking at one of our text-as-data seminars last semester, O’Connor explained how he and his colleagues have trained computational models to extract fatality records from the news.

Their new approach applies NLP to social analysis through two tasks. The first, database population, has the model infer the names of people killed by police during a particular time frame. The second is updating an existing database with this new information.
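The update task can be pictured as a set comparison. This is a minimal sketch, not the researchers' code: it assumes each candidate name already carries a probability that the person was killed by police, and the function name `update_database`, the 0.5 threshold, and the example names are all hypothetical.

```python
from typing import Dict, Set


def update_database(existing: Set[str], scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Return names scored at or above `threshold` that the database does not yet contain.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    return {name for name, p in scores.items() if p >= threshold and name not in existing}


# Hypothetical data: one name already recorded, two new candidates
existing = {"Jane Doe"}
scores = {"Jane Doe": 0.9, "John Roe": 0.8, "Alex Poe": 0.2}
print(update_database(existing, scores))  # {'John Roe'}
```

Only names that clear the cutoff and are missing from the database come back, so a human reviewer would see new candidates rather than re-checking known cases.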

Every instance of a name in any news report is treated as a “mention.” For each mention, the goal is to estimate the probability that it describes that person being killed by police. The model makes a prediction for each mention, then aggregates these mention-level probabilities to estimate how likely it is that the person was killed by police. If that probability is high, the corresponding record is added to the database.
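One common way to combine mention-level probabilities into a single per-person score is a noisy-OR, which treats mentions as independent evidence: P(killed) = 1 − ∏ᵢ(1 − pᵢ), so the person counts as killed by police unless every mention is a false positive. The sketch below assumes noisy-OR aggregation; the article does not say which rule the researchers used, and the function names and example data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def noisy_or(probs: List[float]) -> float:
    """Probability that at least one mention is correct, assuming mentions are independent."""
    none_correct = 1.0
    for p in probs:
        none_correct *= 1.0 - p  # chance this mention is a false positive
    return 1.0 - none_correct


def aggregate(mentions: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Group mention-level probabilities by name and combine each group with noisy-OR."""
    by_name: Dict[str, List[float]] = defaultdict(list)
    for name, p in mentions:
        by_name[name].append(p)
    return {name: noisy_or(ps) for name, ps in by_name.items()}


# Hypothetical classifier outputs: (name, probability this mention describes a police killing)
mentions = [("John Roe", 0.5), ("John Roe", 0.5), ("Alex Poe", 0.1)]
print(aggregate(mentions))
```

For two mentions of a name scored 0.5 each, the aggregate is 1 − (0.5 × 0.5) = 0.75, higher than either mention alone. Under this rule, repeated coverage of the same person raises confidence, while a single weak mention stays weak.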

Using Google News, the researchers downloaded 1.1 million news reports published between September and December 2016 and ran a scraper that identifies names and keywords. Measured against Fatal Encounters, the group’s algorithm detected 258 of the 452 civilian fatalities caused by police that Fatal Encounters recorded.
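Treating Fatal Encounters as the reference set, those figures correspond to a recall of roughly 57%. A short calculation, assuming recall is simply detected cases divided by reference cases (the article does not report precision, so this says nothing about false positives):

```python
def recall(detected: int, reference_total: int) -> float:
    """Share of reference-database cases that the system also found."""
    return detected / reference_total


# Figures from the comparison against Fatal Encounters
r = recall(258, 452)
print(f"{r:.1%}")  # 57.1%
```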

Further work is needed before the algorithm matches the accuracy of manual techniques and can be put to use by practitioners. Ultimately, however, the model could become a powerful tool to support groups like Fatal Encounters and the Bureau of Justice Statistics.

by Nayla Al-Mamlouk

Originally published at on 20 June 2017.
