2016 C+J Symposium: Machine learning increases potential techniques for investigative reporting

By Chisom Oraedu

In July 2016, the Atlanta Journal-Constitution shined a spotlight on a nationwide scandal of doctors sexually abusing their patients.

The newspaper did this by using computer programs to “crawl” through the full data set of doctors sanctioned by medical boards in the past, then using “machine learning” to search legal documents for keywords that alluded to cases of sexual misconduct. Of the 100,000 sanctions analyzed, 6,000 were found to involve sexual abuse.
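The keyword search described above can be illustrated with a minimal sketch. This is a plain keyword filter, not the AJC's actual machine learning model, and the keyword list, document ids, and sample text are all hypothetical, invented for illustration only.

```python
import re

# Hypothetical keyword list, chosen for illustration only;
# not the terms the AJC actually used.
KEYWORDS = ("sexual misconduct", "sexual contact", "inappropriate contact")


def flag_documents(documents, keywords=KEYWORDS):
    """Return the ids of documents whose text contains any keyword.

    `documents` maps a document id to its full text. Matching is
    case-insensitive and treats each keyword as a literal phrase.
    """
    pattern = re.compile(
        "|".join(re.escape(keyword) for keyword in keywords),
        re.IGNORECASE,
    )
    return [doc_id for doc_id, text in documents.items() if pattern.search(text)]


# Made-up sample records standing in for medical board orders.
sample = {
    "order-001": "The board found the physician engaged in sexual misconduct with a patient.",
    "order-002": "The license was suspended for improper record keeping.",
}

flagged = flag_documents(sample)
print(flagged)
```

A filter like this only narrows the pile. In a real investigation, reporters would still need to read the flagged documents, and a trained classifier would likely catch cases that fixed phrases miss.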

This is computational journalism, a practice becoming more and more prevalent in today’s data-driven world. Stanford Professor James T. Hamilton defines computational journalism as “stories by, through, and about algorithms.”

More broadly, computational journalism considers the importance of data in aiding story discovery, presentation, and monetization for news sources, Hamilton said in an interview this week.

At the Stanford Computation+Journalism Symposium this Friday and Saturday, keynote speakers and scholars are sharing key insights about the increasing symbiosis between journalism and the computing and data sciences.

Sarah Cohen leads a data journalism team at the New York Times. Her keynote “The Newest Muckrakers: Investigative Reporting in the Age of Data Science” drew on her years of experience in long-term investigative reporting.

Jeff Hancock is a Stanford communication professor and the director of the Stanford Center for Computational Social Science. His talk, “Truth, Trustworthiness and Technology in Political Campaigns,” focused on whether and how one can tell if information is truthful or accurate, and how that can be analyzed.

On Saturday, the final keynote speaker, Tamara Munzner, focused on information visualization systems and the use of text as data. Munzner is a computer science professor at the University of British Columbia and holds a PhD from Stanford.

Other talks and paper sessions this weekend included “Stories By and About Algorithms,” “Documents, Data mining and Discovery,” and “Finding Story Ideas in Large Datasets.”

Computational journalism lowers story discovery and production costs, and makes news sources more efficient in their content creation, Hamilton has said.

Computational journalism can also be applied to a broad range of reporting techniques. Kristian Hammond is the CTO and cofounder of Narrative Science, a company that trains computers to write news stories by using algorithms to mine vast troves of data. This produces cheap, easy-to-digest accounts of events, sports games, and other key happenings in real time.

In a 2012 interview with Wired, Hammond suggested that in 15 years more than 90 percent of news would be written by computers.

In the same article, Wired reported that Hammond believes that “maybe at some point, humans and algorithms will collaborate, with each partner playing to its strength.”

The AJC investigation may be an example of just that.

“It would have taken at least several years if we had needed to read and manually process every document,” Shawn McIntosh, deputy managing editor of the AJC, said in an interview. “….We could not have completed the project without (machine learning).”

Chisom Oraedu is a student reporter at Stanford University.

.@Stanford University's Journalism Program and Stanford Computational Journalism Lab focuses on multimedia storytelling and data journalism.
