Tow Tea: Computational Journalism in Practice

Tow Center
Oct 20, 2015

We live in a world of computers: there is one in our bag and another in our pocket. Increasingly, we put a computer on four wheels and call it a car. We step into a computer-controlled box to ascend to the Nth floor of a building, or into an even larger computerized container to fly at 30,000 feet. In journalism, computers can sift through 20 million rows of data and return the single cell a reporter was looking for; and ever more often, computers can write thousands of reports, news articles, or wire stories by themselves.

What computers cannot do is decide what story to write — or decide what a story is. So are computers taking away the jobs of journalists? And what do aspiring journalists need to know about the computational side of journalism in order to successfully report with data or design automated news writing applications? And if a correction is needed, who is to blame?

These were the opportunities and dilemmas discussed at the October 15 Tow Tea, “Computational Journalism in Practice,” which invited Meredith Broussard (NYU, Tow Center Fellow), Tom Kent (Associated Press), and Olga Pierce (ProPublica, Columbia Journalism School) to talk to Susan McGregor (Columbia Journalism School, Tow Center Assistant Director) about their experience and work in a newsroom where journalistic roles, reporting skills, and ethical accountability are increasingly shared between journalists and their computers.

Broussard, who is a software developer by training and teaches journalism at NYU’s Arthur L. Carter Journalism Institute, said that unlike traditional software development, her approach to data journalism involves building single-purpose tools. In her latest project, which appeared in The Atlantic (“Poor Schools Can’t Win at Standardized Testing”), she built a custom program to inventory textbook availability in Philadelphia schools. This allowed her (or anyone else) to check the correlation between students’ performance on standardized tests and their schools’ inventory of the textbooks students needed to prepare.
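For readers curious what such a correlation check might look like in code, here is a minimal sketch of a Pearson correlation in Python. It is not Broussard's program: the function name, the field meanings, and every number below are invented placeholders for illustration, not data from Philadelphia schools.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations for each variable.
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (spread_x * spread_y)


# Hypothetical placeholder values, one entry per imaginary school:
# share of required textbooks on hand, and share of students passing a test.
textbook_coverage = [0.25, 0.50, 0.75, 1.00]
pass_rate = [0.30, 0.45, 0.50, 0.70]

if __name__ == "__main__":
    print(round(pearson(textbook_coverage, pass_rate), 3))
```

A coefficient near 1 or -1 signals a strong linear relationship and a value near 0 signals little or none. Either way it describes association only, which is where the reporting starts rather than ends.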

Broussard said data journalism in practice happens when one notices an anomaly in a pattern or an outlier in a trend, and that a data journalist distinguishes between what should be (i.e., what is inscribed in laws and policies) and what there is in reality, as represented by the ever-accumulating troves of public sector data.

“‘Hm, that’s weird!’ Every time you say that, it’s a story,” said Broussard.

Kent, who is the Standards Editor at the Associated Press (AP), helps oversee the development of computer-generated financial reports and sports stories, and helps evaluate the ethical questions that can arise from a glitch in the system or a misunderstood algorithm.

“When we started the concept of computers writing stories — that raised a lot of questions. What are the ethical issues around that?” said Kent.

A major part of using computers to write stories is to make sure their instructions for doing so are designed responsibly in the first place. At the AP, human journalists must first define phrases to work with pieces of data and rank them so that the algorithm can select the appropriate version. This helps the program choose when it’s appropriate to write “a spectacular victory,” rather than “the first victory of the season in which…”, in order to write a good-quality story.
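The ranked-phrase idea described above can be sketched in a few lines of Python. This is an illustration only, not the AP's system: the `Phrase` class, the 20-point threshold, the field names, and the team names are all assumptions made up for the example. Editors write the phrases, attach a condition and a rank to each, and the program picks the best-ranked phrase whose condition fits the data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Phrase:
    rank: int                        # lower number = preferred when it applies
    text: str                        # wording written in advance by an editor
    applies: Callable[[dict], bool]  # condition on the game data


def margin(game: dict) -> int:
    return game["winner_score"] - game["loser_score"]


# Hypothetical editorial rules; the threshold of 20 points is an assumption.
PHRASES = [
    Phrase(1, "a spectacular victory", lambda g: margin(g) >= 20),
    Phrase(2, "the first victory of the season",
           lambda g: g["season_wins_before"] == 0),
    Phrase(3, "a victory", lambda g: True),  # fallback that always applies
]


def choose_phrase(game: dict, phrases=PHRASES) -> str:
    """Return the best-ranked phrase whose condition matches the game."""
    candidates = [p for p in phrases if p.applies(game)]
    return min(candidates, key=lambda p: p.rank).text


def write_sentence(game: dict) -> str:
    return (f"{game['winner']} earned {choose_phrase(game)} over "
            f"{game['loser']}, {game['winner_score']}-{game['loser_score']}.")


if __name__ == "__main__":
    blowout = {"winner": "Lions", "loser": "Bears", "winner_score": 45,
               "loser_score": 10, "season_wins_before": 0}
    print(write_sentence(blowout))
```

The editorial judgment lives entirely in the phrase list and its ranks, which humans write and can audit; the program only applies those rules to each new row of data.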

Partly for this reason, introducing computer-written news at the AP does not mean layoffs. “We’re not going to reduce the number of journalists. We want to let them work on more valuable stuff: investigative journalism, writing features, which computers cannot do,” Kent said.

Computers also cannot be held responsible for any mistakes in the stories “they” write, Kent made clear.

“You must have a full disclosure of your robot-journalism. And whenever we make a mistake, AP as a whole is responsible,” said Kent.

In her presentation, Olga Pierce introduced the ProPublica “Surgeon Scorecard,” an interactive and visual database and query tool in which she and her team rated the work and efficiency of surgeons operating in various New York hospitals. The Scorecard is based on Medicare records from a handful of common surgeries, cross-referenced with readmission records and other factors.

The result is a complex piece of journalism, part story and part online application. In addition to careful programming, this effort required the input of medical experts, statisticians, lawyers, and journalists — a trend that Pierce sees as a necessary shift from the old concept of journalists relying solely on trusted sources, press releases, or individual research.

Even so, the Scorecard provoked a mixed response, from Harvard professors praising ProPublica’s work to prestigious medical practitioners critiquing its methodology.

So how should journalists who hope to do this kind of work themselves proceed?

“Learn a general-purpose programming language and have a working knowledge of statistics; this is the standard toolbox at this point,” said Pierce.

This was one of many points on which all the panelists agreed: aspiring journalists who want to work with data must learn to instruct computers effectively to do what these machines do best, which is to compute.

Tow Center

Center for Digital Journalism at Columbia Graduate School of Journalism