New tools power collaborative data journalism with a local emphasis

ProPublica, Reveal, and AP are building infrastructure to support collaborative data journalism with impact

Will Fischer
Oct 30

ProPublica built a proprietary tool to manage Electionland and Documenting Hate, two large national collaborative reporting efforts.

Both projects were behemoths. The sheer amount of data, coordination, and management they required was too much for the hacked-together webs of Google Sheets, Slack, and email that many collaborative projects across the U.S. depend on.

The technology worked well, and ProPublica realized it could benefit other newsrooms struggling to manage their own collaborative projects: the tool offered a single place to compile a massive database of tips and to add hundreds of users.

So Rachel Glickhouse, the newsroom’s first dedicated partner manager, led a team that set out to beef up the tool and make it open source so others could easily use it. ProPublica hired Brandon Roberts to build it, along with developer Ken Schwencke, who had worked on the previous databases.

Introducing ‘Collaborate’

Last month, ProPublica released Collaborate, a free, open-source tool meant to fill this gap for journalists, data editors, and collaboration managers. It’s based on the database tools ProPublica built for Documenting Hate and Electionland, packaged into a single resource that anyone can use for collaborative projects. The project, which included Collaborate, was funded this year by the Google News Initiative.

“There are some existing tools for crowdsourcing and shared spreadsheets, but when you’re working on something like Documenting Hate, you really need a tool that can handle a project in a more encompassing way,” Glickhouse said. “As a project manager, it’s much easier to keep track of everything.”

After uploading a spreadsheet to Collaborate, users can search, filter, and redact columns, making it easy to locate data while also protecting private information. They can also create tags, group data together, and assign data points to individuals or newsrooms, who can then add notes or keep a contact log.

On Documenting Hate, Glickhouse now works with more than 180 newsrooms, which have produced about 200 stories to date from some 6,000 tips. For the past three years, this type of technology has allowed Glickhouse to organize and manage the project through one main tool, and ProPublica hopes that making it available to other newsrooms will give them the infrastructure to work on their own shared datasets.

A screenshot of the Collaborate tool.

In addition, ProPublica published a Collaborative Data Journalism Guide to accompany the tool, giving insight into how it approaches collaborative data projects — everything from planning to assembling to executing — and why it decides to do them in the first place.

Data journalism is one of the most impactful forms of reporting, and it works even better with collaboration. Datasets are often large, and a single newsroom can’t always cover every story they could yield: most newsrooms have limited resources, or they find that some of the data isn’t directly relevant to their audience.

ProPublica believes in making data available to as many reporters as possible, multiplying the potential impact of a dataset by getting more journalism out of it. In the past, it has done this through a public News Apps page, as well as its Data Store, where journalists and researchers can browse extensive datasets covering a wide range of topics (some of this data is free, and some requires premium access).

According to Glickhouse, ProPublica’s engagement reporters use the database tool, often for projects involving the 20 local newsrooms in ProPublica’s Local Reporting Network.

For example, ProPublica recently ran a crowdsourcing project that aimed to collect tips from just one state but ended up receiving tips from more than 20 states. Rather than let those tips go to waste, ProPublica shared them with the relevant local reporters through Collaborate.

“There’s always going to be a huge percentage of the dataset that you’re not using in a story,” Glickhouse said. “We think that you should try to make the most of those big datasets so that other reporters can take advantage of it.”

Some of the most impactful collaborative data projects have been carried out with the same goal in mind. The International Consortium of Investigative Journalists (ICIJ) facilitated the Panama Papers and Paradise Papers investigations by distributing huge amounts of data — in the form of leaked documents — to hundreds of journalists around the world.

The Panama Papers leak contained more than 11.5 million files totaling more than 2.6 terabytes, and the ICIJ kept processing the data throughout the project. When the investigation was published in April 2016, almost 400 reporters were involved; they would go on to write more than 4,700 articles.

Adding in analysis

Collaborate, for all its capabilities, is not a tool for data analysis. Analysis requires another level of complexity, since data journalists often work in different programs with differing project structures.

The Associated Press also recently released a free, open-source tool that aims to solve this problem. AP Datakit, which AP’s data journalism team has been using internally, can standardize data projects across programming languages and give collaborators a common structure to work from.

According to Troy Thibodeaux, editor of AP’s data journalism team, making AP Datakit available will help other newsrooms with data analysis. It lets data journalists work from a shared dataset in different programs while handling many of the painstaking technical tasks that analysis requires.

Before publication, the AP often releases datasets to its 300 partners through its Data World service, so those partners can localize the data and produce their own stories.

“With many of these projects, we were going to do a big national data story,” Thibodeaux said. “But because we can make it collaborative and share the data, now our work is multiplied. We get every ounce of value out of the data.”

As a collaborative network in its own right, AP has used its statehouse reporters to gather data across the country: documenting sexual harassment policies by state and, more recently, polling members of Congress on where they stand on impeachment. But instead of just producing datasets and distributing them to a network, Thibodeaux wants to collaborate more creatively.

That includes some ambitious data projects, like trying to accurately calculate the total number of deaths caused by Hurricane Maria. Puerto Rico’s Center for Investigative Journalism teamed up with Quartz to produce data visualizations for the project, and AP came in to help prepare and analyze the data.

Reveal from the Center for Investigative Reporting has also approached AP for help with data analysis for its investigations into racially discriminatory redlining practices.

“Now data journalism is just journalism,” Thibodeaux said. “More reporters are coming to us and wanting to develop these skills and collaborate with us. Often the best source for a story isn’t a human being, it’s a spreadsheet.”

Data as a nexus for collaboration

When AP tried to collect data on child migrant shelters in the U.S., it couldn’t find any federal dataset. So, in a joint effort with the Texas Tribune and Reveal, the organizations collected local data to construct a larger picture, leaning on tips from people who located shelters in different areas nationwide. (ProPublica did a similar project with the Tribune.)

The approach resembles the crowdsourcing model ProPublica uses to build datasets from scratch and then distribute the relevant information back out to local journalists who can use it.

Reveal, too, has built its own Reveal Reporting Networks around the same concept, with an emphasis on local reporting. The first network started when Reveal launched an investigation into work-based rehab programs and kept receiving tips from communities around the country. The model applied to every investigation Reveal did; it just needed an infrastructure to share the data more broadly.

Byard Duncan, an engagement reporter at Reveal, now leads the reporting networks, which have grown to about 850 journalists in just over a year. Most of these journalists are focused locally, and they can sign up for any of the four networks. (Duncan says Reveal hopes to launch two more networks in the coming months.)

The type of data varies by network. The work-based rehab data lives in a private spreadsheet containing information about each tip, which Duncan maintains and will share with journalists interested in pursuing it. Other networks are more public, like “To Protect and Slur,” which documents instances of police officers posting in extremist groups on Facebook.

“We’re basically trying to give this stuff away,” Duncan said. “We want to put local journalists on the five-yard line for an impactful story in their community.”

Reveal, ProPublica, and AP have all built real infrastructure to support a people-powered model for collaborative data journalism, one that ends up trickling back down to local newsrooms.

This has required new tools, new responsibilities, and a new understanding of what journalism can be. It has also happened where two growing beliefs converge: that newsrooms should build robust collaboration networks, and that they should invest in data-driven reporting. It might just be a roadmap to a more sustainable future for quality journalism.

“This growth in data journalism is happening at the same time that newsrooms are strapped for resources and really need to collaborate,” Thibodeaux said. “When you put those two forces together, data as a nexus for collaboration is incredibly powerful.”

Will Fischer is a journalist covering the intersection of technology and media. He’s worked for Business Insider and New York magazine, and conducted local news research for City Bureau. Follow Will on Twitter @willfisch15 or email him at willfisch15@gmail.com.

Want to learn more about collaborative journalism? You can subscribe to our collaborative journalism newsletter for more updates and information. And of course, we invite you to visit collaborativejournalism.org to learn more about the topic of collaborative journalism, including our growing database of collaborative journalism projects, which is currently being updated.

About the Center for Cooperative Media: The Center is a grant-funded program of the School of Communication and Media at Montclair State University. Its mission is to grow and strengthen local journalism, and in doing so serve New Jersey residents. The Center is supported with funding from Montclair State University, John S. and James L. Knight Foundation, the Geraldine R. Dodge Foundation, Democracy Fund, the New Jersey Local News Lab (a partnership of the Geraldine R. Dodge Foundation, Democracy Fund, and Community Foundation of New Jersey), and the Abrams Foundation. For more information, visit CenterforCooperativeMedia.org.

Center for Cooperative Media

An initiative of the School of Communication at Montclair State University
