Data Journalism and the Value of Making Source Data Public
By Sérgio Spagnuolo, from Volt Data Lab
As Volt Data Lab advances its business plan towards a more sustainable path (publishing graphics each day, organizing training courses and consulting for NGOs and newsrooms), the value of adding sound, credible references to news reports and graphics is becoming ever clearer.
For instance: together with our partners at the fact-checking site Aos Fatos, we have been publishing a series of stories about Brazil’s political turmoil — practically all of them based on data.
We were the first outlet to calculate the size of the debt that President Dilma Rousseff’s government owed to state-owned banks (the main point in the impeachment process against her), using the case of Caixa Econômica Federal. Several renowned journalists quoted our report, and other outlets chased the same story afterwards.
The Aos Fatos website also got a lot of traction after we reported how many Brazilian congressmen facing corruption and mismanagement lawsuits voted for and against Ms. Rousseff’s impeachment in a preliminary congressional hearing.
Our readers seemed to appreciate how open we were with our data, which prompted competitors to do the same.
This, for us, means a lot.
Data-driven journalism is growing in Brazil, trying to catch up with what major newsrooms abroad have been doing. Scoops are coming from data-first investigations, great graphics and maps are in high demand from publications, analysis is being written and a major journalism award was even granted to a fully data-driven report.
But one crucial difference sets traditional newsrooms apart from independent projects when it comes to data journalism: referencing a story.
Independent ventures like Volt, InfoAmazônia and Fiquemsabendo, for example, rely heavily on making data and methodology as transparent as possible in their reporting. There is clear value in that, for several reasons:
- It gives the publication credibility and transparency, as people and organizations can cross-check what you do, corroborating it and correcting any mistakes.
- Most of the time, the journalist is not the one who generated the data, only the one compiling and analyzing it, so it is not proprietary data.
- It makes it easier for other people to use that data set for their own purposes, enhancing collaboration and knowledge sharing.
In the U.S., this practice is much more widely adopted, almost a rule. Sites like FiveThirtyEight and ProPublica, for instance, have whole repositories of referenced data on GitHub.
While bigger newsrooms in Brazil have made some progress (see what Estadao Dados sometimes does), it is still rare to see any documentation in their stories, or even links to original sources of information.
Volt is proud to be making public all the data and tables used in our reports and projects. That pride is reflected in the value our partners and clients get from us.
We recently produced four stories about the municipal budget in Sao Paulo for the data transparency project Gastos Abertos, all fully documented, including the tools used and not only where, but also how and when, the data was obtained.
We are also constantly in talks with potential clients and partners who are especially interested in our capacity to come up with good stories while maintaining a high level of transparency.
Being open about our reporting, providing adequate sourcing and credit, and making it easier for people to understand our processes is not only good, quality journalism; it is also good business practice.