How open is open data journalism?

The recent Data Journalism Awards show that journalism hasn’t quite got to grips with the full meaning of open data.

Simon Rogers published a post last week that asked “What does data journalism look like in 2016?”. For Rogers, the winners of the Data Journalism Awards “give us a great sense of where the industry is right now.”

He’s right: the range and depth of the use of data are reassuring, and the points Simon raises are well made and offer much food for thought.

But I did find myself getting snagged on one of his points: “Open data is still vital.”

The awards had a specific category for open data:

Open data award. Using freedom of information and/or other levers to make crucial databases open and accessible for re-use and for creating data-based stories.

The language used here sits comfortably next to generally accepted definitions of open data. Here, for example, is the definition of open data from http://opendefinition.org/:

“Open data and content can be freely used, modified, and shared by anyone for any purpose.”

The Open Data Handbook definition is helpful in highlighting the sharing element:

Open data is data that can be freely used, re-used and redistributed by anyone — subject only, at most, to the requirement to attribute and sharealike.

The winner of the open data category, LA NACION DATA — OPEN DATA Journalism for change, is, as Rogers notes in his post:

“a model of open data journalism and this year won the prize for its approach to opening up public datasets in a country with no FOI laws and a long history of limiting media access to government information.”

It does everything required of it by both the definition and the category description. A well-deserved win.

Rogers also cites Excesses Unpunished, by Convoca in Peru, which “opened up public data to help its users understand the country’s mining industry better.” The project is a media-rich and superbly executed investigation and presentation; it pulls together multiple data sources and offers a deeply informative view that makes the issue and the information accessible. That’s different from open. And there is the snag.

By the definition of open data (and the category criteria), the Convoca report didn’t fully open up public data. Where is the data that would let me check the work or make my own stories? The data they have created isn’t open and accessible for re-use.


If you look at the other entries on the shortlist in the category, it’s a similar story.

THE EXPRESS TRIBUNE (Pakistan): a nice piece of data-driven investigation into the health issues caused by urban pollution that builds on existing research with solid reporting. Sadly, the study conducted by Khyber Teaching Hospital and Peshawar Traffic Police isn’t linked. Neither is the Nature report. VERDICT: CLOSED DATA

Trinity Mirror (UK): a great piece of local journalism with a nice level of interaction. But the data comes from a commercial supplier, and access to the original data is paid for. VERDICT: CLOSED DATA

Modern Investor magazine (UK): a deep and focussed investigation into local government pension schemes that, for a small team, packs a punch. The investigation, done in part with data derived from hundreds of FOI requests, has created a “unique database”…that isn’t open. VERDICT: CLOSED DATA

Le Monde (France): a great piece of work, in particular their partnership with journalism students, but where is the data? VERDICT: CLOSED DATA

It’s not all bad news, though. The IndiaSpend (India) project is a great piece of sensor-driven data journalism. I love it. But where is the data that drives the map? The umbrella IndiaSpend project does have a “data room”, which shows a plan to make the data open. VERDICT: OPEN (SUSPENDED)

For me, the only other shortlisted project besides La Nacion that makes the grade in terms of openness is MWAZNA (Egypt). Their attempt to “explain and visualize government budget for everyone” is admirable and works well. Best of all, the data is available to download with a clear licence and in an open format. VERDICT: OPEN

MWAZNA’s Budget ins and outs interactive links to data that is clearly open. Exemplary stuff.

Only two of the projects on this list (three if we accept the direction of travel IndiaSpend is taking) actually make their data open. Remember, this is the shortlist, not all the entries, so these are the projects the judges deemed to be open data.

So what’s the problem?

It’s fair to argue that resources and technology are an issue when it comes to making data open; they are. But MWAZNA, which entered the small newsroom category, opened its data, while Le Monde, clearly not short of resources by comparison, did not. So you can’t say it’s about size.

Privacy and data protection are also legitimate concerns I’ve heard voiced around opening up newsroom data, especially in a world where protecting sources and responsible use of data are often linked. This is a fair concern as far as it goes, but as open data advocates are fond of telling government and other bodies, opening up data doesn’t have to mean opening up all your data. If a dataset is driving a visualization, then that dataset shouldn’t have data protection or privacy issues associated with it.

What is open data journalism?

I think the real problem is the use of the word open. As I have noted elsewhere, open is really about where you put the pipe.

  • open | data journalism: data journalism done in an open way.
  • open data | journalism: journalism done with open data.

Either way, the shortlist reflects, at best, a patchy approach to both views.

Journalists all too commonly confuse using FOI to get data with open data. Using FOI is not open data. It’s using a mechanism of open government to get data. Yes, the data you get may well be delivered in an open way; it may even be open data. But using FOI to “open up data” to do journalism and then not sharing the data you use is neither open data nor open journalism.

Open data journalism should be using open data, FOI requests or any other sources to collect data to tell a story, and then sharing THAT data with your audience.

Does it matter?

Just to be very clear here: I’m not saying that any of the work here is bad journalism. So perhaps I’m being dogmatic, or even a little pedantic, about the use of the term open data. When there is clearly such good journalism going on, shouldn’t we just get on with it? Well, maybe.

But if the practice of data journalism is to deliver on transparency and openness, then openness needs to be part of its own process. Its data needs to be open and, especially when it judges itself, it needs to respect the full extent of what that means rather than adopting the phrase so uncritically.

I think if journalism really started to embrace the broader meaning of open data, it would be better off for it.