Is Data Journalism any more open?
What does the shortlist for the open data category of the Data Journalism Awards tell us about data journalism’s approach to open data?
Last year I wrote about how the 2016 Data Journalism Awards illustrated that journalism hadn’t quite got to grips with the full meaning of open data. So I thought I’d take a look at this year’s crop and see if things had improved.
This is last year’s definition for the open data category:
Open data award: Using freedom of information and/or other levers to make crucial databases open and accessible for re-use and for creating data-based stories.
This year’s was the same, save for an addition at the end (my emphasis):
Open data award: Using freedom of information and/or other levers to make crucial datasets open and accessible for re-use and for creating data-driven journalism projects and stories. Publishing the data behind your project is a plus.
A plus! The Open Data Handbook definition would suggest it’s a bit more than a plus…
Open data is data that can be freely used, re-used and redistributed by anyone — subject only, at most, to the requirement to attribute and sharealike
…if you want people to re-use and redistribute it, then they need the data.
Let’s take a look at this year’s shortlisted entries and see how they measure up against the open data definition.
So, in the order they appear on the shortlist…
Analyzing 8 million data from public speed limit detectors radars, El Confidencial, Spain
This project made use of Spain’s (relatively) new FOI laws to create “an unique PostgreSQL database” of traffic sanctions for exceeding the speed limit. There was a lot of work behind the scenes to analyse the results, and a range of fascinating stories came off the back of it. It’s a great way to kick the tyres of the legislation, and they’ve made good use of it.
Most of the reporting takes the same form: the story is broken down into sections, each accompanied by a chart. The charts are a mix of images and interactives. The interactive charts are delivered using a number of platforms, including Quartz’s Atlas tool, but the majority use Datawrapper. That means the data behind each chart is usually available for download. Most of the heavy lifting that lets users search for their area is done with Tableau Public, which means that data is also available for download. The interactive maps, made on Carto, are less open, as there is no way to get at the data behind them.
Verdict: Open(ish). This makes good use of open government legislation to create the data, but is that really open data? The data in the stories is there for people to download, but only the portions behind the visualisations. That’s not the whole dataset. There also isn’t any indication of what you can do with the data. Is it free for you to use?
Database of Assets of Serbian Politicians, Crime and Corruption Reporting Network — KRIK, Serbia (this site won the award)
For their entry, independent investigative journalism site KRIK created “the most comprehensive online database of assets of Serbian politicians, which currently consists of property cards of all ministers of Serbian government and all Serbian presidential candidates running in 2017 Elections.” Judging by the submission, it’s a substantial and impressive bit of work, pulling in sources as diverse as Lexis and the Facebook Graph. They even brought in a certified real estate agency “which calculated the market values of every flat, house or piece of land owned by these politicians”. Amazing stuff, done in a difficult environment for journalism.
Verdict: Closed. This is a phenomenal act of data journalism and would, in my view, have been a deserving winner in any of the categories. But the data, whilst searchable, accessible and certainly available, isn’t open in the strict sense.
Using information access legislation and good old journalistic legwork, the Oxpeckers Centre for Investigative Environmental Journalism pulled together a dataset of mine closure information that revealed the impact of a chaotic mining sector in South Africa. The data highlighted the number of derelict mines that hadn’t been officially closed and were now being mined illegally and dangerously. There’s a nice multimedia presentation of the story, and the data is presented as an embedded Excel spreadsheet.
The project has been developed and supported by a number of organisations, including Code for Africa. It’s no surprise, then, that the code behind parts of the project is available via GitHub. The data itself is also available through the OpenAfrica data portal, where the licence for reuse is clear.
Verdict: Open. The use of GitHub and the OpenAfrica data portal adds to the availability of the data, which is also clearly accessible in the piece itself.
Pajhwok Afghan News, Afghanistan
Independent news agency Pajhwok Afghan News have created a data journalism ‘sub-site’ that aims to “use data to measure the causes, impact and solutions driving news on elections, security, health, reconstruction, economic development, social issues and government in Afghanistan.”
The site itself offers a range of stories and a mix of tools. Infogr.am plays a big part in the example offered in the submission, but other stories make use of Carto and Tableau Public. The story “Afghan women have more say in money that they earned themselves than property in marriage” makes heavy use of Tableau, which means the data is easy to download, including the maps. That’s handy, as the report the piece is based on (which is linked) is only available as a PDF.
Verdict: Open(ish). The use of Infogr.am as the main driver for visualisation does limit the availability of the data, but the use of Tableau and Carto raises the bar a little.
ProPublica Data Store, ProPublica, United States
The not-for-profit investigative journalism giant ProPublica have submitted a whole site: a portal for the data behind the stories they create. Interestingly, ProPublica also see this project as a “potential way to defray the costs of our data work by serving a market for commercial licenses.” That means that, as a journalist, you could pay $200 or more to access some of the data.
Verdict: Open. Purists might argue that the paywall isn’t open, and ideally it would be nice to see more of the data freely available, with the services and analysis sold on top, rather than whole datasets being tied up. That said, it’s not as if ProPublica aren’t doing good work with the money.
Researchers bet on mass medication to wipe out malaria in L Victoria Region, Nation Media Group, Kenya
This piece, published by the Business Daily, looks at a plan to eradicate malaria in the Lake Victoria region. It draws on data from the 2015 Kenya Malaria Indicator Survey, amongst other sources, to assess the likely impact of the eradication effort.
Verdict: Closed. The work done to get the data out of the reports (lots of PDFs) and visualise it is great, and it’s a massively important topic. But the data isn’t really available beyond the visualisations.
As last year, it’s a patchy affair when it comes to surfacing data. Only two of the entries make their data open in a way that sits comfortably within the definition of open data. For the majority, the focus is on using open government mechanisms to generate data, and that’s not open data.
As I noted last year, what open data journalism should be really comes down to where you put the pipe:
- open | data journalism: data journalism done in an open way.
- open data | journalism: journalism done with open data.
By either definition, this year’s crop is more representative of open data use, but it still falls short of the ‘open’ ethos that sits at the heart of open data.
Does it matter?
I asked the same question last year: in the end, does the fact that the data isn’t available make the journalism bad? Of course not. The winner, KRIK, produced an outstanding piece of journalism, and there’s loads to learn from the process and thinking behind all the projects. But I do think the quality of the journalism could be reinforced by making the data available. After all, isn’t that the modern reading of data journalism? Doesn’t making our working out and raw data more visible build trust as well as meaning?
Perhaps ironically, ProPublica highlight the problem in the submission for their Data Store project:
“Across the industry, the data we create as an input into our journalism has always been of great value, but after publication it typically remained locked up on the hard drives of our data journalists — of no use either to other journalists, or to anybody else who might find value in it.”
Publishing the data behind your project is what makes it open.
If you think I’m being picky, I’d point out that I’m not picking these at random. This is the shortlist for the open data category. These are the entries that the judges (and the applicants) consider representative of open data. I think they could go further.
As I’ve noted before, if the practice of data journalism is to deliver on transparency and openness, then it needs to be part of that process. It needs to be open too. For my part, I’d like to see “Publishing the data behind your project is a plus” changed to an essential criterion for next year.