Avoiding the open data pitfalls

Craig Mills
Digital products for non-profits
Dec 22, 2015

The new IUCN Red List was released a couple of days ago, full of good news (Iberian Lynx doing a bit better) and bad (Lions are in trouble and many species are disappearing). With this in mind, I thought I’d create a visualisation using their data.

I’d read an article about human deaths by animals and thought I’d try to see if their distribution is linked with the location of Red Listed animals. Sounds light-hearted, right? I could see an interesting story around the idea that a snake species was in decline in the same areas where it was killing humans. Or maybe there was an odd pattern where people were being killed where the species didn’t naturally occur. Perhaps it was the pets killing their owners. I was going to use some data to try to find out. If a story did emerge, it would offer a lighter way into understanding the importance of the Red List, driving a new type of user to the issue and increasing the impact of the work. Maybe some of those users would dig deeper, learn more, become engaged.

The problem is I couldn’t access the data without jumping through hoops or breaching their terms. So this post is now going to be about that. Not a bashing of IUCN — they do great work — but an alternative to their current model of downloading the data. This is not just about the Red List website; this could equally apply to many NGO data initiatives out there.

So let’s start with a commonly held assumption behind making data available for download on a site like iucnredlist.org. The more people that download and use the data, the more impact there is. Right? I think so.

Logic then suggests that anything done to make it harder to download those data will reduce the impact.

I’m going to try and look at the common reasons for making it harder, and see if we can find alternatives.

1. Agreeing to terms and conditions

On data portals we have worked on previously, adding a terms and conditions pop-up before a file downloads reduced impact by roughly 20%. So why are the T&Cs there? Avoiding litigation, applying a non-commercial restriction, retaining ownership and IP, all sorts of reasons.

A solution? Be brave, silence the lawyers, add a licence file to the download if you need to, put a link to the terms of use next to the file download, but don’t scare away those 20% of users for something with a minuscule risk of happening. Those 20% might go on to do work to protect endangered species. That’s surely worth the risk?

2. Registering your details before you can download

When I tried to download the Red List data, I couldn’t unless I provided all my details in a form, including my phone number! I’ve seen from other data sites I have been involved with that the number of downloads goes down, by as much as 60%, when you ask people to register. That’s over half your users gone!

Often the reason cited for this is to better understand users, to show impact to donors (“Hey 300,000 scientists have downloaded the data for research…”). The other reason is to track commercial use.

Both are reasonable. If it is the former, there is a better way.

Ask for the details later. Create a questionnaire, put it on the website and ask people in detail how they use the data once they’ve actually got to the data and played around with it. Ask them if you can follow up with a phone interview to get even more insights into how the data are used. Real user quotes are much better for an annual report. This way, you still get the information you need without reducing your impact. And let’s be honest, who looks at all those registration details or does anything valuable with the information? I would bet the number is very low.

Now on to the thorny issue of commercial licensing…

3. Restricting users for commercial purposes

The argument goes something like this…

It costs a lot of money to run something like the Red List; many millions per year. Companies using the Red List data will profit by many more millions when they use the data. Therefore they should pay for it.

It’s reasonable logic. But I would come back to these two points:

“The more people that download and use the data, the more impact there is”

and

“anything done to make it harder to download those data will reduce the impact”

I think making it hard for the private sector to use these global conservation datasets, regardless of the reason, is daft. From a visualisation studio like Vizzuality, who can help promote the cause to those multinationals that are drastically changing large swaths of land and sea, the benefits of having free access to the data are massive.

And this should be the starting point. First, make it free for anyone to access: carve this principle in stone. Present the idea as a public good to all humankind and fundraise around that. It’s compelling and would make a donor’s heart swell, whether they’re governments, philanthropists, the GEF, large corporations, foundations, Kickstarter, whoever it may be. This new rhetoric would change the shape of the offer and open up new approaches to fundraising. It will take a leap of faith to be bold and put the impact ahead of a funding stream. But who said we can save species by being timid?

[originally posted here: http://blog.vizzuality.com/post/122340435976/opening-data-to-increase-impact ]

photo: https://flic.kr/p/krajTL
