Open data and APIs — fuelling innovation

I spoke at the APIDays 2016 conference in Auckland recently. Here’s what I said.

Today, I’ll be taking you through some of my thoughts on open data and open APIs. I’ll give you a use case of how they fuel awesomeness by telling you a bit about a big open data hackathon called GovHack, and what I’ve observed and heard.

And I’ll be talking about why I believe we need more open data, especially in the form of open APIs, and how it’s good for all of us (including business!).

Open say what now?

Likely because of my background in both science and language, I believe that strong, shared definitions of any term are amazingly useful when talking about it.

So first up, it’s probably useful to make sure we’re all on the same page with what I mean when I talk about open data, and open APIs.

(Innovation’s a fluffy term, but we’ll get to that bit later.)

For the purposes of this talk, we can define open data as any data that’s publicly available on an ongoing basis.

“Publicly-available” doesn’t mean for a short period of time only. It doesn’t mean for a select group of people. Or with barriers to entry like being charged to access it.

It means, quite literally, publicly available. To anyone, anywhere, anytime.

Of course, this definition also includes shoddily-made PDFs of council minutes, for example, so the data may be publicly available, but that doesn’t mean it’s hugely accessible, or even terribly useful for many use-cases (not without an amount of resource which acts in itself as a barrier to entry).

And it’s not necessarily live data. To put it another way — it’s static, and while it’s still useful to look for past trends and patterns, it’s not useful for any real-time application. It also, very easily, becomes out of date. And one doesn’t necessarily know when that happens.

One way to counter this, of course, is to make it machine readable. This can be done by the organisation that’s releasing the data, or by members of the public or other organisations. To use our earlier example of council minutes, someone might take those shoddy (and often rather buried) PDFs and turn the attendance records trapped in them into a spreadsheet, for easier analysis.
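As a rough illustration of that council-minutes example, here is a minimal Python sketch. It assumes the text has already been pulled out of the PDF, and that it follows a tidy "Meeting / Present / Apologies" layout. That layout, the names in it, and the helper functions `attendance_rows` and `to_csv` are all made up for illustration; real minutes are rarely this neat.

```python
import csv
import io

# Hypothetical text, as it might look after extraction from a minutes PDF.
MINUTES_TEXT = """\
Meeting: 2016-08-12
Present: Cr A. Smith, Cr B. Jones
Apologies: Cr C. Brown
Meeting: 2016-09-09
Present: Cr A. Smith, Cr C. Brown
Apologies: Cr B. Jones
"""


def attendance_rows(text):
    """Yield (meeting_date, councillor, status) tuples from the assumed layout."""
    date = None
    for line in text.splitlines():
        label, _, value = line.partition(":")
        value = value.strip()
        if label == "Meeting":
            date = value
        elif label in ("Present", "Apologies"):
            status = "present" if label == "Present" else "apology"
            for name in value.split(","):
                if name.strip():
                    yield (date, name.strip(), status)


def to_csv(rows):
    """Write the rows as CSV text that any spreadsheet tool can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["meeting_date", "councillor", "status"])
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(attendance_rows(MINUTES_TEXT))
print(csv_text)
```

One row per person per meeting keeps the result easy to filter, count and chart, which is the whole point of getting it out of the PDF.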

An even better thing to do would be to also release the data in the form of an API. An open API — that is, one that’s publicly available. With a bit of luck, there’d also be some accompanying documentation about how to use the API, making it nicely accessible, too.

This allows people to analyse and pipe the information into just about any form they choose, from maps to apps. Not only is the data the most up-to-date available, but, as we know, it can be used to make realtime, living resources.
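To make "from maps to apps" a little more concrete, here is a small Python sketch that reshapes an API response into GeoJSON, a format most mapping tools can read. The response body, its field names (`results`, `name`, `lat`, `lon`) and the `to_geojson` helper are assumptions for illustration, not any particular agency's API. A real app would fetch the JSON over HTTP rather than use a hard-coded sample.

```python
import json

# Hypothetical response body from an open API listing public facilities.
SAMPLE_RESPONSE = """
{"results": [
  {"name": "Central Library", "lat": -41.2889, "lon": 174.7772},
  {"name": "Community Hall", "lat": -36.8485, "lon": 174.7633},
  {"name": "No location yet", "lat": null, "lon": null}
]}
"""


def to_geojson(payload):
    """Convert the assumed payload into a GeoJSON FeatureCollection."""
    features = []
    for record in payload["results"]:
        # Skip records that cannot be placed on a map.
        if record["lat"] is None or record["lon"] is None:
            continue
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered longitude first, then latitude.
            "geometry": {
                "type": "Point",
                "coordinates": [record["lon"], record["lat"]],
            },
            "properties": {"name": record["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


geojson = to_geojson(json.loads(SAMPLE_RESPONSE))
print(json.dumps(geojson, indent=2))
```

Because the data arrives already structured, the "analysis" step is a dozen lines of reshaping rather than an afternoon of copying values out of a document.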

And we all know that devs far prefer an API to being handed an Excel spreadsheet (which may or may not have become corrupted along the line, too).

APIs in the wild: GovHack

We’ve all seen this preference in numerous situations. One I can particularly speak to is GovHack.

GovHack is the southern hemisphere’s largest open data / open government hackathon.

It started as a small civic hacking slash protest project in Australia in 2009. People there were…irritated…that some of them were being threatened with legal action, BY government departments, for using public government data.

It’s grown somewhat since then; government departments are intimately involved with it and, in some cases, actively on board!

Last year, GovHack spread to NZ as well. And every year it gets bigger, both here and in Aussie.

GovHack Wellington 2016 (and national winner) team Two Cats and Some Mice, of the project Kiwibubble. Credit: Mike Riversdale

This year, GovHack happened in 9 locations in NZ, and 22 in Australia.

In total, there were over 2,500 participants. Of the 429 projects that competed for the raft of local, national and international prizes, 65 were in NZ.

There’s much to be proud of there, of course. People tackled a very wide range of subjects, from lost dogs to climate change to fracturing societies.

And they did so with a tonne of data. 1,582 datasets, in fact. And not all government, either.

Of these, a fair number were APIs (I ran out of time to manually check and count them all — my apologies).

What I can absolutely tell you, anecdotally (haha, yes, I know anecdotes aren’t data), is that those datasets that came in the form of APIs were particularly popular.

And often more trusted. There’s an idea that data served through an API is more…trustworthy, in some respects. That it’s more likely to stay up, at least for a while.

Of course, this brings me to the idea of the social contract behind APIs. I know there’s a lot of conversation around this in API circles, so I won’t bang on about it, but…

If one wants people to use one’s APIs, then one shouldn’t just take them down. Especially not without consulting with and warning one’s users first. Not only is it simply a not-very-nice thing to do, but it’s counter-productive.

If people are building a business, or a service, or anything really, they need to have some trust in the continuity of the substrate. The data. The API.

Or people will stop using the API. Which very neatly destroys the value of it in the first place…
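One practical way to honour that social contract is to tell users, in the API responses themselves, when an endpoint is going away. The Python sketch below builds the HTTP `Sunset` header (defined in RFC 8594) along with a `Link` header pointing at a human-readable notice. This is one existing convention rather than something from the talk; the `retirement_headers` helper and the example URL are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def retirement_headers(sunset_at, notice_url):
    """Build response headers announcing when an endpoint will be retired."""
    if sunset_at.tzinfo is None:
        raise ValueError("sunset_at must be timezone-aware")
    return {
        # The Sunset header carries an HTTP-date, always expressed in GMT.
        "Sunset": format_datetime(sunset_at.astimezone(timezone.utc), usegmt=True),
        # The "sunset" link relation points users at the retirement notice.
        "Link": f'<{notice_url}>; rel="sunset"',
    }


headers = retirement_headers(
    datetime(2017, 6, 30, 12, 0, tzinfo=timezone.utc),
    "https://api.example.org/notices/retirement",  # hypothetical URL
)
for name, value in headers.items():
    print(f"{name}: {value}")
```

Headers like these don't replace consulting with users, but they give every client a machine-readable warning well before the door closes.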


I hear a lot of puzzlement from people about why on earth one should GIVE away information. After all, data leads to information, which leads to knowledge.

And knowledge is power, right?

I’ll answer that question from a couple of perspectives — societal, and commercial.

The societal value of open data

There’s a strong case to be made that data that’s generated by publicly-funded institutions — eg government — should be available to the public who paid for it. Of course, I’m not talking about all data, and certainly not in raw form. We must be very careful around privacy, and not victimising or revictimising people.

But there’s plenty of data that can and should be shared. And there’s a tonne of organisations, and research, backing up the idea that open government data is a powerful thing.

It enables ordinary citizens to understand better what’s going on in their societies, and make more informed choices about the decisions they, and their elected officials, make.

It enables people to push for change — or the status quo — based on evidence, not anecdote, opinion or emotion.

It builds higher levels of digital literacy — since people need to understand the basics (or know people who do) in order to use the data, they have a stronger incentive to learn about it.

GovHack’s a great example of this. We’re consistently working to achieve higher levels of diversity amongst the participants (in every way) — I’ve had more than one person come up to me and say “I didn’t think these events were for me. I didn’t see what I could contribute. Now I do! And next time I have a question about my society, or see a problem around me, I feel like I can do something about it. I know that I can look for data around it, and tackle it either on my own or with other people. And I want to keep learning, and creating.”

It builds more transparent government and power structures, enabling people to hold their governments to account (sometimes to the horror of said governments, which is great).

And it gets people engaged. Fewer black boxes into which data disappears, never to reappear, means a less cynical and disengaged society. And the more people are engaged in their society, the better. After all, we live in a democracy!

And we get to interrogate god.

The commercial value of open data

I realise that these benefits don’t appeal to everyone. And however much one does — or doesn’t — subscribe to them, there’s also the very real fact that everything costs money.

Making one’s data machine readable costs money. Hosting data costs money. The development and maintenance of data in the form of APIs…costs money.

And, especially in business, one needs to take this into account.

Thankfully, there are strong commercial benefits to open data. Both as consumers and as providers of it.

Open data, and open APIs, mean that people can build businesses using that data. There’s a talk tomorrow, in fact, on building businesses on government APIs!

And not only is this data available — meaning there’s the possibility of building a business on it — but it’s often free. Which is a very happily empty line item for any business and their accountants.

There’s also benefit as a provider, though, including for private companies.

Opening up and serving data gives you control of the narrative. You may not necessarily have much say over how it’s used, but you can build powerful stories about why you’re giving back, and how. We live in a world where, increasingly, consumers are looking to organisations to be good actors in the system.

In the decisions you make around the technology stack and language you use to serve that data, you can have a very direct effect on the state of the art, and how it evolves (some of the people in the room will remember the VHS/Betamax fight, for example).

It’s also worth remembering, especially in larger or more complex organisations, that making data open allows the organisation itself to benefit from that data. Otherwise it’s trapped on desktops or drives, in random project folders, in someone’s head, or in Hitchhiker’s Guide to the Galaxy-level information management systems.

It’s certainly not accessible, or useful, in any true meaning of the word.

Of course, you can affect what your competitors do. No one wants to be the last person in the room to do something. Nobody wants to be known as “those arseholes over there who aren’t coming to the party”. Especially by their customers, but also by their employees, colleagues and collaborators.

In turn, you can keep a closer eye on them, if you choose to do that sort of thing. You can learn from them, or compete with them. You have more ways to make your point of difference clear. Perhaps you might even collaborate with them.

Commercial organisations already do this all the time. After all, sending one’s people to conferences to speak and network is, at the basis of it, open data lite. And served out as human-readable, human APIs.

And there’s a final, obvious one. If we all take, but no one gives, then pretty soon there’s not much left to take… It’s only prudent to look after one’s ecology.

One of my favourite practical examples is in the area of patents (I’m not a patent lawyer, of course).

Patents benefit their holders, it’s true. But the rate of progress rises sharply when patents lapse. Why? Because a far wider range of people has the chance to work with the information: to test it, improve it, iterate it, run with it, or build something completely new off it.

Another great example, and the obvious one I’m sure many of you have been thinking of, is the open source software movement.

Smart organisations realised a long time ago that holding onto the code isn’t where one derives the best value. It’s the services one provides on TOP of that code that’s the powerful thing. Opening up the code allows a much larger group of people to work on it, and improve it.

And because anyone can use it, it’s far easier for that code to become widely used. I’m thinking here of the Linux systems which underpin so much of our modern digital infrastructure, for example.

Open drives progress, not closed.

Data that’s locked away is lonely, and not really fulfilling its true purpose.

It’s like a single book in a library, accessible to only one person…


So I’d like to posit a way to look at the API ecosystem. As a library. Not the ones devs normally talk about.

I mean the meatspace ones, with books and magazines and graphic novels and people from all walks of life in them.

Libraries — especially public libraries — are the original shared information service.

They share a common language, not only in how the information is organised, but quite literally in language (of course, larger ones will also have spaces for other languages).

They present information in a range of formats. One isn’t tied to hardcover paper books. There are softcovers. There are picture books, and text-dense tomes. There are magazines. There are DVDs, and CDs. Increasingly, there are e-books. There’s fiction, and non-fiction.

There’s a huge range of subject matter.

They’re generally available. Because they’re staffed by humans, sadly they can’t generally be open all the time, but the knowledge they share is, if you use it (ie by borrowing books). And, generally, one can be sure they’ll be there tomorrow, and the day after, and will still have lots of amazing information to share.

There are services on hand (in the form of librarians, but also catalogues and other discovery engines) to help one find and digest the information.

There’s a long tail of use — from people who use them heavily, to people who pop in occasionally.

They’re (generally) free, and they’re pretty easy to find.

They’re a powerful connecting node for research, community, collaboration and creation. They bring together a wide range of people with different skills, interests, and backgrounds.

And the point? All of this does one, simple thing: it drives innovation. Not because they pay for themselves directly, or because they benefit their owners and funders directly.

Because they provide access to a vast, shared pool of knowledge. A much larger one than any one person could gather, or use. Their costs are shared between us all, but so are their (infinitely greater) benefits.

We can’t know that having this set of books in them will lead to this advance. Or that this particular group of people will do something amazing with the books. We can be certain that they’ll bring together the people, and the knowledge, that act as fertile ground for small and great advances.

Just like the internet.

Just like hackathons.

Just like this conference.

And, of course, just like open data, and open APIs.

No database (should be) an island

I believe that de-escalating data as a source of power is a great and desired thing. As with open source software, the power should be in what one does with it, not keeping it in the proverbial locked safe.

There are umpteen case studies throughout history of the enormous value generated when information is shared, remixed and built upon. Civilisation itself is one of them.

Open data opens the door to opportunities. For your business. For society in general. For people from all walks of life.

It’s an equaliser, and a potent source for good.

It’s also a very natural partner to the open source movement. Together, open data and open source can build open government and industry.

And open societies. More transparent. More free. More informed. More engaged. More truly powerful, as ways to advance human potential.

A better world.

Information wants to be free. You have the keys.

Now open the door.

Slides and videos of the other APIDays talks will be up on the website — there were some fabulous people and topics there :)

Slides themselves at link below.

— — —

GovHack 2017 is happening late July 2017. Want to get involved in some way? Get in touch! We’re looking for organisers, sponsors, participants and anyone else who wants in :)

— — —

StatsNZ has just released a number of new prototype/experimental APIs. They’d love to hear back from people on how to improve them before they release more.

— — —

Check out Open Data NZ’s (brand new!) video on open data and its potential.