Open for who?

I bracketed September by attending two open data conferences: one for scientists working with satellites, and one for librarians and archivists. Sitting in the audience at both events, it occurred to me over and over again that we’re not all talking about the same thing when we say ‘open data’.

Indeed there seemed to be a kind of fractal misunderstanding about what the word ‘open’ means, or more specifically who the ‘open’ is for. Earth observers had a different idea than librarians: while scientists seemed to be focused on making their projects open to other scientists, the library people were mostly considering humanities researchers. But even within those groups there didn’t seem to be common ground; not everyone who worked at the Library of Congress agreed about who the audience for open was, nor did everyone within a single department at the ESA. Honestly I’d be surprised if any two people at either of those events could agree on who they were being funded to open their data to.

I’m still Canadian enough to believe that open means open. I’m with Open Knowledge International when they say that:

“Open means anyone can freely access, use, modify, and share for any purpose.”

Under this definition I’d argue that very few of our so-called open data projects are actually open, unless we manufacture a definition for ‘anyone’ which includes only people who look and think a lot like ourselves.

Let’s try an experiment. Pick an open data project, your own or someone else’s, and give it a score of zero.

Because we’re feeling charitable, let’s give the project one point just for the word ‘open’, assuming the data is accessible in some way, through an API or a file download or a carrier pigeon service. Next, give your project one additional point for each of these questions you can answer ‘yes’ to:

  • Does the project have comprehensible documentation, examples, and tutorials?
  • Are there materials (teaching curricula, blog posts, videos, etc.) that offer context around the data so that someone unfamiliar with the project can understand why it might be important?
  • Can a non-programmer access the data?
  • Is there documentation available in more than one language (e.g. English and Spanish)?
  • Are your documentation and the site it is hosted on compatible with screen readers? Have you tested them?
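
The rubric above can be sketched as a small scoring function. This is a minimal illustration, not a formal metric; the criterion names are my own shorthand for the questions in the list, and you fill in the answers honestly for your own project.

```python
def openness_score(is_accessible, criteria):
    """Score an 'open' data project: 1 point for the data being accessible
    at all (API, file download, carrier pigeon...), plus 1 point for each
    rubric criterion the project meets."""
    score = 1 if is_accessible else 0
    return score + sum(1 for met in criteria.values() if met)

# Hypothetical project: accessible via API, with good docs and a
# non-programmer interface, but no translations or screen-reader testing.
example = {
    "comprehensible_docs": True,
    "contextual_materials": False,
    "non_programmer_access": True,
    "multilingual_docs": False,
    "screen_reader_tested": False,
}
print(openness_score(True, example))  # → 3
```

A project that scores 3 here hits the "minimum viable open" threshold discussed below.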

How did you do? The three open data projects that we run at The OCR (Floodwatch, Into the Okavango & Elephant Atlas) scored 2, 2 and 3 points, respectively.

I think it should be a minimum goal for every data project that wants to legitimately use the term open to score at least a 3 on this test. But scoring a 3 is like scoring a C; it’s the minimum viable open. We’ve arrived at open-ish data, openesque data at best. How can we do better?

A lot of the answers are encoded in the questions above. Write understandable documentation, examples, and tutorials, and write them for an audience that isn’t you. Post interviews with good communicators who can give context and narrative. Provide easy-to-use visualization tools to foster comprehension.

Think about making your data human readable as well as machine readable. For example, at IntoTheOkavango.org, we’ve made our API returns available in two easy-to-digest views in addition to the standard JSON return. The intent is to provide a way for people to take advantage of the features of our API without having to read JSON.
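
The idea can be sketched in a few lines: serve the same records both as raw JSON and as an aligned plain-text table. This is a toy illustration only; the record fields here are invented for the example and are not IntoTheOkavango’s actual schema.

```python
import json

def as_text(records):
    """Render a list of dict records as an aligned plain-text table,
    a human-readable alternative to the raw JSON view."""
    headers = list(records[0].keys())
    rows = [[str(r[h]) for h in headers] for r in records]
    # Column width = widest cell (or header) in that column.
    widths = [max(len(h), *(len(row[i]) for row in rows))
              for i, h in enumerate(headers)]
    lines = ["  ".join(h.ljust(w) for h, w in zip(headers, widths))]
    lines += ["  ".join(c.ljust(w) for c, w in zip(row, widths))
              for row in rows]
    return "\n".join(lines)

# Hypothetical sighting records.
records = [
    {"species": "African fish eagle", "count": 2, "date": "2016-09-01"},
    {"species": "Hippopotamus", "count": 5, "date": "2016-09-02"},
]
print(json.dumps(records))  # machine-readable view
print(as_text(records))     # human-readable view
```

The same data, two audiences: a script consumes the JSON, while a person can read the table without knowing what JSON is.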

To get to the outer reaches though, where our data is really and truly open to anyone, we need to consider exclusion and accessibility. Put more directly: we have to think about people who aren’t us.

Who is being excluded by the technologies that we are using and by the ways in which we are communicating? In writing this article I realized that while our conservation efforts along the Okavango River span three countries with three official languages, we only have API documentation in English. So we’re translating to Portuguese and Afrikaans. I also realized that all three of our supposedly open data projects are tedious (or impossible) to access with a screen reader, which we will be taking steps to remedy.

Finally, to make a data project truly open to anyone, we need to think about outreach, past the computer. There is a technocentricity to our platforms that is difficult to overcome: how do we make our data open to those with low levels (or no levels) of digital literacy?

One tactic might be to ensure that our data and documentation are friendly not just to researchers but also to propagators. Write tutorials specifically for journalists. Work with teachers to develop curricula around your data, even if your target audience isn’t the classroom.

A final answer might be to take our data actually into the open. At The Office for Creative Research we’ve been exploring data performance and large-scale data sculpture, both as methodologies for putting data directly into contact with the individuals and communities who aren’t likely to use an API. These tactics can be particularly effective when trying to reach communities and demographics where data literacy is low.

If we take the time to closely examine open data efforts, it’s hard to avoid a disheartening conclusion: most open data isn’t. But, if we look at this truth directly and use it as critique to frame our next proposal or our next project, we may be able to move toward a place where data is entirely, agreeably open. To scientists, to researchers, to academics, to artists, to teachers, to students, to the generally curious and the loudly critical.

To anyone.

One last criterion for open: Can you fit an entire marching band underneath your data? Photo by Noa Younse.