Datasets are Books, Not Houses

Brendan O'Brien
Published in qri.io
10 min read · Jan 10, 2020

The world of linked data is built on shaky foundations that prevent a true data commons from emerging. The problem isn’t with the data, but with the way data is linked: specifically, with the way links are addressed.

An address is a uniform, shorthand way of referring to things. Geographic addresses are an obvious example. 1600 Pennsylvania Ave. is the address of the White House. 268 Elizabeth St. is the address of my childhood home. These houses are different in size and function, but they are both locations with addresses that adhere to a (relatively) consistent system. Addresses make it easier to refer to places.

The internet today is location addressed. Youtube.com and Boingboing.net are locations. We refer to content like a video or a blog post by its uniform resource locator (URL). Much like my childhood home, the internet is organized around the location of content.
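To make this concrete, here is a minimal sketch using Python’s standard `urllib.parse.urlsplit`. The blog post path in the URL is hypothetical; only the host comes from the text above. Every component of the URL names *where* to look, and nothing in it describes *what* you will find there.

```python
from urllib.parse import urlsplit

# Hypothetical URL for a blog post; the path is made up for illustration.
url = "https://boingboing.net/2020/01/10/some-post.html"

parts = urlsplit(url)
# Each field describes a location: which protocol, which server, which path.
print(parts.scheme)  # https
print(parts.netloc)  # boingboing.net
print(parts.path)    # /2020/01/10/some-post.html
```

Nothing in `parts` constrains the bytes the server will send back; the URL is a street address, not a description of the house.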

Location addressing works well for many purposes, but it’s a poor system for linked data for two important reasons: link rot and content drift. A rotten link is a location on the internet that has become permanently unavailable, the classic “404 Not Found” error you see all the time. Link rot is pervasive on the internet, and in the context of data it’s deeply problematic, as it amounts to a missing dependency.
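Treating link rot as a missing dependency can be sketched as a toy in Python. This assumes a plain dictionary named `web` stands in for real HTTP requests (a real check would fetch each URL and look for a 404 status); the URLs, the data, and the `find_rotten_links` helper are all hypothetical.

```python
# Toy stand-in for the web: location -> content. Hypothetical URLs and data.
web = {
    "https://example.org/data/population.csv": "city,pop\nA,100\n",
}

# A dataset that depends on two other datasets, referenced by location.
dependencies = [
    "https://example.org/data/population.csv",
    "https://example.org/data/boundaries.csv",  # this location has rotted away
]


def find_rotten_links(deps, resolver):
    """Return the dependencies whose location no longer resolves to anything."""
    return [url for url in deps if resolver.get(url) is None]


rotten = find_rotten_links(dependencies, web)
print(rotten)  # ['https://example.org/data/boundaries.csv']
```

A check like this can only report that a location is gone. It says nothing about whether the content still sitting at a live location is the content you originally depended on.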

Content drift is arguably the more insidious problem. It’s completely possible that by now someone has demolished my childhood home and replaced it with a different house. In this case the address is the same, but the content has drifted, taking on a different meaning since I last visited. In the data context, the cognitive and procedural overhead of confirming that data still exists and hasn’t changed dissuades us from depending on other people’s data in the first place.
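One way to catch drift, sketched here as an assumption rather than anything this post prescribes, is to record a cryptographic hash of the content when you first link to it and recompare on every later visit. The address and CSV bytes are hypothetical, as is the `fingerprint` helper; `hashlib.sha256` is from Python’s standard library.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest that changes whenever the content changes."""
    return hashlib.sha256(content).hexdigest()


# Hypothetical: the bytes at a location when we first linked to it...
address = "https://example.org/data/population.csv"
original = b"city,pop\nA,100\n"
recorded = fingerprint(original)

# ...and the bytes at the *same* address when we come back later.
later = b"city,pop\nA,120\n"

drifted = fingerprint(later) != recorded
print(f"{address} drifted: {drifted}")
# prints: https://example.org/data/population.csv drifted: True
```

This is exactly the overhead described above: every consumer has to store fingerprints and refetch data just to learn that an address can no longer be trusted.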

These problems have balkanized the open data landscape. Because it’s not possible to build a sufficiently reliable system that spans locations and services, data providers have very little incentive to depend on each other.

So, what’s the alternative?

Location addressing is so ingrained in our understanding that it may take a bit of “unlearning” to recognize that alternative addressing systems exist. For books, one alternative is to address them by title. A book has a title, an author, a publisher, and an ISBN. The title of a book is a meaningful reference to what’s inside it: The Cat in the Hat is indeed a book about a cat wearing a hat. Because of this, we…
