Sharing with a Purpose
These are some brief remarks for a meeting I attended at the Getty about the future of their Provenance Index. I was asked to speak about data modeling in the context of Linked Data.
I’ve invented a new Batman villain. His name is “The Modeller,” and his scheme is to model Gotham City with complete accuracy, in a way that is of no practical value to anybody. He has an OWL that sits on his shoulder and has the power to absorb huge amounts of time and energy.
In the first issue, “Batman vs the Modeller,” the Modeller gets away by confusing Batman as to exactly which incarnation he currently is (Frank Miller, Golden Age or Batman Begins), which forces Batman into an identity crisis where he registers different URIs and FOAF profiles for Batman and Bruce Wayne.
When we talk about Linked Data we often spend a great deal of time talking about vocabularies and modeling decisions. One of the real benefits of RDF (in all its serializations) is that it provides an open world for us to use shared vocabularies like Dublin Core, SKOS, PROV, CIDOC-CRM, schema.org, etc. to describe our stuff. These vocabularies can often be used together, and once you know a few of them you can easily create your own to fill in the gaps. The Modeller parody works so well because it’s true — much time and energy is spent on this step. So much time, sometimes, that we forget why we started modeling in the first place.
First and foremost (for me) Linked Data is a Web centric way of sharing data about content you or your organization care about. Identifying resources with URLs facilitates sharing your data assets with the outside world. If you work in a large enough organization (like the Getty) identifying things with URLs is surprisingly useful within the enterprise as well.
Unlike many commercial enterprises, cultural heritage organizations like to share their data. Sometimes licensing and concerns about data quality conspire to prevent it. But generally speaking libraries, museums and archives collect stuff they think is important, and want others to come and see it. I guess that’s why we obsess about the best ways to describe it.
When I look at successful projects that use Linked Data, like Europeana, DPLA, VIVO and schema.org, the thing I see isn’t so much their use of vocabulary as their desire to share. But they aren’t simply sharing, they are sharing with a purpose. Their data modeling choices are actually part of a larger plan to share data in a meaningful way, to feed a particular set of applications that are oriented around experiences they want people to have with their content.
When reading the discussion paper for our meeting today I was struck by a particular statement:
Our goal is to build a new system that allows users to navigate data within and beyond the Provenance Index in an interest-driven way, comparable to surfing the Web.
What is it like to surf the Web? Much like the ocean that we surf, the Web is an incomprehensibly large and constantly evolving space. After 25 years the basic building blocks of the Web (HTML, the URL and HTTP) have shown remarkable resilience and adaptability. The Web is a system that we are increasingly dependent on. Just like the ocean, if the Internet fails there’s no backup. All you have to do is read the news about how nation states are abusing (instead of caring for) the Internet to see that its failure may not be that far-fetched. I’m not sure I would’ve said that five years ago. Because of growing awareness about issues of persistence, privacy, security and accessibility, the Web may very well be on the cusp of a new era of innovation.
As you consider how to make data assets like the Provenance Index available on the Web I think you should strongly consider how this effort furthers and extends the Getty’s historic mission, and the mission of the Web. People look to the Getty as a center of excellence when it comes to information about the visual arts. As more and more of the visual arts moves onto the Web the Getty is uniquely positioned to connect, legitimate and help sustain this content.
The Web works best when it has no center — no single point of failure. I wonder how the Provenance Index can be a place for tracking resources in DPLA, Europeana and Wikimedia Commons, and for linking to places like the Virtual International Authority File or Wikidata. More importantly, how can it be a hub for researchers to hang their own data off of?
If you’ve been following John Resig’s recent work with the Frick you will know that he’s been doing some really interesting work applying computer vision, Web harvesting and his remarkable design sensibility to the study of artistic provenance. If you haven’t watched his talk at OpenVis about his work with Japanese woodcuts, definitely give it a watch, and prepare to have your mind blown. The short story is that the deployment of art collections and their metadata onto the Web, combined with advances in computer vision and databases, has made it possible to find previously unknown connections within and between museum collections.
As more people like Resig and yourselves build these datasets of relations, is it feasible to think about how their work can augment, overlay or become a part of Getty datasets like the Provenance Index? Could those data assets live externally on the Web, but be selectively published, aggregated or archived by the Getty? What do the mechanics of sharing changes to these datasets look like? How would researchers discover them, interact with them, and wander outwards to the larger Web and back again?
This is a unique opportunity to think big, not just about your historic collections and how best to model them but about the collections that are right under our noses today on the Web and how best to sustain them. To paraphrase Aquinas and my former colleague Dan Krech who so often quoted him:
The past is short and the future is long, and the present determines how short and long they are.
I hope the future of the Web is long too. But I think it needs your help.