Culture and open data: where are we at?

Have you heard someone in your institution say the words ‘open data’ recently? Chances are you have. It’s the new shiny thing that pops up in meetings, along with gleaming promises of reaching new audiences and ‘hacking the museum’. That all sounds great, but a bit confusing. What does open data really mean in a cultural institution? How does it even work? And what can we actually do with it?

This post is the follow-up to a workshop about culture and open data. As I started researching for the workshop, I realized how little I really knew about the topic. Now that I know a little more, I thought it could be helpful to share what I’ve learned along the way. This article is in no way a comprehensive guide: think of it as a short introduction to the topic, which aims to present some of the main stakeholders and applications of open data in the cultural sector.

What’s open data again?

Now that introductions are out of the way, let’s get to it: what is open data, and what’s it got to do with culture? Wikipedia defines it as ‘the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control’. In cultural terms, open data is like a huge library which would be free to enter, and where anyone could come read books, study them, and take a copy home. I really like the library analogy because it shows how relevant open data is to cultural institutions and how deeply rooted it is in their mission. The main purpose of Galleries, Libraries, Archives and Museums (GLAMs for short) is to preserve knowledge, to educate, and to share that knowledge as widely as possible, and in that sense open data is a very natural tool for them to use.

Despite the many challenges and questions it brings, open data can be hugely beneficial for GLAMs. By sharing their collections and other datasets more widely and uploading them onto new platforms, cultural institutions can reach new and wider audiences, facilitate and enrich research, and even improve the on-site visiting experience.

Open data and collections: a complicated love story

When talking about open data in GLAMs, the first thing that comes to mind is online collections. They actually represent about 80% (if not more) of the work done with open data in the sector, and this is where results are really visible. The aim is simple: to make all objects in all collections available to everyone, everywhere, for free. There are different ways to do this, which are not mutually exclusive: through an institution’s own website, or through partners like the Google Cultural Institute and Wikimedia. Although there are some major challenges to this (mainly that institutions often don’t hold the author rights to the works, and the difficulty of setting up solid platforms), some great examples have been created over the years.

Starting with institution websites, the Rijksstudio is the classic example of an online collection done well. In 2012, the Rijksmuseum (which was nearing the end of a 10-year closure for renovation) released a large part of its collection under the CC0 licence, effectively passing these items into the public domain. This was a huge step and set a precedent for museums around the world to follow. However, they didn’t stop there. The team also created a wonderfully interactive website encouraging the public to play around with images from the museum’s collection, helping them make lists and save images to their own devices. The museum also launched the Rijksstudio awards, a yearly prize for the best creative use of objects in the collection. Past winners include a lamp modelled from a bird print, contact lenses inspired by Flemish porcelain, and sleeping masks using portraits from the collection. For the Rijksmuseum, all these efforts had a very tangible goal: after such a long closure, the museum needed to reconnect with its audience and to remind visitors around the world of its relevance.

The Met is another pioneer in the field of open data. Following in the Rijksmuseum’s footsteps, the Metropolitan Museum released a huge part of its collection under the CC0 licence in 2017, offering free access to these objects through its online platform. In order to reach an even wider audience, the Met’s team then created partnerships with the Google Cultural Institute and the Wikimedia Foundation, publishing objects from the collection onto new platforms for new audiences to interact with. Recently, the team also developed an API that lets anyone query the collection and integrate it into other websites or apps for free.
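To give a feel for what ‘integrating the collection into other websites or apps’ can look like, here is a minimal Python sketch. It assumes the public endpoints and field names of the Met’s collection API as documented at the time of writing (they may change), and the helper names and the sample record are made up for illustration: the record is hand-written, not real data from the museum.

```python
import json
from urllib.parse import urlencode

# Base URL of the Met's public collection API (assumed from its public
# documentation at the time of writing; no API key was required).
BASE_URL = "https://collectionapi.metmuseum.org/public/collection/v1"


def search_url(query, has_images=True):
    """Build the URL that searches the collection for a keyword."""
    params = {"q": query, "hasImages": str(has_images).lower()}
    return f"{BASE_URL}/search?{urlencode(params)}"


def object_url(object_id):
    """Build the URL that returns the full record of one object."""
    return f"{BASE_URL}/objects/{object_id}"


def summarize(record):
    """Keep only the fields a small website or app would typically show."""
    return {
        "title": record.get("title", ""),
        "artist": record.get("artistDisplayName", ""),
        "open_access": bool(record.get("isPublicDomain", False)),
        "image": record.get("primaryImage", ""),
    }


# Hand-written sample record, for illustration only (not real museum data).
SAMPLE = json.loads("""
{
  "objectID": 1,
  "title": "Example painting",
  "artistDisplayName": "Example Artist",
  "isPublicDomain": true,
  "primaryImage": "https://example.org/image.jpg"
}
""")

if __name__ == "__main__":
    print(search_url("sunflowers"))
    print(object_url(SAMPLE["objectID"]))
    print(summarize(SAMPLE))
```

To work with live data, the URLs built above can be fetched with `urllib.request.urlopen` and decoded with `json.load`. The sample record is used here so that the sketch runs without a network connection.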

Another interesting example of opening up collections is the development of 3D scanning and printing. Some institutions like the Fitzwilliam Museum in the UK have been scanning (and encouraging others to scan) objects from their collection, and releasing the designs on platforms such as Sketchfab. These models are being used for education and research purposes, helping people study the objects up close and access objects they might otherwise never see.

These are great examples, but they don’t represent the reality of the sector. Most GLAMs can’t afford to have one person dedicated solely to their online collections, let alone entire teams like the Met’s or the Rijksmuseum’s. These platforms are not just great products built by great teams: they also show an institutional shift and acceptance of open data at all levels of the institution. Opening collections, sharing the data and maintaining the platform add up to a huge investment, and not all institutions are willing to make the leap into open data.

Wiki Loves GLAMs

I didn’t make this title up: this is actually the name of a significant movement in the GLAM sector, which aims to create partnerships between GLAMs and Wikimedia. Wikimedia is made up of several entities: Wikipedia (which is where you read the articles) draws on several other projects such as Wikidata (where you upload data), Wikisource (where you upload text) and Wikimedia Commons (where you upload images).

Wikimedia is used by hundreds of millions of users every day, and has a solid and reliable platform. It therefore provides a great alternative to institutional websites for sharing information about collections, and ensures the digital collection is collectively cared for. One of Wikimedia’s greatest strengths is its huge community of supporters, who often voluntarily help institutions digitize their collections and upload their data onto Wikimedia. In 2010, the Museum d’Histoire Naturelle de Toulouse (France) launched a digitization campaign where volunteers helped the museum upload its collections onto Wikidata, Wikisource and Wikimedia Commons, enriching articles regarding the museum and its collections along the way. Throughout the years, there have been numerous other examples of Wikipedians coming together to support museums and institutions, sometimes independently: the Wiki Loves Monuments campaign, for instance, was created by a group of Wikipedians to take visual records of local monuments and enrich Wikipedia articles with these photos. For institutions willing to upload part of their collection onto the platform, the Wiki Loves GLAM community has published a very clear and useful set of guidelines relating to metadata structure and mass import. Some institutions have even chosen to have a Wikipedian in residence, whose role is to help institutions or communities digitize and share their objects.

The second main argument for increased collaboration between GLAMs and Wikimedia is the Linked Open Data movement, or LOD for short. LOD means that any piece of information which is uploaded onto Wikimedia can potentially be linked to any other piece of information already on the platform. For instance, someone researching Van Gogh’s Starry Night on Wikipedia would also find information about his other paintings, his letters to his brother, or his life in Provence, along with high-definition reproductions of these letters and paintings. The fact that a growing number of institutions have released part of their collections under CC0 licences and uploaded objects onto Wikimedia shows that this is slowly becoming a reality (see the screenshot of the Starry Night’s Wikipedia page below).
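Setting the screenshot aside for a moment, the linking idea itself can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Wikidata is actually implemented: the statements below are hand-written and simplified, and the property names (‘creator’, ‘collection’ and so on) and function names are made up for illustration.

```python
from collections import defaultdict

# Hand-written, simplified statements in (subject, property, value) form.
# Real linked data uses stable identifiers rather than plain names.
TRIPLES = [
    ("The Starry Night", "creator", "Vincent van Gogh"),
    ("The Starry Night", "collection", "Museum of Modern Art"),
    ("Sunflowers", "creator", "Vincent van Gogh"),
    ("Letters to Theo", "author", "Vincent van Gogh"),
    ("Letters to Theo", "addressee", "Theo van Gogh"),
]


def build_index(triples):
    """Map every item to the items it is directly linked to, in both directions."""
    index = defaultdict(set)
    for subject, _prop, value in triples:
        index[subject].add(value)
        index[value].add(subject)
    return index


def related(item, index):
    """Return items that share at least one direct link with `item`."""
    found = set()
    for neighbour in index.get(item, ()):
        found |= index[neighbour]
    found.discard(item)  # an item is not 'related' to itself
    return sorted(found)


if __name__ == "__main__":
    index = build_index(TRIPLES)
    # Starting from one painting, the shared link to its creator
    # surfaces his other painting and his letters.
    print(related("The Starry Night", index))
```

The point of the sketch is that nobody had to connect the painting to the letters by hand: each statement was added separately, and the connection appears because both point to the same artist.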

Finally, some institutions all over the world have joined forces with the edit-a-thon movement. Edit-a-thons are events in which users are invited to come and edit Wikipedia articles, sometimes learning how to do so in the process. These events often aim to adjust narratives and overcome existing bias in Wikipedia articles: for instance, MoMA and Lafayette Anticipations regularly host Art + Feminism edit-a-thons where participants write articles about under-represented women and enrich archives related to female historical figures or artists.

Despite the benefits of the Wiki + GLAM collaboration, there are still barriers in place. For one thing, a large number of GLAMs don’t hold the rights to their entire collection, and CC0, CC BY or CC BY-SA licences (the permissive licences required to upload content onto Wikimedia) can only be granted for objects institutions hold the rights to. The question of author rights makes it difficult for museums and galleries with collections of modern or contemporary art to join the Linked Open Data movement. Moreover, even for institutions that own the rights to their collection, digitizing collections takes time and resources with no guarantee of any return on investment.

What about the rest?

Of course, museums also have data outside of their collections. Lots of data. Lots and lots and lots of data. Think footfall, exhibition ticket sales, Wi-Fi, audio-guide, geo-spatial data, event-related data… Unless they are deemed sensitive by the institution, all these datasets have the potential to be shared and used by citizens. This use of data has slowly started, but examples of successful use or practical applications are still hard to come by.

In France, the ‘open data by default’ law from 2016, which urges administrations and public institutions to open up all non-sensitive datasets in their possession, has led to the data.culture.gouv platform, which features datasets regarding public museums and cultural institutions. For instance, it is possible to find compiled visitor numbers, lists of public libraries, or information about historical buildings. Recently, data made available by the institutions was used by the organizers of Journées du Patrimoine to compile opening times of participating institutions. The Europeana portal has also been hosting open datasets from cultural institutions, which have been used in projects relating to maps or bibliographies.

If more institutional data were released, it’s not difficult to imagine what could be possible. The data could be used to centralize information about cultural institutions; to make access-related information easier to find; to build prediction models for attendance; to monitor where upkeep is needed, and coordinate fundraising efforts; to understand how people interact with collections and improve education material… As the growth of the whole Museomix and ‘hacking the museum’ movement shows, members of the public are keen to get involved and to help museums come up with creative solutions to the issues they’re facing.

So, what’s next?

The potential for open data in cultural institutions is huge. In addition to fostering research and enriching knowledge bases, it could help re-shape the way that audiences get involved with their local institutions and the way that institutions themselves communicate with each other. There is a lot to do, but perhaps that’s what’s blocking us: with seemingly endless possibilities, it’s difficult to know where to start and how to allocate resources. Moreover, there is an undeniable financial barrier to open data: despite its many positive aspects, it is still difficult to demonstrate direct revenue generation from opening up collections. This creates what Samuel Donvil calls the ‘Catch 22 of open data’: institutions have a hard time investing in open data unless they see the benefits it can bring, but in order to bring benefits, investments need to be made.

The good thing is, you’re not alone! Whether it’s local communities or institutional partners like the Google Cultural Institute, the Wikimedia Foundation or Europeana, you can find support outside of your organization. One painting and one dataset at a time, we’ll get there.