Thousands of Exhausted Things, or why we dedicated MoMA’s collection data to the public domain
MoMA accessioned the Creative Commons License Symbol into its collection in March 2015, and it’s now on display in our design galleries as part of the exhibition This Is for Everyone: Design Experiments for the Common Good. According to curator Paola Antonelli (@curiousoctopus), Creative Commons allows those who create content to “think beyond the default position of All Rights Reserved.” It therefore feels fitting that we have just flipped our own default, sharing data for more than 125,000 works from MoMA’s collection on GitHub under Creative Commons Zero (CC0).
This data release includes every work that has been both accessioned into MoMA’s collection and cataloged in our database. For each work it provides basic data: title, artist, date made, medium, dimensions, and the date the Museum acquired it. The data will be updated periodically to reflect new acquisitions and research.
MoMA’s open data is intended primarily for scholars, so it was important to make each version citable. Arfon Smith (@arfon), co-founder of the Zooniverse and a former collaborator of mine, now leads GitHub’s engagement with the academic community, and he shared a useful guide to making code citable on GitHub using Zenodo. A Digital Object Identifier (DOI) is created automatically for every MoMA data release, and the data is also archived on the cloud infrastructure used by CERN’s Large Hadron Collider.
While releasing this data with “No Rights Reserved” was a significant milestone for MoMA, the records marked “not curator approved” reflect a bigger cultural shift. More than half of the records in this release have incomplete information and may contain errors. There is established evidence that researchers want online access to collection records as quickly as possible, “whatever the perceived imperfections or gaps in the records.” We therefore decided to share this work in progress to give a more comprehensive view of MoMA’s collection.
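For anyone working with the CSV, the “not curator approved” flag makes it easy to separate vetted records from work in progress. Here is a minimal Python sketch of that split. It assumes the file has a `CuratorApproved` column holding `Y` or `N` values. That column name is an assumption, not a documented schema, so check the headers in the repository before relying on it. The rows below are made-up placeholders, not real collection records.

```python
import csv
import io

# Hypothetical sample rows standing in for the real CSV. The column names
# "Title", "Artist" and "CuratorApproved" (with "Y"/"N" values) are
# assumptions about the released file, not a documented schema.
SAMPLE_CSV = """Title,Artist,CuratorApproved
Untitled,Example Artist A,Y
Study,Example Artist B,N
Draft,Example Artist C,N
"""


def split_by_approval(csv_text):
    """Return (approved, unapproved) lists of row dicts from CSV text."""
    approved, unapproved = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Anything other than "Y" (including a missing value) is treated
        # as not curator approved.
        if (row.get("CuratorApproved") or "").strip().upper() == "Y":
            approved.append(row)
        else:
            unapproved.append(row)
    return approved, unapproved


if __name__ == "__main__":
    approved, unapproved = split_by_approval(SAMPLE_CSV)
    total = len(approved) + len(unapproved)
    print(f"{len(unapproved)} of {total} records not curator approved")
```

To run the same split on the real data, read the downloaded file into a string (for example with `open(path, newline="", encoding="utf-8").read()`, where `path` is wherever you saved it) and pass that to `split_by_approval`. Keeping the unapproved rows, rather than discarding them, preserves the more comprehensive view of the collection that motivated the release.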
The scope of the data release was shaped in part by individual research requests received by MoMA’s Library and Archives. It was also influenced by the Artists Experiment residency of The Office for Creative Research (@the_o_c_r), a multidisciplinary group exploring new modes of engagement with data. OCR’s previous projects include installations at The New York Times, the Public Theater, and London’s Science Museum, as well as work on the algorithm for the 9/11 Memorial. OCR was tenacious in pursuing data for their residency, explaining that seeing everything was important to their practice; they did not want to work only with the 55,000 “good” collection records already available to the public on MoMA’s website.
OCR eventually used a version of the collection data that we’re now releasing to everyone to create a live performance in the galleries, A Sort of Joy (Thousands of Exhausted Things).
OCR has documented the project in a blog post where they also announce that they’ll be releasing the collection interface used to generate the script for the performance. According to OCR, “[MoMA’s] data can be and should be terrain for exploration, forum for interrogation, and substrate for creation. There is prose and poetry and performance to be made from these rows and columns.”
We’re really excited to see what you make of — and with — MoMA’s collection data.
Thanks to the Cooper-Hewitt and Tate for paving the way by releasing their own collection data on GitHub using CC0. Thanks also to George Oates (@goodformand) for reassuring us that a CSV is not just the easiest way to start but probably the most accessible format for a broad audience of researchers, artists, and designers.