Shared Practices in Museum Open Collections Data

Image Background: River View with Fishermen by Salomon van Ruysdael, Walters Art Museum.

Open digital collections offer not only richness in digitized objects, but also value in the data surrounding those objects. As museums move more broadly toward open access, we may ask what information institutions choose to share about their open collections data and what practices they have in common when doing so.

To consider what common characteristics open collections data policies may develop in the future, we can begin by looking at current shared practices.

This article takes a closer look at how a selection of museums communicate the availability of open collections data and identifies the practices they share.

Walters Art Museum

The Walters Art Museum Policy on Digital Images of Collection Objects Usage outlines the use terms for collections images and data under open licenses. Read with attention to its terms for open collections data, the policy addresses the following pieces of information:

1. Licensing Terms

“Because the Walters owns or has jurisdiction over the objects in its collection and owns or customarily obtains the rights to any imaging of its collection objects, it has adopted the Creative Commons Zero: No Rights Reserved or CC0 license to waive copyright and allow for unrestricted use of digital images and metadata by any person, for any purpose. The longer text descriptions about the artworks on this website are released under the GNU Free Documentation License.”

2. Direct Link to Collections API

The Museum of Modern Art (MoMA)

On the MoMA website, About the Collection leaves breadcrumbs to open collections data under a section titled “Research datasets,” which describes the available data and notes its availability on GitHub.

Within the MoMA GitHub repository, the README functions as the guide to using MoMA open collections data, addressing:

1. Scope

“This research dataset contains 130,262 records, representing all of the works that have been accessioned into MoMA’s collection and cataloged in our database. It includes basic metadata for each work, including title, artist, date made, medium, dimensions, and date acquired by the Museum. Some of these records have incomplete information and are noted as “not Curator Approved.”

The Artists dataset contains 15,091 records, representing all the artists who have work in MoMA’s collection and have been cataloged in our database. It includes basic metadata for each artist, including name, nationality, gender, birth year, death year, Wiki QID, and Getty ULAN ID”

2. Available Formats

“At this time, both datasets are available in CSV format, encoded in UTF-8. While UTF-8 is the standard for multilingual character encodings, it is not correctly interpreted by Excel on a Mac. Users of Excel on a Mac can convert the UTF-8 to UTF-16 so the file can be imported correctly. The datasets are also available in JSON.”

3. Licensing Terms

“This datasets are placed in the public domain using a CC0 License.”

4. Disclaimers

“Images are not included and are not part of the dataset. To license images of works of art in MoMA’s collection please contact Art Resource (North America) or Scala Archives (outside North America).”

[…]

“This data is provided “as is” for research purposes and you use this data at your own risk. Much of the information included in this dataset is not complete and has not been curatorially approved. MoMA offers the datasets as-is and makes no representations or warranties of any kind.”

5. Attribution Details and Digital Object Identifier (DOI)

“MoMA requests that you actively acknowledge and give attribution to MoMA wherever possible. If you use one or both of the datasets for a publication, please cite it using the digital object identifier [DOI 10.5281/zenodo.266290]. Attribution supports efforts to release other data. It also reduces the amount of “orphaned data,” helping retain links to authoritative sources.”

6. Misc Usage Guidelines

“Do not mislead others or misrepresent the datasets or their source. You must not use MoMA’s trademarks or otherwise claim or imply that MoMA endorses you or your use of the dataset.
Whenever you transform, translate or otherwise modify the dataset, you must make it clear that the resulting information has been modified. If you enrich or otherwise modify the dataset, consider publishing the derived dataset without reuse restrictions.”

[…]

“Because these datasets are generated from our internal database, we do not accept pull requests. If you have identified errors or have extra information to share, please email us at collection@moma.org and we will forward to the appropriate department for review.”
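The Available Formats note quoted above suggests that users of Excel on a Mac convert the UTF-8 CSV to UTF-16 before importing it. A minimal sketch of that conversion in Python follows. The function name and the file names are hypothetical and are not part of MoMA's documentation; the sketch only assumes that the downloaded file is a valid UTF-8 text file.

```python
from pathlib import Path


def convert_utf8_to_utf16(src: Path, dst: Path) -> None:
    """Re-encode a UTF-8 text file as UTF-16 (Python adds a byte order mark)."""
    # newline="" leaves the CSV's own line endings untouched on read and write.
    with src.open("r", encoding="utf-8", newline="") as fin, \
            dst.open("w", encoding="utf-16", newline="") as fout:
        for line in fin:
            fout.write(line)


if __name__ == "__main__":
    # Hypothetical file names; substitute the path of the downloaded CSV.
    source = Path("Artworks.csv")
    if source.exists():
        convert_utf8_to_utf16(source, Path("Artworks-utf16.csv"))
```

The file is copied line by line so that a large dataset does not need to fit in memory at once.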

Metropolitan Museum of Art

The Metropolitan Museum of Art’s Open Access Policy adopted this month outlines the availability of open collections data by addressing:

1. Scope

“The Metropolitan Museum of Art creates, organizes, and disseminates a broad range of digital images and data that document the rich history of the Museum, its collection, exhibitions, events, people, and activities.”

[…]

“It also makes available data from the entire online collection―both works it believes to be in the public domain and those under copyright or other restrictions―including basic information such as title, artist, date, medium, and dimensions.”

2. Licensing Terms

“This data is available to all in accordance with the Creative Commons Zero (CC0) designation.”

[…]

“The Museum dedicates select data of artworks in its collection―both works it believes to be in the public domain and those under copyright or other restrictions―to the public domain. You can download, share, modify, and distribute the data for any purpose, including commercial and noncommercial use, free of charge and without requiring permission from the Museum.”

3. Available Formats

“The data is available as a Comma Separated Value (.CSV) file on GitHub, a web-based data repository and Internet hosting service. It is updated on a weekly basis.”

4. Direct Link to Data Repository
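Because the data is published as a CSV file of basic fields, a reader can check how completely those fields are filled in before relying on them. The following is a minimal sketch using Python's standard csv module. The column names and sample rows are assumptions made for illustration and are not taken from the Museum's file; check the header row of the real CSV before adapting it.

```python
import csv
import io

# Assumed column names, for illustration only; the real header row may differ.
BASIC_FIELDS = ["Title", "Artist", "Date", "Medium", "Dimensions"]


def count_filled_fields(csv_text: str) -> dict[str, int]:
    """Count, per basic field, how many records have a non-empty value."""
    counts = {field: 0 for field in BASIC_FIELDS}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for field in BASIC_FIELDS:
            # Missing columns and blank cells both count as "not filled".
            if (row.get(field) or "").strip():
                counts[field] += 1
    return counts


# Made-up sample rows, not real collection records.
SAMPLE = (
    "Title,Artist,Date,Medium,Dimensions\n"
    "Sample Work A,Sample Artist,1900,Oil on canvas,10 x 20 cm\n"
    "Sample Work B,,1950,,\n"
)

if __name__ == "__main__":
    print(count_filled_fields(SAMPLE))
```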

Cooper Hewitt

On the Cooper Hewitt website, Open Source at Cooper Hewitt outlines “the growing list of tools and resources that the museum has made available under a variety of liberal license conditions which are, where possible, global in application”; open collections data is one of those resources. Guidelines surrounding open collections data at Cooper Hewitt are summarized under a section titled Open Data and Public API, reading:

“Collection data, excluding images, is released under Creative Commons Zero. It is available as a downloadable spreadsheet, as individual JSON files, and through our public API. Learn more about this in our Developers section.”

In this summary, Cooper Hewitt includes the following pieces of information:

1. Licensing Terms

2. Available Formats

3. Direct Link to Collections API

While the summary above does not link directly to the Cooper Hewitt GitHub repository containing collections data specifically, it does include a link to the “Developers section” which does.

In addition to this summary on the Cooper Hewitt website, the Cooper Hewitt GitHub repositories contain a collections data README that gives a more in-depth overview of the available data and its terms of use. This README includes the following pieces of information not already covered in the shorter website summary:

1. Scope

“Cooper Hewitt, Smithsonian Design Museum is committed to making its collection data available for public access. To date we have made public approximately 75% of the documented collection available online. Whilst we have a web interface for searching the collection, we are now also making the dataset available for free public download. By being able to see ‘everything’ at once, new connections and understandings may be able to be made. For more information please see our website.”

2. Instructional Wiki (Including pages for Creative Commons Licensing, Data Usage Guidelines, Tips on importing this dataset, Generating Persistent URLs, Objects, and Media)

3. Misc Usage Guidelines

“Following the lead of Europeana, we have also released some guidelines for use which suggest that users:
  • Give attribution to Cooper Hewitt, Smithsonian Design Museum.
  • Contribute back any modifications or improvements.
  • Do not mislead others or misrepresent the Metadata or its sources.
  • Be responsible.
  • Understand that they use the data at their own risk.”

Summary

In reviewing current institutional practices, we can begin to see themes in how museums articulate the terms for open collections data and how these themes are applied in practice.

The most common pieces of information used to indicate the availability of open collections data within this selection of four institutions included:

  • Licensing Terms: 4/4 institutions, all of which share collections data under a CC0 license
  • Scope: 3/4 institutions
  • Available Formats: 3/4 institutions
  • Direct Link to Data Repository: 2/4 institutions
  • Direct Link to Collections API: 2/4 institutions

It should be noted that only two of the four institutions (Walters Art Museum, Metropolitan Museum of Art) maintain an open access policy, yet all four function as strong examples of how museums are sharing open collections data.

As institutions continue in a larger move toward openness, looking at current practices can tell us more about how openness is communicated. By reviewing the practices of the Walters Art Museum, the Museum of Modern Art (MoMA), the Metropolitan Museum of Art, and Cooper Hewitt, we can identify themes in how museums communicate the availability of open collections data and which elements are at the center of that communication.