MuseumPlus API: Importing 100 years of art history

Roman S · Published in mutoco · Feb 2, 2022 · 5 min read

By the end of 2021, mutoco completed the new website for the “Digilab” department of Kunsthaus Zürich. The website provides a platform for new digital artworks. In addition, for the first time ever, the entire exhibition history of the Kunsthaus Zürich is publicly available online: approximately 1,500 exhibitions, covering the period from 1910 to the present day, can be browsed on the website.

In this article, I will explain how a seemingly trivial problem turned into quite a challenge.

The initial situation

Kunsthaus Zürich uses the tool “MuseumPlus” by zetcom to manage its collection. All artists, works, and exhibitions are recorded in it. It was obvious from the start that maintaining this data in several systems made no sense.

Therefore, we had to find a solution to migrate the list of exhibitions from “MuseumPlus” into our new website. Since zetcom offers an XML API for its software, this seemed like it should be easy.

Two possible approaches crystallised pretty quickly:

  1. Direct access to the API every time the website is visited.
  2. Periodic import of the data into a separate database.

It quickly became obvious that direct access would be far too slow. So the only option left to us was to import the data periodically and store it in our own database.

First steps

We knew that we wanted to build the website with a headless CMS. The choice fell on SilverStripe because it is very flexible and has a high-performance GraphQL interface. In addition, the existing “Queued Jobs” add-on provided an excellent foundation for implementing our own importer.
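To give an idea of how such a job is structured (this is a hypothetical sketch, not our actual importer code): the Queued Jobs add-on repeatedly calls a job's process() method until the job marks itself as complete, which is exactly what makes a stepwise import possible.

```php
<?php

use SilverStripe\Core\Injector\Injector;
use Symbiote\QueuedJobs\Services\AbstractQueuedJob;
use Symbiote\QueuedJobs\Services\QueuedJobService;

// Hypothetical job showing how the Queued Jobs add-on structures
// long-running work: process() is invoked once per step until
// isComplete is set to true.
class ExhibitionImportJob extends AbstractQueuedJob
{
    public function getTitle()
    {
        return 'Import exhibitions from MuseumPlus';
    }

    public function setup()
    {
        $this->totalSteps = 100; // placeholder; would be the number of records
    }

    public function process()
    {
        // ... perform one small unit of work per call ...
        $this->currentStep++;

        if ($this->currentStep >= $this->totalSteps) {
            $this->isComplete = true;
        }
    }
}

// Enqueue the job; the add-on's worker then executes it step by step.
Injector::inst()->get(QueuedJobService::class)->queueJob(new ExhibitionImportJob());
```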

The first step was done quickly: we were able to load the data via the API, and our rusty XML skills were once again in demand.
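A minimal sketch of such a request might look like this; the base URL, the endpoint path, and the credentials are placeholders rather than the real configuration:

```php
<?php

// Minimal sketch of loading one record as XML over HTTP with basic auth.
// Host, endpoint layout, and credentials are assumptions for illustration.
$base = 'https://museum.example.com/ria-ws/application';
$context = stream_context_create([
    'http' => [
        'header' => 'Authorization: Basic ' . base64_encode('user:password'),
    ],
]);

$xml = file_get_contents($base . '/module/Exhibition/12345', false, $context);
$record = simplexml_load_string($xml);
```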

Test-driven development

Anyone who has ever developed an importer knows how problematic it is to access a live API during development while writing directly into the database. This approach is very time-consuming, unnecessarily stresses the live system, and creates entries in the database that you don’t want.

Therefore, we aimed for test-driven development, where unit tests are implemented in parallel with the functionality. This allowed us to solve certain problems in isolation and to verify them much faster than with a live import.

Another advantage is, of course, that tests can also be used for future adaptations to verify that all components work as expected.
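As an illustration of the approach, a test can parse a stored XML fixture instead of calling the live API. The ExhibitionParser class and the fixture file are hypothetical:

```php
<?php

use PHPUnit\Framework\TestCase;

// Sketch of testing parsing logic against a stored XML fixture rather
// than the live API; ExhibitionParser and the fixture are hypothetical.
class ExhibitionParserTest extends TestCase
{
    public function testParsesTitleFromFixture(): void
    {
        $xml = file_get_contents(__DIR__ . '/fixtures/exhibition.xml');

        $parser = new ExhibitionParser();
        $exhibition = $parser->parse($xml);

        $this->assertSame('Dada', $exhibition->getTitle());
    }
}
```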

Data deluge

Something we only realised along the way is that there are large amounts of data to be loaded. An exhibition is not just one record. Each exhibition has several relations to other data sets. For example, an exhibition includes:

  • Works
  • Artists
  • Permalinks
  • Texts
  • Media (e.g. exhibition poster, etc.)
  • Categorisations (e.g. art movements or techniques)

These data sets could in turn have further relations to other data sets. Another complication was that certain relations were only accessible via intermediate relations.

[Diagram: relation from exhibition to work of art]

In the diagram, you can see that a work in an exhibition (“Werk”) is always linked to the exhibition record via a “Registrar” record.

Individual records are usually a few kilobytes in size, but some consist of many megabytes of XML data. On top of the extensive XML data came the high-resolution TIFF files, which are often 30 megabytes or larger.

The result was that we quickly reached the limits of the available memory.

Optimisation

To optimise the whole import process, we rewrote the importer several times. A first step was to parse the XML data as a stream instead of loading it completely into memory. Much of the data is not actually relevant to us, so we extracted compact tree data structures from the XML records and stored them in a temporary SQLite database.
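The following sketch shows the idea with PHP's XMLReader; the element and field names are assumptions, not the actual structure of the MuseumPlus XML:

```php
<?php

// Sketch of stream-parsing a large XML export with XMLReader and storing
// a compact extract per record in a temporary SQLite database.
// The "moduleItem" element, its fields, and the table layout are assumptions.
$reader = new XMLReader();
$reader->open('exhibitions.xml');

$db = new PDO('sqlite:' . sys_get_temp_dir() . '/import.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, tree TEXT)');
$insert = $db->prepare('INSERT OR REPLACE INTO records (id, tree) VALUES (?, ?)');

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'moduleItem') {
        // Only this one record is ever expanded in memory, never the whole file.
        $node = simplexml_load_string($reader->readOuterXml());
        $insert->execute([
            (string) $node['id'],
            json_encode(['title' => (string) $node->title]), // compact extract
        ]);
    }
}

$reader->close();
```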

We divided the entire import process into the following individual steps (tasks):

  • Search: Search for records matching certain criteria
  • Load: Load a single record
  • Import: Import the record into the database
  • Import Attachment: Import a related image file
  • Linking: Link records that belong together
  • Cleanup: Delete outdated records

The process always starts with a task that is added to a priority queue. Most tasks can add further tasks to the queue. For example, the “Search” task creates a “Load” task in the queue for each record that is found.
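In simplified form (task types, priorities, and helper functions are illustrative only, not the module's real API), the loop looks something like this:

```php
<?php

// Minimal sketch of the queue-driven pipeline: every task may enqueue
// follow-up tasks. Task types, priorities, and helpers are illustrative only.

function findRecordIds(string $module): array
{
    return [101, 102]; // stub: would query the MuseumPlus search endpoint
}

function importRecord(int $id): void
{
    // stub: would write the record into the CMS database
}

$queue = new SplPriorityQueue();
$queue->insert(['type' => 'search', 'module' => 'Exhibition'], 10);

while (!$queue->isEmpty()) {
    $task = $queue->extract();

    switch ($task['type']) {
        case 'search':
            // A search yields record IDs; each one becomes a "load" task.
            foreach (findRecordIds($task['module']) as $id) {
                $queue->insert(['type' => 'load', 'id' => $id], 5);
            }
            break;
        case 'load':
            // Loading fetches the XML; importing happens in a later step.
            $queue->insert(['type' => 'import', 'id' => $task['id']], 1);
            break;
        case 'import':
            importRecord($task['id']);
            break;
    }
}
```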

Through prioritisation and stepwise execution, a large task is processed bit by bit and thus requires relatively little memory. When importing the large image files, we automatically create smaller JPEG versions to conserve resources.
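For the JPEG conversion, a sketch using PHP's Imagick extension (which can read TIFF) could look like this; the paths and the 2048-pixel bound are placeholder values:

```php
<?php

// Sketch of downscaling a large TIFF to a web-friendly JPEG with Imagick.
// Requires the Imagick extension with TIFF support; paths are placeholders.
$image = new Imagick('/tmp/poster-original.tiff');
$image->thumbnailImage(2048, 2048, true); // fit within 2048x2048, keep ratio
$image->setImageFormat('jpeg');
$image->setImageCompressionQuality(82);
$image->writeImage('/tmp/poster-web.jpg');
$image->clear();
```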

Reusability

From the very beginning, it was important to us to implement a solution that is as generic as possible, so that future adaptations or other applications remain easily achievable.

We developed a separate open-source add-on for the SilverStripe CMS, which greatly simplifies the import from “Museum Plus”: https://github.com/mutoco/silverstripe-mplus-import

With this add-on, an import process can be described via configuration files. More complex requirements can be handled with “Extensions”. We used this feature for the import of multiple languages, for example.

Conclusion

In many cases, it is difficult to estimate in advance how complex a task is. In this particular case, we certainly underestimated it. However, the add-on we developed has stood the test of everyday use, and we are confident that it can be used for other projects as well.

Who knows, maybe someone can make use of our experience and the work we have done. We would be pleased.
