Breaking away from TMS

A visit to the Barnes means discovering unexpected and exciting contrasts between works from different eras, places and cultures. At Girlfriends Labs, we want to bring that same experience to the collection online. We’re building a tool that imitates Barnes’s unique way of seeing art, through the visual relationships between pieces. (You can read more about Albert Barnes’s approach to curation in our first post on this project.)

Before I dive into TMS and our backend architecture, let’s look at some art.

I think art is very healing. Don’t you?

Backend Requirements

Suppose we want to know which oil paintings from French makers appear on west-facing walls of the Barnes Museum. If that’s what it takes to build the online collection, then we need a datastore that can process complex queries. Also, we want to extract information about the visual relationships between artworks in the collection. Therefore, we need to build an extensible system, one to which we can add image processing modules as we go. There’s an added constraint as well: we need to do all of this without upsetting TMS, the system that people at the museum are using right now to manage the collection (more on that in a minute).

We decided that the best way to get the tools we wanted, given our constraints, was to write a collection of scripts that would automatically pull data from TMS into Elasticsearch, organized so that we could easily add image processing later as we worked out how to do it. We’ll be talking about some of these components in a bit more detail in upcoming blog posts, but for now here’s a high-level view of all of the components and how they fit together.

Fitting it all together

Emigrating from TMS

The Museum System (TMS) is the world’s leading collection management software. Museums all over the world use it to store their collections, exhibitions and more. The Barnes Foundation has been using TMS for years to hold its collection data, and so you might well ask, why do all this work to move the data to Elasticsearch, when it’s already sitting in TMS ready to be used? Well, for one, TMS won’t let us execute structured queries. If we wanted to know which French oil paintings are on a west-facing wall, it’s just not possible to access that information with TMS alone.

However, even if TMS did support structured queries, we’d still want to find a way to move away from it. We’ve already encountered our share of unpleasant quirks (we lost about a day learning that TMS seems to store text not in utf-8, not in cp-1252, but in cp-1252 encoded utf-8), and we’ve found that people who work with TMS on a daily basis have had even worse experiences. Changing TMS is clunky and slow—after you’ve added new collection objects to TMS you can’t actually look at your data until it’s finished a mysterious process known as “re-indexing.” Also, while TMS purports to have an API, it’s only available after purchasing the whole eMuseum product suite. That’s a lot to ask if all you want to do is use the TMS API.
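To make the encoding quirk concrete, here is a minimal sketch of one way to undo it in Node.js. It assumes that “cp-1252 encoded utf-8” means UTF-8 bytes that were read as cp-1252 somewhere along the way, which is the classic mojibake pattern. The function name repairMojibake and the lookup table are illustrative and are not taken from the repository.

```javascript
// Code points that cp-1252 assigns to bytes 0x80..0x9F (null = unassigned).
const CP1252_HIGH = [
  0x20ac, null, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, null, 0x017d, null,
  null, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, null, 0x017e, 0x0178,
];

// Re-encode a string as cp-1252 bytes, then decode those bytes as UTF-8.
// Returns the input unchanged when it does not look like mojibake.
function repairMojibake(text) {
  const bytes = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp <= 0xff) {
      // ASCII and the Latin-1 range map to the same byte value.
      bytes.push(cp);
      continue;
    }
    const idx = CP1252_HIGH.indexOf(cp);
    if (idx === -1) return text; // not representable in cp-1252
    bytes.push(0x80 + idx);
  }
  const repaired = Buffer.from(bytes).toString("utf8");
  // U+FFFD means the bytes were not valid UTF-8, so the text was probably fine.
  return repaired.includes("\ufffd") ? text : repaired;
}

// The input below displays as "CÃ©zanne".
console.log(repairMojibake("C\u00c3\u00a9zanne")); // prints "Cézanne"
```

The guard on U+FFFD keeps the function from mangling text that was already clean, which matters if only some records went through the bad round trip.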

The core problem with TMS is that it makes it hard to see our own data. Which is exactly the opposite of what we want from our datastore.

Pulling from the API

To pull data from TMS, we wrote a Node.js script that automatically iterates through the collection and pulls JSON data for each object. If you look at our repository, you’ll see that in config/export.json we specify the URL that is currently serving our TMS API:

// from config/export.json
{
  "TMS": {
    "export": {
      "apiURL": "https://the.barnes.emuseum.api",
      "outputDirectory": "src/dashboard/public/output",
      ...

To read from the API, we create an instance of TMSURLReader (which you can find in src/tms_csv/src/script/tmsURLReader.js) using the API URL. That object has a function called _urlForObjectWithId that we use to get the URL corresponding to a particular object in the collection.
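As a rough illustration of that idea, the sketch below shows what a reader object of this kind might look like. It is not the code from tmsURLReader.js: the class name SketchURLReader, the public method name urlForObjectWithId, and the objects/{id}/json path are all assumptions made for the example, so check the repository for the real URL scheme.

```javascript
// Hypothetical stand-in for a TMS URL reader; the real class may differ.
class SketchURLReader {
  // apiURL: base URL of the TMS API, as configured in config/export.json.
  // pathTemplate: assumed path shape; "{id}" is replaced by the object id.
  constructor(apiURL, pathTemplate = "objects/{id}/json") {
    this.apiURL = apiURL.replace(/\/+$/, ""); // drop trailing slashes
    this.pathTemplate = pathTemplate;
  }

  // Build the URL for a single collection object.
  urlForObjectWithId(id) {
    const safeId = encodeURIComponent(String(id));
    return `${this.apiURL}/${this.pathTemplate.replace("{id}", safeId)}`;
  }
}

// 42 is an arbitrary id chosen for the example.
const reader = new SketchURLReader("https://the.barnes.emuseum.api/");
console.log(reader.urlForObjectWithId(42));
// prints "https://the.barnes.emuseum.api/objects/42/json"
```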

To export from TMS to a CSV file, we simply call TMSURLExporter.next until the API stops returning new objects. To write to a CSV file, we use a Node package called fast-csv, which makes everything ridiculously easy.
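The loop described above could be sketched roughly as follows. This is an illustration under assumptions rather than the project’s exporter: fetchObject stands in for the HTTP request to the API and is expected to resolve to null once there are no more objects, the ids are assumed to be sequential, and toCsvLine is a hand-rolled substitute for fast-csv so that the example runs without dependencies.

```javascript
// Quote a single CSV field if it contains a comma, a quote, or a line break.
function toCsvField(value) {
  const s = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Join already-escaped fields into one CSV line.
function toCsvLine(values) {
  return values.map(toCsvField).join(",");
}

// Pull objects one at a time until fetchObject signals the end with null,
// and return the whole export as a CSV string with a header row.
async function exportToCsv(fetchObject, columns, startId = 1) {
  const lines = [toCsvLine(columns)];
  for (let id = startId; ; id++) {
    const object = await fetchObject(id);
    if (object === null) break; // the API has no more objects
    lines.push(toCsvLine(columns.map((name) => object[name])));
  }
  return lines.join("\n") + "\n";
}

// Usage with an in-memory stand-in for the API (made-up records).
const fakeApi = new Map([
  [1, { id: 1, title: 'Landscape, "Untitled"', culture: "French" }],
  [2, { id: 2, title: "Portrait", culture: "American" }],
]);
const fetchFromFakeApi = async (id) => (fakeApi.has(id) ? fakeApi.get(id) : null);

exportToCsv(fetchFromFakeApi, ["id", "title", "culture"]).then((csv) => {
  process.stdout.write(csv);
});
```

Passing the fetch function in as a parameter keeps the loop testable without a live TMS API; a real run would swap in a function that requests each object’s URL.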

If you want to try running the TMS to CSV export script yourself, against your own TMS API, just modify config/export.json and run npm run tms-export from the repository root.

The Value of Comma Separated Values

Rather than export data directly from TMS into Elasticsearch, we decided to export a CSV (comma-separated values) file as an intermediate step. Why? First and foremost, a CSV file is readable. It’s about as readable as a non-prose file can be. It’s hard to overstate how nice it is to be able to look at a file with your own, human eyes and say, “Indeed, this is my data.”

Very reassuring

Rather than having our data floating around in TMS, with a CSV file we have a database-agnostic data blob that we can pass around conveniently. Later we may build an Elasticsearch index with caches and views and nested data types and other such trickery for speeding up queries or performing advanced searches, but the CSV is just Big Dumb Data. And sometimes that’s nice.
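One way to picture the next hop, from CSV rows into Elasticsearch, is the sketch below, which turns already-parsed rows into the newline-delimited body that the Elasticsearch _bulk endpoint accepts. The index and type names follow the collection/object path used in the search example in this post; the function name toBulkBody, the id column name, and the sample rows are assumptions for illustration, and parsing the CSV itself is left to a library such as fast-csv.

```javascript
// Build an Elasticsearch _bulk request body from parsed CSV rows.
// Each document takes two lines: an action line, then the document source.
function toBulkBody(rows, options = {}) {
  const { index = "collection", type = "object", idField = "id" } = options;
  const lines = [];
  for (const row of rows) {
    const action = { index: { _index: index, _type: type, _id: String(row[idField]) } };
    lines.push(JSON.stringify(action));
    lines.push(JSON.stringify(row));
  }
  // The bulk API requires the body to end with a newline.
  return lines.join("\n") + "\n";
}

// Made-up rows standing in for parsed CSV records.
const sampleRows = [
  { id: "1", title: "Landscape", culture: "French" },
  { id: "2", title: "Portrait", culture: "American" },
];
console.log(toBulkBody(sampleRows));
```

Recent Elasticsearch versions have dropped mapping types, so on those the _type key would be left out of the action line.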

An Elasticsearch Appetizer

Elasticsearch is a flexible, modern search and analytics engine that can also be used as a datastore. With our data in Elasticsearch, the world is our oyster. Remember before when I wanted to know about west-facing French oil paintings? With Elasticsearch, getting the answer couldn’t be simpler (you just have to remember that west-facing works are on the east wall).

GET collection/object/_search
{
  "_source": "title",
  "query": {
    "bool": {
      "must": [
        { "match": {
          "locations": {
            "query": "East Wall",
            "type": "phrase"
          }
        } },
        { "match": { "culture": "French" } },
        { "match": { "classification": "paintings" } },
        { "match": { "medium": "oil" } }
      ]
    }
  }
}

Using this query, I can tell you that there are in fact two west-facing French oil paintings in the Barnes Collection:

Child Holding Fruit. c. 1840. Oil on canvas. 21 3/4 x 18 1/4 in. (55.2 x 46.4 cm) BF554
Bishop Saint, Saint Roch, and Saint Sebastian. c. 1460–1480. Oil on panel. 30 1/2 x 28 7/8 in. (77.5 x 73.3 cm) BF418
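
If you would rather assemble that request body in code, for example to swap in a different wall or culture, a small builder along these lines would do. The function name buildWallQuery and its parameter names are made up for this sketch; the field names and the match clause with "type": "phrase" mirror the query above.

```javascript
// Build the search body for "works of a given kind on a given wall".
function buildWallQuery({ wall, culture, classification, medium }) {
  return {
    _source: "title", // only return each object's title
    query: {
      bool: {
        // Every clause in "must" has to match for an object to be returned.
        must: [
          { match: { locations: { query: wall, type: "phrase" } } },
          { match: { culture } },
          { match: { classification } },
          { match: { medium } },
        ],
      },
    },
  };
}

const searchBody = buildWallQuery({
  wall: "East Wall",
  culture: "French",
  classification: "paintings",
  medium: "oil",
});
console.log(JSON.stringify(searchBody, null, 2));
```

On newer Elasticsearch releases, where the phrase type on match is no longer accepted, the first clause would be written as a match_phrase query on locations instead.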

More to come

We’ve only just scratched the surface of some of the cool possibilities that Elasticsearch opens up for visualizing and playing with your data. In our next few posts, we’ll be diving deeper into our Elasticsearch configuration, exploring monitoring and discovery with Kibana, and taking a look at tying our backend together with Seneca.js. Until then!


The Barnes Foundation collection online project is funded by the Knight Foundation and our code is open source. Follow the Barnes Foundation on Medium.