New MAAS Online Collection: The Techie Stuff

Published in MAAS Labs, May 4, 2017

The new MAAS online collection is now public! You can check it out here: https://collection.maas.museum.

The new online collection, as modelled by this lovely Surface. Photo: Felix Warmuth

Although we are only posting about it now, we did a sneaky soft launch in March. Redirects were set up and the old collection website (known as the Powerhouse Museum OPAC) was decommissioned after 11 years of service (thanks to Seb, Luke, Giv and others for their work).

Other than UX consultation, the new website was designed and developed entirely in-house at MAAS by the Digital Studio team.

Before starting this project, we had several high-level goals in mind:

Completely revamp the tech stack, as this was a greenfield project.
Improve the speed of updates from our collection object management system (EMu). Previously, data from EMu would be ‘harvested’ once a month into the old collection website.
Rebuild the central API and use it to run the collection website, as well as other museum applications.
Move the stack from on-premise to the cloud. Not only does the Museum have a mandate from the NSW Government to do so, it also protects the application from scheduled and unpredictable power outages.
Enable recommendations, linking similar object records and other content.
Release a public API, opening up our data to the world.

We’d like to share this technical overview with the wider museum community. We’ll then follow up with more in-depth articles in the coming months.

Microservice Architecture

The new collection website follows a microservice architecture pattern. This allowed us to break the application down into smaller independent parts. Although it complicated our stack, each service can be shared across other applications and be upgraded or replaced independently.

Infrastructure

While the old online collection was hosted on-premise at the Powerhouse Museum, we decided to host the new site in the cloud on AWS. This made it easy to break our stack into microservices.

These are the actual services that make up the whole application:

Harvester
Database
API
Image Service
Elasticsearch Service
Collection Website

Harvester

The harvester is responsible for getting object-related data out of EMu (our collection management system). This PHP Symfony application transfers information to the database, uploads multimedia files to S3 and updates our Elasticsearch index.

The harvester can be run via the command line to pull data out. More importantly, the harvester now works in near real-time, pushing to the database whenever a curator or registrar updates an object record. This has turned EMu into a de facto CMS for online content.

Database

Our development team had plenty of experience with SQL databases like MySQL. However, we succumbed to the hype and decided to try out ‘schema-less’ NoSQL-type databases.

We ended up with MongoDB, a NoSQL database that stores data as JSON-like documents (in a binary format called BSON).

MongoDB also plays nice with our API server-side tech: NodeJS.

API

This NodeJS app runs on the popular Express package. We tinkered with a REST API but eventually settled on a relatively new standard called GraphQL by Facebook (this process was documented in a previous article).

GraphQL’s query language is a thin layer that can sit on top of multiple data sources. Queries only fetch the declared fields and can also fetch across different database sources — all in one go.

We found that this not only improved performance, but also our productivity when building the client application.
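To make the idea of declared fields concrete, here is a minimal sketch of how a client might package a query for a GraphQL endpoint. The query name, the `object` type and its fields are illustrative assumptions, not the actual MAAS schema; the JSON payload with `query` and `variables` keys follows the usual GraphQL-over-HTTP convention.

```typescript
// Hypothetical query: the type and field names below are illustrative
// assumptions, not the actual MAAS schema. Only the fields declared here
// (title and image URLs) would come back in the response.
const OBJECT_SUMMARY_QUERY: string = `
  query ObjectSummary($id: Int!) {
    object(id: $id) {
      title
      images {
        url
      }
    }
  }
`;

interface GraphQLRequest {
  method: "POST";
  headers: { [headerName: string]: string };
  body: string;
}

// Builds the conventional JSON payload for a GraphQL-over-HTTP POST request.
function buildGraphQLRequest(
  query: string,
  variables: { [variableName: string]: unknown }
): GraphQLRequest {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: query, variables: variables }),
  };
}

// The object id 1234 is a made-up example value.
const objectRequest = buildGraphQLRequest(OBJECT_SUMMARY_QUERY, { id: 1234 });
```

The resulting object could be passed as the options argument to an HTTP client such as `fetch`, together with the URL of the GraphQL endpoint.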

The API is now public, with documentation available at https://api.maas.museum. We’ll be continually adding to the docs in the coming months once our internal APIs are solidified.

Image Service

The old object record harvester created pre-defined image sizes for use on the website. We have modernised this approach by using the Thumbor image service. This enables image sizes to be built on the fly, based on a simple URL scheme.
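As a rough illustration of that URL scheme, the sketch below builds an unsigned Thumbor URL from a width, a height and an image path. The host name and image path are hypothetical, and the helper function is ours for illustration rather than part of Thumbor. It uses Thumbor's `unsafe` URL form for brevity; a production setup would more likely use signed URLs.

```typescript
interface ThumborOptions {
  width: number;   // a value of 0 asks Thumbor to scale that side proportionally
  height: number;
  smart?: boolean; // adds the "smart" segment to request smart cropping
}

// Builds an unsigned Thumbor URL of the form:
//   {host}/unsafe/{width}x{height}/[smart/]{image path}
function buildThumborUrl(
  host: string,
  imagePath: string,
  options: ThumborOptions
): string {
  const segments: string[] = ["unsafe", options.width + "x" + options.height];
  if (options.smart) {
    segments.push("smart");
  }
  segments.push(imagePath);
  return host + "/" + segments.join("/");
}

// Hypothetical host and image path, for illustration only.
const thumbnailUrl = buildThumborUrl(
  "https://images.example.org",
  "collection/object-1234.jpg",
  { width: 300, height: 200 }
);
// thumbnailUrl is "https://images.example.org/unsafe/300x200/collection/object-1234.jpg"
```

Because the size lives in the URL, a template can request a new size simply by changing two numbers.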

Thumbor was invaluable during the design phase, as dynamic image sizes enabled different page template designs to be prototyped instantly.

We also experimented with Thumbor’s smart crop feature. The results were mixed, so we built a tool to manually set the focal point for each image. This keeps the focal point in the middle of the cropped image, so it does not get cut off.

The snazzy custom focal point tool. By default, the focal point is dead centre, but we can change it manually here.
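The arithmetic behind a focal-point crop can be sketched as follows. This is our own illustration under stated assumptions, not the actual MAAS tool: it picks the largest crop with the requested aspect ratio, centres it on the focal point, and slides it back inside the image when the focal point is near an edge. All names are hypothetical.

```typescript
interface ImageSize { width: number; height: number; }
interface FocalPoint { x: number; y: number; }
interface CropBox { left: number; top: number; right: number; bottom: number; }

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// Returns the largest crop with the target aspect ratio, centred on the
// focal point as far as the image edges allow.
function cropAroundFocalPoint(
  image: ImageSize,
  focal: FocalPoint,
  target: ImageSize
): CropBox {
  const targetRatio = target.width / target.height;
  const imageRatio = image.width / image.height;
  const imageIsWider = imageRatio > targetRatio;

  // Use the full height when the image is wider than the target,
  // otherwise use the full width.
  const cropWidth = imageIsWider ? Math.round(image.height * targetRatio) : image.width;
  const cropHeight = imageIsWider ? image.height : Math.round(image.width / targetRatio);

  // Centre on the focal point, then clamp so the box stays inside the image.
  const left = clamp(Math.round(focal.x - cropWidth / 2), 0, image.width - cropWidth);
  const top = clamp(Math.round(focal.y - cropHeight / 2), 0, image.height - cropHeight);

  return { left: left, top: top, right: left + cropWidth, bottom: top + cropHeight };
}

// A 4000x3000 image cropped to a square, with the focal point near the right edge.
const offCentreCrop = cropAroundFocalPoint(
  { width: 4000, height: 3000 },
  { x: 3500, y: 1500 },
  { width: 500, height: 500 }
);
// offCentreCrop is { left: 1000, top: 0, right: 4000, bottom: 3000 }
```

A box like this could then be passed to the image service as a manual crop; Thumbor's URL scheme accepts a crop segment written as `LEFTxTOP:RIGHTxBOTTOM`.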

All Thumbor-generated images are stored in Amazon S3 and served through CloudFront as the CDN.

Elasticsearch Service

Rather than building our own search system, we leveraged Elasticsearch, an open source search and analytics engine. As previously mentioned, the harvester updates the Elasticsearch index in near real-time when an object record is updated.

Object search and recommendations were the main focus for the collection website. However, we also indexed a large volume of content from our WordPress site (https://maas.museum). We were then able to create recommendations between collection objects, museum exhibitions, events, store products and blog posts.
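This post does not go into how the recommendations are computed. One common way to get "similar content" out of Elasticsearch is its `more_like_this` query, so the sketch below builds such a request body as an assumption about the general approach, not a description of our implementation. The index name, document id and field names are hypothetical, and older Elasticsearch versions also expect a `_type` in each `like` entry.

```typescript
interface MoreLikeThisBody {
  size: number;
  query: {
    more_like_this: {
      fields: string[];
      like: Array<{ _index: string; _id: string }>;
      min_term_freq: number;
      max_query_terms: number;
    };
  };
}

// Builds a search body asking Elasticsearch for documents whose text in the
// given fields resembles an already-indexed document.
function buildRecommendationQuery(
  sourceIndex: string,
  documentId: string,
  fields: string[],
  size: number
): MoreLikeThisBody {
  return {
    size: size,
    query: {
      more_like_this: {
        fields: fields,
        like: [{ _index: sourceIndex, _id: documentId }],
        min_term_freq: 1,    // consider terms that appear only once in the source
        max_query_terms: 25, // cap on the number of terms selected for the query
      },
    },
  };
}

// Hypothetical index, id and field names.
const recommendationBody = buildRecommendationQuery(
  "objects",
  "1234",
  ["title", "description"],
  5
);
```

Sending a body like this to a search across several indices (objects, events, blog posts and so on) would return related items from each content type in one ranked list.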

Collection Website

Similar to the API, the collection website also runs on NodeJS. After some experimentation, we chose Facebook’s React library for our frontend framework. Other options were considered, but React had one feature that stood out — server-side rendering.

Because of this, we were able to write React code that runs on both the server and the browser. This enables extensive code sharing, blurring the line between ‘frontend’ and ‘backend’ code. This type of application relies on JavaScript running in both environments and is known as a Universal or Isomorphic Application.

We realised that JavaScript frameworks come and go very quickly (JavaScript Churn). However, we knew that Facebook used React in production (dogfooding), so it seemed like a fairly safe bet.

What also sealed the deal was the discovery of Searchkit. This React library integrated easily with Elasticsearch and gave us a comprehensive search interface with minimal work (check out our search page here).

Here’s how all the parts fit together:

The Future

Despite so many moving parts, the collection website has been running smoothly since the soft launch (touch wood!). We’ll continue to add more features, tidy up our deployment processes and complete the public API documentation.

This year, we’ll hopefully integrate the new stack with our main WordPress site. One idea is to use our Universal React stack to render content using GraphQL via the WordPress API. Integrating Elasticsearch into WordPress is also on the cards, improving our content recommendations and enabling us to build a ‘collection style’ search interface for exhibitions and events.

Credits

Arul Baskaran: Strategy and Project Lead.
Rowan Stenhouse: Infrastructure, Harvester, Database and Image Service Lead.
Kaho Cheung: API, Collection Website and Design Lead.
Lachlan Gordon: Elasticsearch Service Lead.
Melanie Charters: Content and Project Management.
Meena Tharmarajah: UX Consultant.
Dan Collins: Sponsor, Head of Digital and Technology.