Exploring the M+ Collections through Colour

Rev Dan Catt · Published in M+ Labs
8 min read · Sep 25, 2019
Screenshot from the colour picker on the M+ Collections beta website.

Dan Catt is a lead engineer at Micah Walter Studio who is working closely with M+ to design and build the M+ API (api.mplus.org.hk), a key part of M+’s open access approach. The M+ API powers the M+ Collections Beta website, enabling M+ to dynamically display its collections’ content online. A version of the M+ API containing our open data set has been made available for public use. To access and explore the colour metadata that Dan delves into below, visit api.mplus.org.hk or github.com/mplusmuseum/collections-data.

Earlier this year, M+ held its second weekend-long hackathon, an opportunity for M+ to publish the museum's updated open data and for people to explore that data.

Data is a funny thing. It comes in all shapes and sizes, and almost invariably not in the format you want. It can come from several sources: some inherited with their own particular styles, others with hardly anything at all. As M+ gathers and sorts through the objects that are already in its collections, as well as those being added to them, the data goes into a big melting pot, in which many people work hard to wrangle it into some logical sense.

The results of that wrangling were published as the M+ open data set alongside the first M+ hackathon last year. It consisted of a collection of common core elements that were mostly already there: things like titles, categories, mediums, how old an object was, and who created it — just the tip of the iceberg.

By the second M+ hackathon, there was more data ready to be shown to the public: item dimensions, exhibitions, which items were in which archives. Some data still can't be included in the open data set for various reasons: data from different sources that still needs to be standardised, data that needs to be translated, and data that needs more research, often around artists and makers, dates, and locations. All of these things take time and resources, and these records need to be completed before any of that data can be released.

One set of data that is particularly hard to pin down (and this will probably come as no surprise) is copyright around objects and images of objects. Sure, a museum can hang a painting on its wall and even display it online, but when an image of that painting is included in an open data set or API, through which other people can use and display it, things start to get a bit more complicated, especially for a museum of contemporary visual culture like M+.

Colour!

Even though images can’t be included in the API, there is some data that can be included: in this case, the colour information extracted from the images. While the research into image copyright is ongoing, this colour data can still be placed into the API. One of the exciting parts of releasing an API for public use is not knowing what people will do and build with the data. I can think of some ways to use the colour data, such as finding similarly coloured images, but others may think of new, interesting uses. This is why putting as much information into the API as possible, even when some things can’t be included, is important. It creates more opportunities to discover things in the data.

Our first step when getting this data into the system was to grab the images for each object and use image processing to calculate the distribution of colours. The images were analysed in a number of ‘passes’. Each pass counted the pixels of each colour, combining colours that are close in value and reducing the image from thousands of colours down to hundreds. The next pass again grouped similar colours, reducing the whole palette further. This was repeated until we ended up with a collection of distinct enough shades.
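
To make that a little more concrete, here is a minimal sketch of that kind of pass-based reduction in Python, assuming the Pillow library is available. The resizing, thresholds, and number of passes are illustrative, not the exact values used in the M+ pipeline.

```python
# Sketch of a pass-based colour reduction: count every colour, then repeatedly
# merge colours that sit close together in RGB space, widening the grouping
# distance on each pass. Values here are illustrative only.
from collections import Counter
from PIL import Image


def extract_palette(path, passes=4, start_threshold=16):
    """Reduce an image to a small set of distinct shades."""
    image = Image.open(path).convert("RGB").resize((100, 100))
    counts = Counter(image.getdata())  # (r, g, b) -> pixel count

    threshold = start_threshold
    for _ in range(passes):
        merged = Counter()
        for colour, count in counts.most_common():
            # Find an already-kept colour close enough to absorb this one.
            match = next(
                (kept for kept in merged
                 if sum((a - b) ** 2 for a, b in zip(kept, colour)) ** 0.5 < threshold),
                None,
            )
            merged[match if match is not None else colour] += count
        counts = merged
        threshold *= 2  # each pass groups colours more aggressively

    return counts.most_common()


if __name__ == "__main__":
    for colour, pixels in extract_palette("artwork.jpg")[:10]:
        print(f"#{colour[0]:02X}{colour[1]:02X}{colour[2]:02X}: {pixels} pixels")
```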

In this process, some images can be reduced to just a small number of distinct shades, while others can be reduced to a couple of dozen colours. Whatever the final number of colours an image is reduced to, it can tell us something about the complexity of the original image. Below is an example of what the colour palettes look like compared to the original image.

Left: Tadanori Yokoo, Diary of a Shinjuku Burglar, made 1965, reproduced 2006, silkscreen. M+, Hong Kong. © Courtesy of Tadanori Yokoo. Right: Colour palette interpretations of the work.

The first image from the left is the work itself (obviously), and the second image is a breakdown of the actual colours in the work. The third image is a more simplified breakdown of the colours, which helps the computer quickly search and match images. The colours in this breakdown are called the ‘search colours’ in the API.

Colour searching

One of the problems with colours is that there’s an awful lot of them! The image below, for example, has been divided into two halves, each with a slightly different shade of red. The one on the left is (in computer terms) represented by the value ‘#FF3333’ and the one on the right is ‘#FF3330’.

Shades of red

This is troublesome when we ask the collections database to find similar images. If we ask for more images that are ‘#FF3333’, then we won’t get the very similar ‘#FF3330’. Even though we can see that they are pretty much the same, to the database they are two completely different things.

This is where the ‘search colours’ come in. The system breaks down an image into a small range of base colours: red, orange, green, and so on. This small range of colours is based on light and dark versions of the primary and secondary colours, as well as brown and three shades: black, grey, and white. These are typically the predominant colours in a lot of images, and allow images to be quickly grouped into different parts of the colour spectrum.

Instead of reducing an image to a unique colour palette, the image is reduced by comparing each pixel to the ‘search colours’ and mapping it to the closest one. Then the amount of each colour is recorded. For example, an image could consist of a lot of red, a medium amount of orange, a small amount of yellow, a tiny amount of blue, and no other colours.
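
Here is a rough sketch of that mapping step. The small palette below is invented for illustration (the API's actual search colours include light and dark variants of each hue), but it shows the idea: snap every pixel to its nearest search colour, then record how much of the image each one covers.

```python
# Illustrative only: a simplified 'search colour' palette and a function that
# maps every pixel to its nearest entry, then reports each colour's share.
from collections import Counter
from PIL import Image

SEARCH_COLOURS = {
    "red": (220, 40, 40),
    "orange": (240, 140, 40),
    "yellow": (240, 220, 60),
    "green": (60, 160, 70),
    "blue": (50, 90, 200),
    "purple": (130, 60, 170),
    "brown": (120, 80, 50),
    "black": (20, 20, 20),
    "grey": (128, 128, 128),
    "white": (240, 240, 240),
}


def nearest_search_colour(pixel):
    """Return the name of the search colour closest to this pixel in RGB space."""
    return min(
        SEARCH_COLOURS,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(SEARCH_COLOURS[name], pixel)),
    )


def search_colour_breakdown(path):
    """Map every pixel to a search colour and return each colour's percentage."""
    image = Image.open(path).convert("RGB").resize((100, 100))
    counts = Counter(nearest_search_colour(p) for p in image.getdata())
    total = sum(counts.values())
    return {name: round(100 * count / total, 1) for name, count in counts.most_common()}


# e.g. {'red': 54.2, 'orange': 21.3, 'yellow': 9.1, 'blue': 0.4}
```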

Then, instead of asking the database for ‘#FF3333’, we can just say ‘Give me all the images that are red/orange/green enough’, and we get a set of results back (hopefully).

This is excellent because databases love binary YES/NO choices, and this is a binary choice that isn’t too restrictive: we’re not asking for the specific #FF3333 but rather ‘does this image have red in it’, ‘does this image have more than 15% yellow’, or even ‘find me all the images without any blue’.
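
As a sketch, those yes/no questions become very simple filters over a breakdown like the one computed above. The record shapes and field names here are made up for illustration and are not the actual M+ API response format.

```python
# Hypothetical records holding search-colour percentages; field names invented.
records = [
    {"id": 101, "colours": {"red": 62.0, "orange": 20.5, "yellow": 17.5}},
    {"id": 102, "colours": {"blue": 55.0, "grey": 30.0, "white": 15.0}},
    {"id": 103, "colours": {"yellow": 40.0, "green": 35.0, "brown": 25.0}},
]

# 'Does this image have red in it?'
has_red = [r["id"] for r in records if "red" in r["colours"]]

# 'Does this image have more than 15% yellow?'
yellow_heavy = [r["id"] for r in records if r["colours"].get("yellow", 0) > 15]

# 'Find me all the images without any blue.'
no_blue = [r["id"] for r in records if "blue" not in r["colours"]]

print(has_red, yellow_heavy, no_blue)  # [101] [101, 103] [101, 103]
```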

However, searching for distinct colours rather than the predefined ‘search colours’ would still be useful. ‘FF3333’ is a very ‘computer’ way of looking at colours. Instead, we could break it down into something more readable, like the RGB (red, green, blue) colour system, which uses actual numbers. The numbers can be anything between 0 (no colour) and 255 (the most colour). In this system, ‘FF3333’ ends up as…

Red: 255
Green: 51
Blue: 51

This is like a recipe, in which lots of red and a dash of green and blue give the final shade of red. If more blue is added, then it moves towards a more pink/purple colour, and if everything is set to 0 you end up with black.
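
For reference, here is a small sketch of that conversion from a hex value to its three ‘recipe’ numbers:

```python
def hex_to_rgb(hex_colour):
    """Split a hex colour like '#FF3333' into its red, green, and blue values."""
    hex_colour = hex_colour.lstrip("#")
    # Each pair of hex digits is one channel, read as a base-16 number.
    return tuple(int(hex_colour[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#FF3333"))  # (255, 51, 51)
```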

Now, searching for similar images looks a little like this:

Red value #ff2020 as defined by its RGB values

This isn’t the most effective way to search the database, and won’t return the most relevant results: asking for ranges on three separate channels at once is clumsy, and a small difference in just one channel can exclude colours that look almost identical to the eye.

However, turning to the good old colour wheel provides a better solution. It makes it possible to pick a point on the wheel and then ask for all of the images that fall within a range on either side of that point.

A colour range from a point within the colour wheel.

This is why the database and the API have converted the most prominent colours to hue values. Instead of asking for something with certain amounts of red, green, and blue, the API can instead be asked to find all images that are within a certain distance of a point on the colour wheel. For example, find everything within thirty points of this exact shade of red.
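
A sketch of what that check looks like in code: convert a colour to its hue (its position around the wheel in degrees), then keep anything whose hue sits within the chosen distance, remembering that the wheel wraps around. The thirty-degree range and the candidate hues below are just examples.

```python
# Hue-based matching: measure distance around the colour wheel in degrees
# and keep hues within a chosen range of the target colour.
import colorsys


def hue_of(rgb):
    """Return the hue of an RGB colour as degrees around the colour wheel."""
    r, g, b = (value / 255 for value in rgb)
    hue, _, _ = colorsys.rgb_to_hsv(r, g, b)
    return hue * 360


def hue_distance(a, b):
    """Shortest distance between two hues, wrapping around the wheel."""
    diff = abs(a - b) % 360
    return min(diff, 360 - diff)


def within_range(target_rgb, candidate_hues, max_distance=30):
    """Keep hues that sit within max_distance degrees of the target colour."""
    target = hue_of(target_rgb)
    return [h for h in candidate_hues if hue_distance(target, h) <= max_distance]


# Everything within thirty degrees of this exact shade of red:
print(within_range((255, 51, 51), [0, 15, 45, 120, 350]))  # [0, 15, 350]
```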

So, even though the images of artworks and objects can’t be included, people can still start to understand the collections through colours and shades of black and white, discovering similar objects in ways that wouldn’t usually be possible.

Playing with colours

Even without images, a few colourful things came out of the hackathon. The first was a look at how colours had or hadn’t changed over the years by Team Swire Hall, who created some lovely posters and data visualisations.

Posters and visualisation from M+ Hackathon participants Team Swire Hall.

Another was the ‘woodchipper’, which used the breakdown of colours to create a scrambled version of the original image. If the original image had a lot of red in it, the scrambled result would have a lot of red ‘chips’ in it. If there were small hints of orange, there would be a few specks of orange in the end result. The original image wasn’t shown, but you could still get a feel for it.
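
For the curious, a toy version of the idea might look something like this: scatter coloured ‘chips’ onto a blank canvas, with each colour getting a share of chips matching its share of the original image. This is a reconstruction of the concept, not the hackathon team's actual code.

```python
# Toy 'woodchipper': draw random coloured squares in proportion to each
# colour's percentage of the original image. Parameters are illustrative.
import random
from PIL import Image, ImageDraw


def woodchip(breakdown, size=400, chips=800, chip_size=12):
    """breakdown maps an RGB tuple to its percentage of the original image."""
    canvas = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(canvas)
    for colour, percent in breakdown.items():
        for _ in range(int(chips * percent / 100)):
            x = random.randint(0, size - chip_size)
            y = random.randint(0, size - chip_size)
            draw.rectangle([x, y, x + chip_size, y + chip_size], fill=colour)
    return canvas


# e.g. woodchip({(220, 40, 40): 60.0, (240, 140, 40): 30.0, (60, 90, 200): 10.0}).save("chips.png")
```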

Left: Tadanori Yokoo, Diary of a Shinjuku Burglar, made 1965, reproduced 2006, silkscreen. M+, Hong Kong. © Courtesy of Tadanori Yokoo. Right: the woodchipped version of the work.

Finally, taking inspiration from the tower blocks in Hong Kong, I randomly generated cityscapes based on the predominant colours. The higher the tower block, the more of that colour in the original image. Each image in the M+ Collections has its own tiny downtown.
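
Something along these lines generates a tiny ‘downtown’ from the predominant colours, with each tower's height set by how much of that colour the image contains. Again, this is an illustrative sketch rather than the exact code behind the visualisation.

```python
# Toy cityscape: one 'tower' per predominant colour, height proportional to
# that colour's share of the image. Layout values are illustrative.
from PIL import Image, ImageDraw


def cityscape(breakdown, width=400, height=200, gap=10):
    """breakdown maps an RGB tuple to its percentage of the original image."""
    canvas = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(canvas)
    tower_width = width // len(breakdown) - gap
    towers = sorted(breakdown.items(), key=lambda kv: -kv[1])
    for i, (colour, percent) in enumerate(towers):
        tower_height = int(height * percent / 100)
        x = i * (tower_width + gap)
        draw.rectangle([x, height - tower_height, x + tower_width, height], fill=colour)
    return canvas


# e.g. cityscape({(220, 40, 40): 60.0, (240, 140, 40): 30.0, (60, 90, 200): 10.0}).save("city.png")
```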

Building block data visualisation based on predominant colour

That is a quick overview of colour in the M+ open data set and API. You can find out more about the open data set at github.com/mplusmuseum/collections-data, and explore objects by colour on the M+ Collections Beta website.
