Each small line represents a product on the site and its colour. And this is what Mr Porter looks like.
I didn’t set out to produce these visualisations. They are the result of experiments and mistakes made while attempting to answer a simple question: which colours are similar?
I’m a developer in the Merchandising Recommendations and Search team at YNAP. We support the visual merchandisers, automating their repetitive tasks, freeing them to spend their creativity and fashion knowledge on curating products across Net-A-Porter and Mr Porter.
Product pages have “You May Also Like” sections, listing items that the merchandisers believe complement, or are an alternative to, the main product. Even as a computer scientist, I love that these recommendations are hand curated rather than generated by a collaborative filtering algorithm.
A “You May Also Like” section has at least four recommendation slots, but that doesn’t mean the merchandiser needs to make just four recommendations. Each slot has a queue of alternatives, meaning that a recommendation is always shown, even if the primary item is sold out or unavailable.
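The slot-and-queue idea fits in a few lines. This is a minimal sketch, assuming a slot is an ordered list of product IDs (the hand-picked item first, backups after) and that availability is a set of in-stock IDs. `Slot` and `shown_product` are hypothetical names, not YNAP’s code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    """Hypothetical model of one "You May Also Like" slot.

    queue[0] is the merchandiser's hand-picked product; the rest are backups.
    """
    queue: list[str] = field(default_factory=list)

    def shown_product(self, in_stock: set[str]) -> Optional[str]:
        # Walk the queue in order and show the first product that is available.
        for product_id in self.queue:
            if product_id in in_stock:
                return product_id
        return None  # nothing in the queue is available


slot = Slot(["jacket-001", "jacket-047", "jacket-112"])
print(slot.shown_product({"jacket-047", "jacket-112"}))  # jacket-047
```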
Creating these backup queues is one repetitive task we’ve automated. A merchandiser has already made the creative decision in picking a complementary item. The queue just needs to contain products similar to the hand-picked one. We judge the similarity of two products by weighting and combining a set of individual distance measures: price, category, designer, image similarity and colour.
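One way to picture “weighting and combining” is a weighted average of per-measure distances, each scaled to [0, 1], with the backup queue being the nearest candidates. This is a sketch under those assumptions only: the article doesn’t say how the real measures are normalised or combined, and the toy measures, weights and function names below are hypothetical.

```python
from typing import Callable, Mapping

# Each (hypothetical) measure returns a distance in [0, 1]: 0 = identical, 1 = maximally different.
Measure = Callable[[dict, dict], float]


def combined_distance(a: dict, b: dict,
                      measures: Mapping[str, Measure],
                      weights: Mapping[str, float]) -> float:
    """Weighted average of the individual distance measures."""
    total_weight = sum(weights[name] for name in measures)
    score = sum(weights[name] * measure(a, b) for name, measure in measures.items())
    return score / total_weight


def build_backup_queue(picked: dict, candidates: list[dict],
                       measures: Mapping[str, Measure],
                       weights: Mapping[str, float], size: int = 5) -> list[dict]:
    """Rank candidates by closeness to the hand-picked product and keep the nearest."""
    ranked = sorted(candidates,
                    key=lambda c: combined_distance(picked, c, measures, weights))
    return ranked[:size]


# Toy measures and weights, for illustration only.
measures = {
    "price": lambda a, b: min(abs(a["price"] - b["price"])
                              / (max(a["price"], b["price"]) or 1.0), 1.0),
    "designer": lambda a, b: 0.0 if a["designer"] == b["designer"] else 1.0,
}
weights = {"price": 1.0, "designer": 2.0}
```

Colour would simply be another entry in `measures`; the weights are where a measure like colour can be made to dominate.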
Colour is perhaps the most important factor to the merchandisers in judging what products are similar. A product in the wrong colour stands out as being a bad recommendation, regardless of how close it is by the other distance measurements. Make a good colour choice and you can get away with a lot.
All products across our websites have a label colour. This is the colour name that makes most sense to our customers, who tend not to shop by hex codes. The label colour is visible in the user interface; you’ll see a list of them if you filter by colour.
In the past we used label colour for calculating how similar two products were. This worked reasonably well and was easy to understand. It did have some limitations: not all label colours are created equal.
Nobody would argue against blue being a colour. But metallic? Metallic is more of a property of another colour. Is neutral a colour? Or a collection of colours? Tan or cream would be considered “neutral”, but what about light pink? What about products that are made up of several colours? Or are transparent? And I haven’t even mentioned our “animal print” colour. If the computer scientists were in charge, we would likely have added an animal sub-type, and we would still be arguing over whether tortoiseshell is a colour, a pattern or a texture.
Rainbows have seven colours because Isaac Newton liked the symmetry with musical scales. The names we give to colours are a human concept, tied to a fuzzy range of wavelengths that culture has settled on.
A colleague fluent in Russian (hello Alisa!) taught us that in Russian, dark blue and light blue aren’t two variations of the same colour but two distinct colours. Goluboy (light blue) and siniy (dark blue) are as different as yellow and red are in English. Think words don’t matter? Russian speakers have been shown to be quicker at distinguishing between light and dark blue, apparently because they have different words for them.
Label colours did let us measure the distance between colours, using lists of similar colours, but this was subjective. Colour perception itself is not consistent between individuals (remember the dress), never mind similarity. Label colour had an additional problem for recommendations: it didn’t account for variation within a colour. “Blue” covers everything from indigo to sky blue.
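To make the list-based approach concrete, here is a sketch assuming each label colour has a hand-maintained list of “similar” labels, scored 0 for the same label, a middle value for a listed neighbour and 1 otherwise. The lists and the 0.5 score are invented for illustration, and that is the weakness described above: someone has to decide what goes in each list.

```python
# Hypothetical, hand-maintained similarity lists; the real lists are not published.
SIMILAR_LABELS: dict[str, set[str]] = {
    "blue": {"navy", "grey"},
    "navy": {"blue", "black"},
    "neutral": {"cream", "tan"},
    "cream": {"neutral", "white"},
}


def label_distance(a: str, b: str, neighbour_score: float = 0.5) -> float:
    """0 for the same label, neighbour_score for listed neighbours, 1 otherwise."""
    if a == b:
        return 0.0
    # Check both directions so the distance is symmetric even if a list is one-sided.
    if b in SIMILAR_LABELS.get(a, set()) or a in SIMILAR_LABELS.get(b, set()):
        return neighbour_score
    return 1.0


# Indigo and sky-blue products share the label "blue", so they score 0.0 against each other.
print(label_distance("blue", "blue"))   # 0.0
print(label_distance("blue", "navy"))   # 0.5
print(label_distance("blue", "cream"))  # 1.0
```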
Relying on label colours would always limit the recommendations we made, but we couldn’t expect merchandisers to add precise hex colour codes to product descriptions. So we began to extract the colour palette from product images using Color-Thief. We refer to these as sampled colours. Color-Thief proved to be more accurate than we anticipated. There are lots of internet projects doing things with colour that I suspect use it. A personal favourite is the Beyoncé Palettes Tumblr.
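Color-Thief builds its palette with a modified median cut quantisation. The sketch below is a plain, unmodified median cut over a list of RGB pixels, offered only to show the general idea; it is not Color-Thief’s code, and reading pixels out of an image file is left out.

```python
Pixel = tuple[int, int, int]


def channel_range(box: list[Pixel], ch: int) -> int:
    """Spread of values on one channel (0 = red, 1 = green, 2 = blue) within a box."""
    return max(p[ch] for p in box) - min(p[ch] for p in box)


def median_cut(pixels: list[Pixel], colour_count: int) -> list[Pixel]:
    """Plain median cut: return up to colour_count palette colours, most common first."""
    if not pixels:
        return []
    boxes = [list(pixels)]
    while len(boxes) < colour_count:
        # Find the box and channel with the widest spread of values.
        i, ch = max(((i, ch) for i in range(len(boxes)) for ch in range(3)),
                    key=lambda pair: channel_range(boxes[pair[0]], pair[1]))
        if channel_range(boxes[i], ch) == 0:
            break  # every box holds a single colour, so nothing is left to split
        # Split that box at the median along that channel.
        box = sorted(boxes.pop(i), key=lambda p: p[ch])
        mid = len(box) // 2
        boxes += [box[:mid], box[mid:]]
    # Largest boxes first; each palette colour is the average of the pixels in its box.
    boxes.sort(key=len, reverse=True)
    return [tuple(round(sum(p[c] for p in box) / len(box)) for c in range(3))
            for box in boxes]
```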
The Distance Between Two Colours
Color-Thief provides RGB (Red Green Blue) values for each product, which we can use to judge colour similarity. This is where things get complicated. RGB values are just points in three-dimensional space, so the obvious thing to do is calculate the Euclidean distance between two points. This does not work well: the mathematical distance between points in RGB space does not correspond to how humans judge the similarity of the colours at those points.
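The naive measure is a one-liner with Python’s `math.dist`. The example pairs show its blind spot: raising the green channel or the blue channel by the same amount gives exactly the same RGB distance, even though green contributes far more to perceived brightness (the sRGB relative luminance weights are 0.7152 for green and 0.0722 for blue), so the two shifts need not look equally different.

```python
import math


def rgb_distance(c1: tuple[int, int, int], c2: tuple[int, int, int]) -> float:
    """Naive Euclidean distance between two (r, g, b) points, channels 0-255."""
    return math.dist(c1, c2)


base = (100, 100, 100)
more_green = (100, 140, 100)
more_blue = (100, 100, 140)
print(rgb_distance(base, more_green))  # 40.0
print(rgb_distance(base, more_blue))   # 40.0
```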
Colour distance is easier to understand when represented in HSL (Hue Saturation Luminance). In this form, hue represents what we typically think of as colour. Saturation is the purity of that colour: it would take you from a hot pink at high saturation to a pastel pink at low saturation. Luminance is how bright the colour is. Saturation and luminance lie on a line from a minimum to a maximum value, but hue lies on a circle (think of a colour wheel), so the largest hue value sits next to the smallest one. Colour similarity models typically describe colours in terms like these (lightness, saturation or chroma, and hue) rather than raw RGB.
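A small sketch of the wrap-around point, using Python’s standard `colorsys` module (which returns hue, lightness, saturation in that order, each in [0, 1]). Hue differences are taken the short way round the wheel, so a hue of 350° is 20° from a hue of 10°, not 340°.

```python
import colorsys


def to_hsl(rgb: tuple[int, int, int]) -> tuple[float, float, float]:
    """Convert 0-255 (r, g, b) to (hue in degrees, saturation, lightness)."""
    r, g, b = (c / 255 for c in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # colorsys uses H, L, S order
    return h * 360, s, l


def hue_difference(h1: float, h2: float) -> float:
    """Angle between two hues, going the short way round the colour wheel (0-180)."""
    d = abs(h1 - h2) % 360
    return min(d, 360 - d)


red, rose = (255, 0, 0), (255, 0, 51)
print(to_hsl(red))
print(hue_difference(to_hsl(red)[0], to_hsl(rose)[0]))  # roughly 12 degrees apart
print(hue_difference(350, 10))  # 20
```

One caveat: for greys, saturation is zero and hue is undefined (`colorsys` reports 0), so a hue-only comparison would treat a grey as if it were red.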
The Euclidean distance between RGB or HSL points doesn’t correspond with human perception of colour similarity, because people are more sensitive to changes in some colour components than others. There are colour spaces and colour difference formulas that try to take account of this. They have gone through many iterations and there is still debate about them. We implemented one such formula (CIEDE2000), and it did improve on our naive Euclidean distances, but it still didn’t give the results our merchandisers needed. At a certain point, colours are just different. It doesn’t matter that one colour is less different than another; past that point, both are bad alternatives.
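For reference, here is a compact CIEDE2000 sketch following the standard formula, plus an sRGB to CIELAB conversion (D65 white point, commonly quoted matrix coefficients) so it can take RGB values like the ones Color-Thief returns. This is not our production code. The `capped_similarity` helper and its threshold of 20 are hypothetical; they only illustrate the point that beyond some distance every colour is an equally bad alternative.

```python
import math


def srgb_to_lab(rgb):
    """Convert 0-255 sRGB to CIELAB (D65 white point)."""
    def linearise(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearise(float(c)) for c in rgb)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def ciede2000(lab1, lab2, kL=1.0, kC=1.0, kH=1.0):
    """CIEDE2000 colour difference between two (L*, a*, b*) colours."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    # Rescale a* (the G factor) to improve behaviour near neutral colours.
    c_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2
    g = 0.5 * (1 - math.sqrt(c_bar ** 7 / (c_bar ** 7 + 25 ** 7)))
    a1p, a2p = (1 + g) * a1, (1 + g) * a2
    c1p, c2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360
    h2p = math.degrees(math.atan2(b2, a2p)) % 360

    # Differences in lightness, chroma and hue (hue wraps at 360 degrees).
    dLp, dCp = L2 - L1, c2p - c1p
    dhp = 0.0
    if c1p * c2p != 0:
        dhp = h2p - h1p
        if dhp > 180:
            dhp -= 360
        elif dhp < -180:
            dhp += 360
    dHp = 2 * math.sqrt(c1p * c2p) * math.sin(math.radians(dhp) / 2)

    # Mean lightness, chroma and hue.
    Lbp, Cbp = (L1 + L2) / 2, (c1p + c2p) / 2
    if c1p * c2p == 0:
        hbp = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        hbp = (h1p + h2p) / 2
    elif h1p + h2p < 360:
        hbp = (h1p + h2p + 360) / 2
    else:
        hbp = (h1p + h2p - 360) / 2

    # Weighting functions and the rotation term for the blue region.
    t = (1 - 0.17 * math.cos(math.radians(hbp - 30))
         + 0.24 * math.cos(math.radians(2 * hbp))
         + 0.32 * math.cos(math.radians(3 * hbp + 6))
         - 0.20 * math.cos(math.radians(4 * hbp - 63)))
    d_theta = 30 * math.exp(-(((hbp - 275) / 25) ** 2))
    r_c = 2 * math.sqrt(Cbp ** 7 / (Cbp ** 7 + 25 ** 7))
    s_l = 1 + 0.015 * (Lbp - 50) ** 2 / math.sqrt(20 + (Lbp - 50) ** 2)
    s_c = 1 + 0.045 * Cbp
    s_h = 1 + 0.015 * Cbp * t
    r_t = -math.sin(math.radians(2 * d_theta)) * r_c

    dl, dc, dh = dLp / (kL * s_l), dCp / (kC * s_c), dHp / (kH * s_h)
    return math.sqrt(dl ** 2 + dc ** 2 + dh ** 2 + r_t * dc * dh)


def capped_similarity(lab1, lab2, threshold=20.0):
    """Hypothetical: 1 for identical colours, falling linearly to 0 at `threshold`
    and staying 0 beyond it, so all "just different" colours score the same."""
    return max(0.0, 1.0 - ciede2000(lab1, lab2) / threshold)
```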
Disney x Mathematica
Theodore Gray wrote about how he decided where each colour should sit while extracting colour palettes from Disney films. What Gray did was cluster the colour values. Once the colours had been put in clusters, Gray could sort by the brightness of the mean value of the clusters and then by the brightness of the colours within each cluster.
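The two-level ordering can be sketched as follows, assuming the clusters already exist (for example from k-means). The brightness measure is an assumption: Rec. 709 luma weights applied directly to 0-255 RGB values, a rough stand-in rather than whatever Gray actually used.

```python
Colour = tuple[float, float, float]


def brightness(c: Colour) -> float:
    """Assumed brightness measure: Rec. 709 luma weights on 0-255 RGB values."""
    r, g, b = c
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def mean_colour(cluster: list[Colour]) -> Colour:
    """Channel-wise mean of the colours in a cluster."""
    return tuple(sum(c[i] for c in cluster) / len(cluster) for i in range(3))


def gray_order(clusters: list[list[Colour]]) -> list[Colour]:
    """Sort clusters by the brightness of their mean, then colours within each cluster."""
    ordered = sorted(clusters, key=lambda cl: brightness(mean_colour(cl)))
    return [c for cl in ordered for c in sorted(cl, key=brightness)]


dark = [(40, 40, 60), (20, 20, 30)]
light = [(240, 240, 200), (200, 200, 180)]
print(gray_order([light, dark]))
```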
Gray’s problem was close to what we were struggling with. We too could use clusters to judge similar colours. Using DL4J it was easy to build k-means clusters of product colours, but it was difficult to get a sense of how well it was working. I still had the Beyoncé Palettes Tumblr in mind, so I created a colour palette that would show the entirety of Net-A-Porter, with products arranged into their colour clusters. A few iterations later, I had the visualisation that opened this article.
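Our clusters were built with DL4J (Java). To keep these sketches in one language, here is plain Lloyd’s k-means in Python instead; it is not the DL4J API. The value of k, the seeded random initialisation and the iteration cap are assumptions, and clustering raw RGB rather than a perceptual space such as CIELAB is just the simplest option.

```python
import math
import random

Colour = tuple[float, float, float]


def nearest(c: Colour, centroids: list[Colour]) -> int:
    """Index of the centroid closest to colour c (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(c, centroids[j]))


def kmeans(colours: list[Colour], k: int, iterations: int = 50, seed: int = 0):
    """Sketch of Lloyd's k-means (not DL4J's implementation).

    Returns (centroids, labels), where labels[i] is the cluster of colours[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(colours, k)  # start from k input colours chosen at random
    for _ in range(iterations):
        # Assignment step: each colour joins its nearest centroid.
        labels = [nearest(c, centroids) for c in colours]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [c for c, lab in zip(colours, labels) if lab == j]
            new_centroids.append(
                tuple(sum(m[i] for m in members) / len(members) for i in range(3))
                if members else centroids[j]  # leave an empty cluster where it was
            )
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, [nearest(c, centroids) for c in colours]
```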
I shared an early version with a colleague (hello Kristian!), who pointed out that it looked like the gel electrophoresis profiles used in DNA analysis.
I was focused on colour, but Kristian saw something else: a single-page representation of the whole Net-A-Porter site, with a lot of information compressed into a single screen. It prompts more questions than it answers. Does this colour map change (mutate) over time? Does it vary by region? Are there core parts of the map that remain consistent? Do specific designers occupy a region of the map? What does Nike’s colour scheme look like? Or Gucci’s?
We still need to solve the original colour similarity problem for the visual merchandisers; I think the clustering will do that.
What I’m excited about now are the questions that the visualisation prompted. I don’t know what value exists in answering them. There might be none, but I expect a whole new set of thoughts and ideas to spring from the attempt.