Building an algorithm that sees colour as we do.
What would you say the dominant colour of this striking cat picture is?
To me, it’s blue — azure, to be specific. Those eyes are piercing and it’s the only thing I notice in this photo.
To the computer, using all the algorithms I’ve tried, the dominant colour is beige or grey. In fact, azure doesn’t even crack the top 10 most of the time.
I discovered this problem while building ColourMatch, a search-by-colour tool that lets web designers find beautiful stock photography that matches their site’s palette.
Searching photos by colour is not a unique idea, but all the implementations I’ve seen go with this “overwhelming majority” strategy. Want blue images? Prepare yourself for images of empty blue skies and underwater bubbles. Wouldn’t it be better to get images where blue is the colour that stands out from an otherwise normally coloured image? Like those piercing cat eyes?
This has been a beautiful and challenging problem that took up the lion’s share of ColourMatch’s two-week development period. There are no Ruby gems or plugins to solve this problem for you; I had to roll up my sleeves and dive deep into colour theory and statistical analysis. It’s been an unbelievable learning experience.
A few people have asked me how I did it, so I thought I’d do a quick walkthrough of the methodology.
The first step of any photo-matching tool is to reduce the number of colours to something more manageable. A JPEG can have up to 16.7 million colours, and that is far too many.
Quantization has been around since the 70s, and has been instrumental in making file sizes more manageable. There are very smart low-level algorithms that kick in whenever you compress an image, reduce its width/height, or convert a JPEG into a GIF.
I opted to use ImageMagick, a Linux command-line utility, to handle all my quantization and histogram-generation. I didn’t know how much of a colour nerd I was until I spent an hour with their documentation.
How much should I reduce it by? Well, let’s see. Here’s what the cat photo looks like at 256 colours:
Surprisingly good, right? How about at 32 colours?
Egad! We still have a blue, technically, but it’s far less saturated than in the original photo.
After a lot of experimentation with a lot of photos, I landed on 48 as the optimal number of colours.
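With ImageMagick, the quantization and histogram generation can be done in a single command. A sketch, assuming ImageMagick is installed and cat.jpg is the source photo:

```shell
# Quantize to 48 colours and print a histogram of colour counts.
# Each output line looks like: "  1234: (18,52,86) #123456 srgb(18,52,86)"
convert cat.jpg -colors 48 -format %c histogram:info:-
```

The count at the start of each line is how many pixels map to that colour, which is what the later occurrence-based filtering works from.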
RGB and HSB and LAB, oh my.
After plucking out the top 48 colours, I run them through a service which converts a hex colour string like “#123456” into a hash with the following colour spaces:
RGB: The most common colour space, the one your monitor is probably using right now to render this page. Consists of a Red value, a Green value, and a Blue value, all between 0 and 255. Piercingly bright red is 255–0–0. 0–0–0 is black, 255–255–255 is white.
HSB: This colour space is the one most colour pickers use, like the Photoshop one pictured. It’s intuitive for us humans. It stands for Hue/Saturation/Brightness. Hue is the tone of the colour, Saturation is how far it is from grey, and Brightness is how far it is from black.
LAB: A colour space I didn’t know existed, LAB colour space more closely approximates human vision. L is a ‘lightness’ value from 0 to 100, A is a range between red and green, B is a range between yellow and blue. It’s also made to be device-independent.
What’s the point of LAB? It’s made to be “perceptually uniform”, which means that you can change any value by 10% and it’ll appear to be 10% different. The way RGB is designed, some values can change by 10% and be barely noticeable, while others become entirely different colours when shifted by 10%.
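As an illustration of these colour spaces, here’s a minimal Ruby sketch of the first two conversions (hex to RGB, and RGB to HSB). The LAB conversion goes through an intermediate XYZ space and is more involved, so it’s omitted; the method names are mine, not ColourMatch’s.

```ruby
# Hex string ("#123456") to [R, G, B], each 0-255.
def hex_to_rgb(hex)
  hex = hex.delete('#')
  [hex[0..1], hex[2..3], hex[4..5]].map { |c| c.to_i(16) }
end

# RGB to [Hue (0-360), Saturation (0-100), Brightness (0-100)].
def rgb_to_hsb(r, g, b)
  r, g, b = r / 255.0, g / 255.0, b / 255.0
  max, min = [r, g, b].max, [r, g, b].min
  delta = max - min

  # Hue depends on which channel dominates; % 6 keeps it on the wheel.
  hue = if delta.zero? then 0.0
        elsif max == r then 60 * (((g - b) / delta) % 6)
        elsif max == g then 60 * (((b - r) / delta) + 2)
        else                60 * (((r - g) / delta) + 4)
        end

  saturation = max.zero? ? 0.0 : (delta / max) * 100
  [hue, saturation, max * 100]
end
```

For example, `rgb_to_hsb(*hex_to_rgb("#ff0000"))` gives a fully saturated, fully bright red with a hue of 0°.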
This becomes important when trying to assess how similar two colours are, when doing matching. But I’m getting ahead of myself.
Ok, so now what?
So I had 48 colours, but I only cared about a handful of them. For some images, there were half a dozen noteworthy colours, and for others, just one or two. How best to figure out which to use?
I tried a lot of different solutions. One of the more interesting ones was a k-means implementation. None of them were able to produce ideal results.
Colour is so intuitive to me, as a human, but quantifying it is another matter. Why do certain colours stand out, anyway? Generally, it’s contrast; a colour stands out because it’s so radically different from its surroundings.
It turns out, the HSB colour space is perfect at putting those differences into numbers.
Let’s take another look at Daniel Paulsson’s cat photo.
Hue: The eyes are blue, but so is that blurry pillow on the far left, and my eyes don’t care about the pillow. I don’t think hue is why it’s noticeable.
Brightness: Nah, the eyes are bright, but so is almost everything else in the photo.
Saturation: Now we’re talking. Everything in the photo is pretty desaturated: lots of greys and beiges. Those eyes, though. Those eyes are saturated.
In statistical terms, I’m looking for outliers: colours that are significantly far away from the mean. First, I get the standard deviations for hue, saturation and brightness. Then I look for colours that have a z-score (number of standard deviations away from the mean) of at least 2.5.
By the numbers.
This cat photo has a mean saturation of 9.1%, and a deviation of 9.5%. That means most colours are between 0 and 20% saturation. The piercing azure is 49% saturated, making it 4.2 standard deviations away from the mean. That’s a lot.
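The outlier test above can be sketched in a few lines of Ruby. This uses a population standard deviation, and the method name is illustrative:

```ruby
# Return the values whose z-score (distance from the mean, in standard
# deviations) meets the threshold. Works for hue, saturation, or brightness.
def outliers(values, threshold: 2.5)
  mean = values.sum.to_f / values.size
  stddev = Math.sqrt(values.sum { |v| (v - mean)**2 } / values.size)
  return [] if stddev.zero?
  values.select { |v| ((v - mean) / stddev).abs >= threshold }
end
```

Run against a set of mostly low saturations with one vivid colour, only the vivid one comes back.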
Caveats. There’s a bit more math to get optimal results. A big part of the problem is avoiding filling a palette with very similar colours when there are more varied options; better to get a diverse range. This is worked into the calculations, so colours that are dissimilar from those already in the palette are weighted more heavily.
I also don’t care about colours that take up less than 0.1% of the canvas. There were times when the histogram would pull out a really vibrant colour that I couldn’t see in the photo, because it only occupied a pixel or two in the corner.
Finally, there’s the hue problem. If a photo is primarily blues and reds, a strong purple might stand out visually, but because purple sits at the midpoint of blue and red on the hue wheel, it isn’t a statistical outlier. The clustering k-means method I mentioned earlier did a better job in this edge case, but a worse job in every other case.
Double the Histogram, Double the Fun
When analyzing a photo, I actually perform two histogram generations. The first quantizes to 48 colours, the second to just 4.
There are times where an interesting colour isn’t actually an outlier. For a photo of the pyramids, I actually do care about the vibrant blue sky and the dusty beige-yellow sand. Quantizing to just 4 colours helps me find the average colour for these regions, based on occurrence.
Initially I did this math myself with the 48 colours, but it turns out ImageMagick’s quantization is much smarter; it doesn’t just look at overall occurrence, it also considers layout and positioning.
Building the Perfect Palette
I decided that 6 colours was a reasonable maximum for most photos. While there are some rainbow-coloured images, those images aren’t really the point of this tool. I really only care about images that have a handful of noticeable colours.
I experimented with a lot of different ratios and mixes for filling up this 6-colour palette, and in the end my strategy is pretty straightforward.
I start by taking the best outlier from each HSB bin, for a max of 3 outliers. I combine them with the 4 common colours from the other histogram.
In the rare case that there are 3 suitable outliers and 4 common colours, I ditch the common colour that takes up the least amount of real estate. This is indeed a rare case, since I do a lot of filtering and combining of both histograms before this merge; many photos end up with fewer than 6 colours as a result.
Example time: the cat photo has that amazing saturation outlier with a z-score of 4.2. It also has a reasonable brightness outlier at -2.7 (the black kitten). No hue outliers met the 2.5 z-score threshold, so we combine those two outliers with the 4 common colours.
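A sketch of the palette assembly, assuming each common colour carries a :coverage value (the proportion of the canvas it occupies); the method name is mine:

```ruby
# Combine up to 3 HSB outliers with the common colours, then trim to 6
# by dropping the common colour with the least coverage.
def build_palette(outliers, common)
  palette = outliers.first(3) + common
  while palette.size > 6
    least = (palette & common).min_by { |c| c[:coverage] }
    palette.delete(least)
  end
  palette
end
```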
Here’s what the palette looks like for our cat photo:
And here are some of the results that this photo found:
Now, that first result couldn’t be more perfect, but the others aren’t as ideal. When you consider the sample size I’m working with, though, I think it’s doing a pretty bang-on job.
One thing I haven’t covered yet is where these results are coming from. I’ve opted to use 500px because they have a great API and beautiful photos, but I need to fetch and analyze photos one at a time; I can’t do real-time searches using their whole library.
I have a cron job set up to automatically fetch and analyze photos on a daily basis, but to avoid pissing off the 500px team, I’m limiting it to about a thousand new photos a day. At the time of writing, the database was up to about 13,000 photos. Not a very large sample size, considering the 2 million+ photos other services use.
The cron job runs a rake task that performs the above-mentioned calculations, and then stores the colour data and the photo metadata in my own database. I’m not storing the photos themselves, just my analysis of the photos, along with 500px data like photo name, photo link, photographer user info, and whether the photo is for sale or not.
Aside from the 6-colour palette, I store some generalized statistical information in a Stat object, like the mean and deviation for each of the HSB channels. This comes in handy later on.
So at this point, my palette generation is functional enough to move on. Time to play with matching!
It’s much simpler to handle colour searches (where the user picks a single colour) than photo searches (where the user uploads a photo). Let’s start with the simpler option.
After the user enters a hex colour code, I do the colourspace conversion. I use the LAB colourspace and a slightly modified euclidean distance formula to find the closest colour in the palette to the search colour.
Euclidean distance is basically the Pythagorean theorem. If colour A has components A1, A2 and A3, and colour B has components B1, B2 and B3, it’s this:
sqrt( (A1 - B1)^2 + (A2 - B2)^2 + (A3 - B3)^2 )
Instead of 1–2–3, I’m using the colours’ L-A-B values. And I’ve modified it by multiplying the L value by 0.7, so that Lightness isn’t weighted as heavily as AB. Generally, within reason, a darker version of a colour still feels like the same colour, whereas a colour with the same lightness but a different hue feels different.
As an example, if the user selects a red, I start checking that colour against photos in the database. For each photo, I run through its 6-colour palette, looking for the colour that is closest to the red.
It may not be intuitive to think about “distance” when it comes to colour, but colour is really just a 3-dimensional data set.
In this LAB illustration, if you pick any two points inside the sphere, you can trace a line between them to calculate distance.
The distance returns a number between 0 (the same colour) and ~250 (polar opposites). I invert it and normalize it so that the same colour is 100 and polar opposites are 0, and then I just return the distance score for the closest match. Because every colour in the palette is important, this seems to be a pretty good strategy!
Right now, the threshold for a good match is 94%. Any photo that has at least one colour over a 94% match gets returned.
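The weighted distance and the 0–100 match score can be sketched like so. The 0.7 weight and the ~250 maximum come from the text above; the method names are mine:

```ruby
# Euclidean distance in LAB, with the Lightness difference down-weighted
# to 0.7 so darker versions of a colour still match well.
def lab_distance(c1, c2)
  dl = (c1[0] - c2[0]) * 0.7
  Math.sqrt(dl**2 + (c1[1] - c2[1])**2 + (c1[2] - c2[2])**2)
end

# Invert and normalize: identical colours score 100, polar opposites ~0.
def match_score(c1, c2)
  [100.0 - (lab_distance(c1, c2) / 250.0) * 100, 0].max
end
```

A photo’s score against a search colour is then the match_score of its closest palette colour, checked against the 94% threshold.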
This strategy works great at finding outliers, since it only needs to match 1 colour.
Photos are more complicated. Having a single one-to-one match isn’t good enough, and I honestly don’t have the database size to be able to insist on perfect matches for every colour.
I tried a lot of different strategies, and it turns out different strategies work better for different kinds of images.
Let’s look at some examples.
By Hue Deviation
Remember earlier, when I said that I stored HSB mean/deviation data? This is the kind of circumstance where it comes in handy.
The palette for this photo is, naturally, a lot of purples. The saturation and brightness vary quite a bit, but the hue does not. Hues are measured in degrees on a 360° wheel. This photo’s mean hue is 239°, but its deviation is a lowly 2.07°.
That means most colours are between 237° and 241° on the hue wheel. Not a very big range.
Instead of going through all the hassle of finding other photos with similar purples, I can just find photos that have a very similar hue mean and deviation. We don’t even have to fuss with the palette.
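A sketch of this strategy in Ruby. Hue is circular, so the mean comparison wraps around the 360° wheel; the tolerances here are illustrative assumptions, not ColourMatch’s actual values:

```ruby
# Circular distance between two hues on the 360-degree wheel.
def hue_diff(a, b)
  d = (a - b).abs % 360
  [d, 360 - d].min
end

# A photo matches if its stored hue mean and deviation are both close
# to the search photo's. Tolerances are hypothetical.
def hue_stats_match?(photo, search, mean_tol: 10, dev_tol: 5)
  hue_diff(photo[:hue_mean], search[:hue_mean]) <= mean_tol &&
    (photo[:hue_deviation] - search[:hue_deviation]).abs <= dev_tol
end
```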
Here’s what it finds:
The problem with this strategy is that it fails horribly on photos with a high deviation: a purple photo with a high deviation could have a lot of reds, or a lot of blues, or a whole kaleidoscope of colours.
By Threshold Matches
Another approach is just to set a minimum match score, like 80%, and insist that half of the input photo colours meet that threshold.
For each of the colours in the search photo, find the closest match in the target photo. If a photo has 6 colours, at least 3 of them need to find a match >= 80%.
This strategy was horrible at first, until I realized that low-saturation colours were skewing it. Most photos have 1–2 shade colours like white, black or grey. So first I removed any colour with saturation below 18%, and made sure at least half of the remaining colours had a good match.
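A sketch of the threshold strategy, assuming each palette colour carries :saturation and :lab values. A simplified match_score stand-in is included so the snippet is self-contained:

```ruby
# Simplified 0-100 similarity score from weighted LAB distance.
def match_score(a, b)
  d = Math.sqrt(((a[0] - b[0]) * 0.7)**2 + (a[1] - b[1])**2 + (a[2] - b[2])**2)
  [100.0 - d / 2.5, 0].max
end

# At least half of the search photo's colours (after dropping shades
# below 18% saturation) must find a target colour scoring >= threshold.
def threshold_match?(search_palette, target_palette, threshold: 80)
  saturated = search_palette.reject { |c| c[:saturation] < 18 }
  return false if saturated.empty?
  matched = saturated.count do |colour|
    target_palette.any? { |t| match_score(colour[:lab], t[:lab]) >= threshold }
  end
  matched >= (saturated.size / 2.0).ceil
end
```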
This strategy is OK, and I’m still experimenting with the circumstances in which it works best.
By Common & Saturated
A surprisingly good approach, and the one that I’m using most often, is to ignore all but 2 colours: the most common one, and the most saturated one.
This is the strategy used in our cat photo earlier, to get these matches:
The only colours it uses are the piercing azure and a medium beige-grey. I find the highest match in the target photo for each of the two, and return the average of their scores.
A slight tweak I made was to consider saturation and brightness, but to weigh saturation more heavily. A colour with 60/60 saturation/brightness is probably better than 80/10. So the formula I’m using now is saturation + brightness*0.75.
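The strategy can be sketched as follows, assuming each search colour carries :coverage, :saturation, :brightness, and :lab values. The simplified match_score stand-in and method names are mine:

```ruby
# Simplified 0-100 similarity score from weighted LAB distance.
def match_score(a, b)
  d = Math.sqrt(((a[0] - b[0]) * 0.7)**2 + (a[1] - b[1])**2 + (a[2] - b[2])**2)
  [100.0 - d / 2.5, 0].max
end

# Saturation weighted above brightness, so 60/60 beats 80/10.
def vibrancy(colour)
  colour[:saturation] + colour[:brightness] * 0.75
end

# Score a target photo using only the search photo's most common colour
# and its most vibrant one, averaging their best matches.
def common_and_saturated_score(search_palette, target_palette)
  common  = search_palette.max_by { |c| c[:coverage] }
  vibrant = search_palette.max_by { |c| vibrancy(c) }
  [common, vibrant].sum do |colour|
    target_palette.map { |t| match_score(colour[:lab], t[:lab]) }.max
  end / 2.0
end
```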
The edge case where this solution fails the hardest is when there’s a strong hue outlier in one of the two photos, since it totally ignores a very noticeable colour.
The other worrisome factor with this process is performance. Because I have such a small photo library, searches ought to complete instantly, right?
Not so. A photo search can take a few seconds to complete, and that time would get worse as the database grows. Still, I’ve done a few things that have dramatically improved performance:
Proper SQL queries
Two SQL tweaks improved speed by over 90%: Adding indexes to foreign keys, and eager loading.
Indexes. As a Rails developer, I’ve been spoiled by ActiveRecord. I’ve been learning more about SQL databases and efficient queries, and adding indexes has made a big difference.
Of course, it’s a double-edged sword. Adding indexes improves the SELECT time, but worsens the INSERT and UPDATE times. This means analyzing new photos gets slowed down. Still, given the dramatic improvement to SELECT queries, it’s a fair trade.
Eager loading has also made a massive difference. My models are set up so that Photo has_many PhotoColours. Initially, I was having the n+1 query problem, since I had to perform a separate query during every iteration to fetch the photo’s colours.
By calling Photo.includes(:photo_colours), I’m able to fetch all of those associations at once, at the start of the search.
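For reference, the two tweaks look roughly like this in Rails. This is a sketch: the table and model names are assumptions based on the text, and compare is a hypothetical placeholder for the matching logic:

```ruby
# Migration: index the foreign key so palette lookups don't table-scan.
class AddIndexToPhotoColours < ActiveRecord::Migration
  def change
    add_index :photo_colours, :photo_id
  end
end

# Query: eager-load every photo's palette up front, avoiding the n+1
# problem of one PhotoColour query per photo in the loop.
Photo.includes(:photo_colours).each do |photo|
  photo.photo_colours.each { |colour| compare(colour) }  # hypothetical
end
```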
Improving Perception of Time
The other thing I realized is that it’s more important that the search feels fast. Two tricks have made searches feel very quick indeed.
EventSource, also known as Server-Sent Events, is an alternative technology to WebSockets that allows server-to-client interaction.
Normally, the client would send a request. “Hey server! I have this colour. Want to find all the matching photos for me?”. The server would then query its database, analyze every photo returned from the query, and send a massive batch of photos to the client.
The user sitting at the computer would see nothing at all for several seconds, and then an explosion of matches would fill her screen.
EventSource is a way of having the server send data on its own terms to the client. The client sends an initial HTTP request and keeps the line open, allowing the server to send data at its discretion.
When the client makes that initial request, the server begins its search, but instead of waiting for the entire operation to complete, it streams results one at a time as it discovers them. This allows for photos to show up on the user’s screen almost right away.
There was still the matter of the initial SQL request, to fetch all Photos in the database. With the eager loading from includes, this was adding a significant delay to the start of result streaming.
find_in_batches allows Rails to lazily query in blocks of 1000. So I load the first 1000 photos and eager-load their colours, perform all the analyzing, return the results, and then start on the next batch of 1000. The initial SQL query is quick, and results stream lightning fast.
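Put together, the streaming search looks roughly like this in a Rails controller. This is a sketch using ActionController::Live; the controller name and the photo_matches? helper are illustrative, not ColourMatch's actual code:

```ruby
class SearchesController < ApplicationController
  include ActionController::Live

  def show
    response.headers["Content-Type"] = "text/event-stream"
    # Batches of 1000, each with its palette eager-loaded; every match
    # is streamed to the client the moment it's found.
    Photo.includes(:photo_colours).find_in_batches(batch_size: 1000) do |batch|
      batch.each do |photo|
        next unless photo_matches?(photo, params[:colour])  # hypothetical
        response.stream.write("data: #{photo.to_json}\n\n")
      end
    end
  ensure
    response.stream.close
  end
end
```

On the client side, a plain `new EventSource(url)` with an onmessage handler appends each photo as it arrives.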
To truly perfect this tool, it would take months of subtle tweaking and experimentation. I built ColourMatch as a proof-of-concept, and to challenge myself to do something I didn’t know how to do.
Rails is a great web development framework, but Ruby is not a good language for serious computation. I think the analysis ought to be rewritten in a more efficient language like Go.
One of the major flaws with my photo analyzing is it doesn’t take position into account. A colour in the centre of the photo is probably more important than a blurry one in the corner. Everything comes at a cost, though; it already takes a couple of seconds to analyze every photo.
If you have any questions or comments about this project, talk to me about it!