Building a way to search photos by colour

At Unsplash, we’re always looking to use our data to create new features. Colour search isn’t an Unsplash feature (yet) but we’ve started thinking about it.

First of all, this article is only here to show you the results of our research and demonstrate the prototype we built. I’m definitely not saying that this is the best way to build a colour search for photos. Let me lead you through our thought process.

Are you lazy? Then just read through the example at the end of the article.

For the brave, let’s start at the beginning.

Colour in computers

Photo by Dhaval Parmar on Unsplash

RGB.

If you use a computer to code, design, draw or for other tasks, you may have heard this acronym before. If you haven’t, it stands for “Red Green Blue”. It’s a well known way of describing a colour as a mix of the 3 main components: red, green and blue.

The logic here is that you can represent a broad spectrum of colours by mixing a certain amount of red with a certain amount of blue and a certain amount of green. In computers, each channel is usually stored in a single byte, so the amount of red, green and blue you’re mixing is limited to 256 values per colour channel (from 0 to 255, from dark to bright).

For example, you can get black by mixing 0 of each colour (absence of colour is black). We’ll display black as (0, 0, 0) (for (R, G, B)). On the opposite side of the spectrum, white is (255, 255, 255).

You can define a ton of colours (~16.8M) using this representation model and the following equation:

// Add/mix two colors X and Y to get a new color
X(rgb) + Y(rgb) = Z(Xr + Yr, Xg + Yg, Xb + Yb)
Colour mixes with the RGB representation model.
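As a minimal sketch of the mixing rule above, here is how it could look in Python. The `mix` helper is hypothetical, and clamping each channel at 255 is an assumption on my side (the equation above doesn't say what happens when a sum overflows).

```python
def mix(x, y):
    """Additively mix two (r, g, b) colours.

    Clamping at 255 is an assumption: the equation in the article
    does not say what happens when a channel overflows.
    """
    return tuple(min(a + b, 255) for a, b in zip(x, y))


red = (255, 0, 0)
green = (0, 255, 0)
blue = (0, 0, 255)

yellow = mix(red, green)   # (255, 255, 0)
white = mix(yellow, blue)  # (255, 255, 255)

# Number of colours the model can describe: 256 values per channel.
total_colours = 256 ** 3   # 16777216, i.e. about 16.8M
```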

Comparing RGB colours

To be able to search by colour, you need to be able to tell the difference between 2 colours. This is because when someone searches for a specific colour, there’s a chance that your photos won’t contain that exact tint but similar ones. If I’m searching for “dark blue” photos, I’m actually searching for a range of “dark blue” colours, not a specific tint. Although if some photos have that exact tint, they should match first.

The most obvious way to compare colours would be to directly compare their RGB values. By looking at the values for red, green and blue, we could tell if two colours are close or not at all. Right?

Well, nope.

According to RGB and the Euclidean distance, the green/brown tints on the left are closer to each other than the two red tints on the right … but are they?

Why is this the case?
Simply because computers don’t define colours like the human eye does. Our eyes don’t weigh red, green and blue equally, so the same numeric gap between two RGB colours can look tiny in one part of the spectrum and obvious in another. The colours we perceive also depend on other factors like their lightness.
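For reference, the naive comparison discussed above boils down to a Euclidean distance on the three channels. This is only a sketch of that idea (the `rgb_distance` name is mine), shown to make clear what we are moving away from:

```python
import math


def rgb_distance(c1, c2):
    """Euclidean distance between two (r, g, b) colours."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


black, white = (0, 0, 0), (255, 255, 255)

# The largest possible distance in this space is 255 * sqrt(3).
max_distance = rgb_distance(black, white)
```

The number is cheap to compute, but as explained above it doesn't tell you much about how different two colours actually look.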

So we need to find a way to represent colour that matches with the way our eyes interpret colours. Thankfully, there’s a colour space designed to approximate human vision. It’s called L*ab.

L stands for Lightness, a and b represent two colour components (a green-red component and a blue-yellow one).

From RGB to L*ab

We use Google Vision to extract colours from our photos. It gives us the RGB code, the coverage and the focus (background colour vs focused object’s colour) of the main colours in the photo. This allows us to determine the importance of each colour within the photo.

For us, converting from RGB to L*ab happens in two steps:

  • Conversion from RGB to XYZ
  • Conversion from XYZ to L*ab

XYZ is another colour space but I’ll skip the details because it’s irrelevant for our purpose.

I’ve been using easyrgb.com to get the conversion formulas:

  • RGB to XYZ
//sR, sG and sB (Standard RGB) input range = 0 ÷ 255
//X, Y and Z output refer to a D65/2° standard illuminant.
var_R = ( sR / 255 )
var_G = ( sG / 255 )
var_B = ( sB / 255 )
if ( var_R > 0.04045 ) var_R = ( ( var_R + 0.055 ) / 1.055 ) ^ 2.4
else var_R = var_R / 12.92
if ( var_G > 0.04045 ) var_G = ( ( var_G + 0.055 ) / 1.055 ) ^ 2.4
else var_G = var_G / 12.92
if ( var_B > 0.04045 ) var_B = ( ( var_B + 0.055 ) / 1.055 ) ^ 2.4
else var_B = var_B / 12.92
var_R = var_R * 100
var_G = var_G * 100
var_B = var_B * 100
X = var_R * 0.4124 + var_G * 0.3576 + var_B * 0.1805
Y = var_R * 0.2126 + var_G * 0.7152 + var_B * 0.0722
Z = var_R * 0.0193 + var_G * 0.1192 + var_B * 0.9505
  • XYZ to L*ab
//Reference-X, Y and Z refer to specific illuminants and observers.
//For the D65/2° illuminant used above: Reference-X = 95.047, Reference-Y = 100.000, Reference-Z = 108.883
var_X = X / Reference-X
var_Y = Y / Reference-Y
var_Z = Z / Reference-Z
if ( var_X > 0.008856 ) var_X = var_X ^ ( 1/3 )
else var_X = ( 7.787 * var_X ) + ( 16 / 116 )
if ( var_Y > 0.008856 ) var_Y = var_Y ^ ( 1/3 )
else var_Y = ( 7.787 * var_Y ) + ( 16 / 116 )
if ( var_Z > 0.008856 ) var_Z = var_Z ^ ( 1/3 )
else var_Z = ( 7.787 * var_Z ) + ( 16 / 116 )
CIE-L* = ( 116 * var_Y ) - 16
CIE-a* = 500 * ( var_X - var_Y )
CIE-b* = 200 * ( var_Y - var_Z )

If you want to try this at home, here’s an example to check your results:

Colorizer interface to check your conversions between RGB, XYZ and L*ab

You want a sky blue, you pick it in a colour picker like Colorizer.

You get the RGB code:
RGB = (145, 231, 253)

You convert it to XYZ:
XYZ = (57.98, 70.26, 103.44)

Then from XYZ to L*ab:
L*ab = (87.13, -20.46, -18.80)
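If you’d rather check these numbers with code, here is a minimal Python sketch of the two conversion steps. The function names are mine, and the D65/2° reference white (95.047, 100.000, 108.883) is an assumption chosen to match the RGB to XYZ formulas above.

```python
# D65/2° reference white (assumed, to match the RGB -> XYZ step).
REF_X, REF_Y, REF_Z = 95.047, 100.000, 108.883


def _linearize(channel):
    """Undo the sRGB gamma curve for one 0..255 channel, scaled to 0..100."""
    v = channel / 255
    v = ((v + 0.055) / 1.055) ** 2.4 if v > 0.04045 else v / 12.92
    return v * 100


def rgb_to_xyz(r, g, b):
    r, g, b = _linearize(r), _linearize(g), _linearize(b)
    x = r * 0.4124 + g * 0.3576 + b * 0.1805
    y = r * 0.2126 + g * 0.7152 + b * 0.0722
    z = r * 0.0193 + g * 0.1192 + b * 0.9505
    return x, y, z


def _f(t):
    """Cube root with the linear segment used near black."""
    return t ** (1 / 3) if t > 0.008856 else 7.787 * t + 16 / 116


def xyz_to_lab(x, y, z):
    fx, fy, fz = _f(x / REF_X), _f(y / REF_Y), _f(z / REF_Z)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def rgb_to_lab(r, g, b):
    return xyz_to_lab(*rgb_to_xyz(r, g, b))


sky_blue = (145, 231, 253)
print([round(v, 2) for v in rgb_to_xyz(*sky_blue)])
print([round(v, 2) for v in rgb_to_lab(*sky_blue)])
```

The printed values should land very close to the XYZ and L*ab values of the sky blue example, give or take a rounding difference on the last digit.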

Let’s take a quick step back

Here’s a summary of what we discovered and what we have:

  • RGB colours don’t approximate human vision
  • L*ab is the colour space that approximates human vision
  • We can convert from RGB to L*ab with a couple of formulas
  • We know the importance of each colour in a photo thanks to Google Vision

So technically, we can store all that information in a database. Let’s imagine we have a photo_colours table. Each row would show:

photo_id | red | green | blue | L | a | b | coverage | focus

A single photo would have multiple rows, each one describing each of its colours.

Comparing L*ab colours

Now that we have the L*ab representation of each colour in our photos, we’re able to compare the colours (a bit) like the human eye does.

To do so, we’ll use the Delta E (CIE 1994) formula. Here’s the algorithm:

CIE-L*1, CIE-a*1, CIE-b*1 //Color #1 CIE-L*ab values
CIE-L*2, CIE-a*2, CIE-b*2 //Color #2 CIE-L*ab values
WHT-L, WHT-C, WHT-H       //Weighting factors
xC1 = sqrt( ( CIE-a*1 ^ 2 ) + ( CIE-b*1 ^ 2 ) )
xC2 = sqrt( ( CIE-a*2 ^ 2 ) + ( CIE-b*2 ^ 2 ) )
xDL = CIE-L*2 - CIE-L*1
xDC = xC2 - xC1
xDE = sqrt( ( ( CIE-L*1 - CIE-L*2 ) * ( CIE-L*1 - CIE-L*2 ) )
          + ( ( CIE-a*1 - CIE-a*2 ) * ( CIE-a*1 - CIE-a*2 ) )
          + ( ( CIE-b*1 - CIE-b*2 ) * ( CIE-b*1 - CIE-b*2 ) ) )
xDH = ( xDE * xDE ) - ( xDL * xDL ) - ( xDC * xDC )
if ( xDH > 0 ) xDH = sqrt( xDH )
else xDH = 0
xSC = 1 + ( 0.045 * xC1 )
xSH = 1 + ( 0.015 * xC1 )
xDL /= WHT-L
xDC /= WHT-C * xSC
xDH /= WHT-H * xSH
Delta E94 = sqrt( xDL ^ 2 + xDC ^ 2 + xDH ^ 2 )

With this formula, we’re able to tell how close a colour in a photo is from the colour that was searched. Depending on the result, we can decide whether to show the photo in search results.
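Here is how the algorithm above could be sketched in Python. The `delta_e94` name is mine, and setting the three weighting factors to 1 by default is an assumption (the algorithm above leaves them open).

```python
import math


def delta_e94(lab1, lab2, wht_l=1.0, wht_c=1.0, wht_h=1.0):
    """Delta E (CIE 1994) between two (L, a, b) colours.

    The weighting factors default to 1, which is an assumption.
    """
    l1, a1, b1 = lab1
    l2, a2, b2 = lab2

    c1 = math.hypot(a1, b1)  # chroma of colour #1
    c2 = math.hypot(a2, b2)  # chroma of colour #2
    dl = l2 - l1
    dc = c2 - c1

    # dH^2 = dE^2 - dL^2 - dC^2, which simplifies to da^2 + db^2 - dC^2.
    da, db = a1 - a2, b1 - b2
    dh_squared = max(da * da + db * db - dc * dc, 0.0)

    sc = 1 + 0.045 * c1
    sh = 1 + 0.015 * c1

    return math.sqrt(
        (dl / wht_l) ** 2
        + (dc / (wht_c * sc)) ** 2
        + dh_squared / (wht_h * sh) ** 2
    )
```

Note that the result is not symmetric: the `sc` and `sh` terms only depend on the chroma of the first colour, so it matters which colour you treat as the reference. Using the searched colour as colour #1 every time keeps the results comparable.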

Handling multiple colours within each photo

Photo by Anastasia Yılmaz on Unsplash

Here’s another problem:

All photos have different colours in them — sometimes very different colours. 
If a photo contains tints of red, orange and green in different amounts and the user searches for a specific “green”, should we show this photo in search results?

The coverage and focus data that we’re pulling from Google Vision allows us to judge the importance of each colour in the photo. We’ll give more weight to a colour that covers a lot of pixels and is the focal point of the photo, and less weight to a colour that is less important.
This weight will add a factor to the distance (calculated thanks to Delta E) between the searched colour and the weighted colour. We’ll consider an important colour that is close to the searched colour as a very good result. But if an important colour is really far from the searched colour, then it’s a very bad result. A less important colour will have a lighter impact on the result.

By computing the distances between the searched colour and each colour (weighted by importance) in the photo, we can tell how well the photo would fit in the search results.

The issue here is that with hundreds of thousands of photos in a library, computing the distance between the searched colour and every colour of every photo for each search request is not scalable: the query takes way too long.

Scaling the system

Photo by Denys Nevozhai on Unsplash

When a user searches for a colour, the heaviest task is to calculate the distance between the searched colour and each colour in each photo.

What if we could pre-compute all the distances between all the colours?

This sounds like a good idea … except that there are way too many colours to store the results efficiently. As we already calculated, RGB can represent a total of 256 x 256 x 256 ~ 16.8M colours.
Pre-computing all the differences between all the colours would lead us to store 16.8M x 16.8M ~ 282,000,000,000,000 distances. That’s way too many rows to go through when searching for the difference between two colours.

So how can we bring that row count down?
Well, we can lower the number of colours available in the RGB spectrum.

In its original form, each component of RGB goes from 0 to 255, with incremental steps of 1: 0, 1, 2, 3, 4, 5 ... 254, 255

Instead, we’ll only consider a subset of that spectrum by increasing the step. Instead of having a step of 1, we’ll use a step of 32.
So now, each component of RGB goes from 0 to 224 with an incremental step of 32: 0, 32, 64, 96, 128, 160, 192, 224

Instead of 256 values available for each component, we’ll only have 8.
This means that our new spectrum can represent 8 * 8 * 8 = 512 colours.
We could now compute and store all the differences between all the colours in 512 * 512 ~ 262k rows. This is a much more manageable number.

So yes, obviously we lost precision. Depending on the use case, there is a balance to find between precision, storage size and computing time. For our purposes, we haven’t finalized our exact settings yet.
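To make the arithmetic above concrete, here is a small Python sketch of the reduced spectrum. The `project` helper and the `STEP` constant are names I made up for the illustration; rounding down is the behaviour assumed here.

```python
from itertools import product

STEP = 32  # step size discussed above


def project(rgb, step=STEP):
    """Round each channel down to the nearest multiple of `step`."""
    return tuple(channel // step * step for channel in rgb)


levels = list(range(0, 256, STEP))         # 0, 32, 64, ... 224
palette = list(product(levels, repeat=3))  # every colour of the reduced spectrum

print(len(levels), len(palette), len(palette) ** 2)  # 8 512 262144
print(project((255, 153, 0)))                        # (224, 128, 0)
```

Changing `STEP` is the knob for the precision versus storage trade-off: a step of 16 would give 16 values per channel, 4,096 colours and about 16.8M pre-computed distances.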

Final setup

To recap, we can build a colour_distances table that gathers the distances between each pair of the 512 colours of our custom RGB spectrum. It would look like this:

aR | aG | aB | bR | bG | bB | distance

where:

  • (aR, aG, aB) is the RGB representation of colour A
  • (bR, bG, bB) is the RGB representation of colour B
  • distance is the distance between A and B (computed by converting to L*ab space and calculating Delta E (CIE 1994) on these L*ab representations)

We also have our table showing the colours within each photo:

photo_id | red | green | blue | L | a | b | coverage | focus

When a colour search request comes in, the first thing we want to do is to figure out a range of tints that would satisfy the request. We don’t want to search for one specific colour in our photos; searching for a close range of colours is better. To do just that, we can look in our colour_distances table:

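-- searched_color_* must already be projected onto the 32-step spectrum
SELECT bR, bG, bB, distance
FROM colour_distances
WHERE aR = searched_color_R
AND aG = searched_color_G
AND aB = searched_color_B
ORDER BY distance -- ascending: closest tints first
LIMIT 5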

The result of the query will be a set of tints that are potential matches for the searched colour, with the distance between each tint and the searched colour.

From there, we pick all the photos containing one of these tints. To sort them by relevance, we consider distance (difference between a colour in a photo and the searched colour) as a penalty.

penalty = (distance + 0.01) * (100 - coverage) * (100 - focus)

High difference, low coverage and low focus will increase the penalty.
The best photos are going to be the ones having a colour with a low penalty… or colours with a low penalty average… or sum (your choice). In our case, we’ll use the first scenario.

Note that you can add/remove weight to the coverage and focus factors if you’d rather have photos in which the searched colour covers a lot of pixels or photos where the colour stands out.
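As a rough sketch, the penalty and the "keep the lowest penalty per photo" rule could look like this in Python. Everything here is hypothetical: the function name, the assumption that coverage and focus are percentages between 0 and 100, and the use of exponents as one possible way to add or remove weight on each factor.

```python
def penalty(distance, coverage, focus, coverage_weight=1.0, focus_weight=1.0):
    """Penalty of one photo colour against the searched colour.

    Assumes coverage and focus are percentages (0..100). The two
    exponents are a hypothetical way to tune each factor; with the
    default of 1.0 this is exactly the formula from the article.
    """
    return (
        (distance + 0.01)
        * (100 - coverage) ** coverage_weight
        * (100 - focus) ** focus_weight
    )


# Made-up colours of one photo: (distance, coverage, focus).
photo_colours = [
    (2.0, 60.0, 80.0),   # close to the search, large and in focus
    (40.0, 30.0, 10.0),  # far from the search, small and out of focus
]

# First scenario: the photo gets the lowest penalty among its colours.
photo_penalty = min(penalty(d, c, f) for d, c, f in photo_colours)
```

The `+ 0.01` keeps an exact colour match (distance of 0) from wiping out the coverage and focus factors.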

Example

SELECT c.photo_id,
MIN((t.distance + 0.01) * (100 - c.coverage) * (100 - c.focus)) AS penalty
FROM search_tints t
JOIN photo_colours c
-- integer division projects each photo colour onto the 32-step spectrum
ON c.red / 32 * 32 = t.target_red
AND c.green / 32 * 32 = t.target_green
AND c.blue / 32 * 32 = t.target_blue
GROUP BY 1
ORDER BY penalty
LIMIT 30

Within each photo, we look at the colours matching the tints of the search and pick the colour with the minimum penalty. Photos with lower minimum penalties will rank higher.

Complete query

WITH search_tints AS (
SELECT
bR AS target_red,
bG AS target_green,
bB AS target_blue,
distance
FROM colour_distances
-- integer division projects the searched colour onto the 32-step spectrum
WHERE aR = searched_color_R / 32 * 32
AND aG = searched_color_G / 32 * 32
AND aB = searched_color_B / 32 * 32
ORDER BY distance
LIMIT 5
)
SELECT
c.photo_id,
MIN(
(t.distance + 0.01) * (100 - c.coverage) * (100 - c.focus)
) AS penalty
FROM search_tints t
JOIN photo_colours c
ON c.red / 32 * 32 = t.target_red
AND c.green / 32 * 32 = t.target_green
AND c.blue / 32 * 32 = t.target_blue
GROUP BY 1
ORDER BY penalty
LIMIT 30

To summarize:

  • Get a set of tints close to the searched tint (search_tints) using the pre-computed distances between all the colours (colour_distances)
  • For all the photos containing one tint from this set, calculate the penalty of all its colours compared to the tints in the set and only keep the lowest one (or the average).
  • Sort the photos by minimum penalty of the colours in the photo (or average penalty, your choice).
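If you want to play with these steps without a real database, here is a toy version using Python’s built-in sqlite3 module. The rows are made up for the illustration (three distances and three photos), the column names are assumptions, and the tint limit is lowered to 2 so that the blue photo gets filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE colour_distances (
    aR INTEGER, aG INTEGER, aB INTEGER,
    bR INTEGER, bG INTEGER, bB INTEGER,
    distance REAL
);
CREATE TABLE photo_colours (
    photo_id INTEGER, red INTEGER, green INTEGER, blue INTEGER,
    coverage REAL, focus REAL
);
""")

# Made-up distances from the projected orange (224, 128, 0).
conn.executemany(
    "INSERT INTO colour_distances VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (224, 128, 0, 224, 128, 0, 0.0),
        (224, 128, 0, 224, 160, 0, 9.0),
        (224, 128, 0, 0, 0, 224, 90.0),
    ],
)
# Made-up photos, one dominant colour each.
conn.executemany(
    "INSERT INTO photo_colours VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 230, 140, 10, 60.0, 80.0),  # mostly orange, in focus
        (2, 228, 170, 5, 10.0, 20.0),   # a bit of orange-yellow
        (3, 10, 20, 240, 70.0, 90.0),   # blue photo
    ],
)

QUERY = """
WITH search_tints AS (
    SELECT bR AS target_red, bG AS target_green, bB AS target_blue, distance
    FROM colour_distances
    WHERE aR = :r / 32 * 32 AND aG = :g / 32 * 32 AND aB = :b / 32 * 32
    ORDER BY distance
    LIMIT 2
)
SELECT c.photo_id,
       MIN((t.distance + 0.01) * (100 - c.coverage) * (100 - c.focus)) AS penalty
FROM search_tints t
JOIN photo_colours c
  ON c.red / 32 * 32 = t.target_red
 AND c.green / 32 * 32 = t.target_green
 AND c.blue / 32 * 32 = t.target_blue
GROUP BY c.photo_id
ORDER BY penalty
LIMIT 30
"""

rows = conn.execute(QUERY, {"r": 255, "g": 153, "b": 0}).fetchall()
ranked_ids = [photo_id for photo_id, _ in rows]
print(ranked_ids)  # [1, 2]
```

The projection relies on integer division (`230 / 32 * 32` gives 224), so the colour columns and the bound parameters need to be integers for this to work.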

Complete walkthrough example

For those of you who are still here, let’s go through a complete A to Z example.

Someone searches for the colour #FF9900 / RGB(255, 153, 0) which is a tint of orange. We project that RGB colour on our custom RGB spectrum (with steps of 32 for each colour instead of 1): RGB(224, 128, 0)

Illustration of the precision loss when projecting the searched colour on our custom RGB spectrum

You can see that our projection makes us lose precision. We’re also getting a darker tint. We could improve this by reducing the spectrum step value (currently sitting at 32). This would reduce the precision loss. We could also round the colour components to the closest step. We’re currently rounding down, that’s why it’s a bit darker. These suggestions have drawbacks as well but there’s definitely a sweet spot to be found with these settings.

We don’t know if we have a lot of photos containing that exact tint so we’re grabbing the closest colours around that searched orange from our table with all the colour distances (colour_distances).

With the original search on the left, here’s the set of tints we’ll be searching for in our photos

Given the large library size of Unsplash, surely we have a lot of photos containing at least one of these tints. Note that the colours in our photos also need to be projected on our custom RGB spectrum to be comparable with the colours from the set.

Random set of photos containing one of the tints from our searching set (Photos by Alessio Soggetti, Viktor Forgacs, Ricardo Rocha, Lesly Juarez and Ahmed Hasan)

You can see from the photos above that simply getting all the photos where these tints are present is not acceptable. Just because the colour is present in the photo doesn’t mean that it’s an important result.

To sort them and only keep relevant results, each photo is judged using the penalty calculation:
For every colour in the photo, we calculate its penalty. It depends on the distance with the matching searched tint and the importance of the colour in the photo.

penalty = (distance + 0.01) * (100 - coverage) * (100 - focus)

We only keep the minimum penalty that we get from all the colours in the photo and we attribute this penalty to the photo itself.

From there, we pick the 30 photos with the lowest minimum penalty and sort them by ascending penalty. Here’s a subset of these results.

Final, sorted set of photos returned from our search for “orange” photos. (Photos by Josh Rose, Jordan Bebek, Daniel Hansen, Ben Ostrower and Ricardo Gomez Angel)

You can see that these results are much more relevant for our search than before the sorting.

We could then limit these results to specific keywords (keyword search + colour search) to provide a precise solution to find photos in a library.


So there we go, we have a working setup that searches through photos by colour. There’s definitely space for improvement but I thought I’d share our thought process for that colour search MVP.

I can’t end this without congratulating and thanking you for making it through all of this. Bravo et merci!