Dominant colors in an image using k-means clustering

Shivam Thakkar
Published in BuzzRobot
3 min read · Jan 26, 2018

Recently I wondered whether it is possible to detect the dominant colors in an image. After going through a series of web snippets and experimenting with code, I was able to achieve excellent results using the k-means clustering algorithm. I implemented it in Python with OpenCV and scikit-learn. You can fork it from GitHub.

How does it work?

k-means is a clustering algorithm used in machine learning to partition a set of data points into ‘k’ groups. It relies on a simple distance calculation: each point belongs to the cluster whose centroid is nearest.

  1. Select ‘k’ points at random as the initial centroids; they need not come from the dataset.
  2. Assign each data point to the cluster with the closest centroid.
  3. Compute the new centroid (the mean of the assigned points) of each cluster and move the centroid there.
  4. Reassign the data points to the new closest centroid. If any reassignment took place, go to step 3; otherwise the model is ready.
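
The four steps above can be sketched in a few lines of NumPy. This is only an illustration, not the code used in this article: the function name `kmeans` and its parameters are made up for the example, and for simplicity the initial centroids are drawn from the dataset itself (one common choice; step 1 does not require it).

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on an (n, d) array; returns (centroids, labels)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k initial centroids (here: k distinct data points).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 4: assign every point to its closest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Stop as soon as no point changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels
```

Calling `kmeans(points, 2)` on two well-separated blobs should return one centroid near the middle of each blob.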

We are going to use the scikit-learn ML library for k-means, though you can also code it from scratch by referring to this tutorial.

Applying to images

A color image is typically made of three channels (Red, Green and Blue), so we can think of each pixel as a point (x=Red, y=Green, z=Blue) in 3D space and apply the k-means clustering algorithm to those points.
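
As a small illustration of the "pixel as a 3D point" idea, an image array of shape (height, width, 3) can be flattened into a list of points. The helper name `image_to_points` is made up for this sketch. It assumes the array comes from OpenCV's `cv2.imread`, which stores channels in BGR order, so the channels are reversed first.

```python
import numpy as np

def image_to_points(img_bgr):
    """Flatten an (h, w, 3) BGR image into an (h*w, 3) array of RGB points."""
    img_rgb = img_bgr[:, :, ::-1]      # reverse channel order: BGR -> RGB
    return img_rgb.reshape(-1, 3)      # one row per pixel: (R, G, B)
```

For a 100 x 100 image this yields 10,000 points, each with three coordinates.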

After the algorithm has processed every pixel, the cluster centroids are the required dominant colors.

We are going to use this image (dimensions: 100 x 100):

colors.jpg

Plotting the points in 3D space using Python's matplotlib:

3D plot of “colors.jpg” using x=red, y=green, z=blue

From the plot one can easily see that the data points form groups: some regions of the graph are denser, and we can think of these as the colors that dominate the image. We will try to recover these clusters through k-means clustering.
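
A plot like the one described above could be produced roughly as follows. This is a sketch, not the original plotting code: the function name `plot_pixels_3d` is made up, and it assumes the pixels are already available as an (n, 3) array of RGB values in the range 0 to 255.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_pixels_3d(pixels):
    """Scatter an (n, 3) array of RGB pixels in 3D, each point in its own color."""
    pixels = np.asarray(pixels, dtype=float)
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    # matplotlib expects RGB colors as floats in [0, 1].
    ax.scatter(pixels[:, 0], pixels[:, 1], pixels[:, 2],
               c=pixels / 255.0, marker=".")
    ax.set_xlabel("Red")
    ax.set_ylabel("Green")
    ax.set_zlabel("Blue")
    return fig
```

Call `plt.show()` afterwards to display the returned figure.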

clusters in plot

The Source Code

We took “clusters = 5” (k=5), which means we will get 5 clusters and therefore 5 dominant colors for the image.
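
The embedded source is not reproduced in this text, so here is a minimal sketch of what a “DominantColors” class along these lines might look like. Only the class name and the “clusters” parameter come from the article; the method name `dominantColors`, the helper `_load_pixels`, the attribute names, and the option to pass an in-memory RGB array instead of a file path are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

class DominantColors:
    def __init__(self, image, clusters=5):
        self.image = image          # file path, or an (h, w, 3) RGB array (assumption)
        self.clusters = clusters
        self.colors = None          # filled in by dominantColors()
        self.labels = None

    def _load_pixels(self):
        img = self.image
        if isinstance(img, str):
            import cv2              # only needed when reading from disk
            bgr = cv2.imread(img)
            if bgr is None:
                raise FileNotFoundError(img)
            img = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
        # Flatten to one row per pixel: (R, G, B).
        return np.asarray(img).reshape(-1, 3)

    def dominantColors(self):
        pixels = self._load_pixels()
        kmeans = KMeans(n_clusters=self.clusters, n_init=10, random_state=0)
        kmeans.fit(pixels)
        self.labels = kmeans.labels_
        # The cluster centers are the dominant colors; round to integer RGB.
        self.colors = np.rint(kmeans.cluster_centers_).astype(int)
        return self.colors
```

With this sketch, `DominantColors("colors.jpg", clusters=5).dominantColors()` would return a 5 x 3 array of RGB values, assuming the image file exists and OpenCV is installed.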

Output

The output is a k x 3 array in which each row holds the RGB values of one dominant color.

Extras

What more can we do here? You might already have some idea of how it works, but to get a clearer picture we can add more to the visualization. So far we have visualized the color points in 3D space. A few more lines of code will show which cluster each color point belongs to (visualizing the clustered data).

With these two methods added to our original “DominantColors” class, and the mentioned modules imported, we can visualize the whole thing by calling the “plotClusters()” method right after the original code.
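
The two methods themselves are not reproduced in this text. One plausible pair, written here as free functions so the sketch is self-contained, is a small RGB-to-hex helper plus the plotting routine. Only the name “plotClusters” comes from the article; `rgb_to_hex`, the function signatures, and the arguments (`pixels`, `labels`, `colors`) are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

def rgb_to_hex(rgb):
    """Convert an (R, G, B) triple with values 0..255 to a '#rrggbb' string."""
    r, g, b = (int(v) for v in rgb)
    return "#{:02x}{:02x}{:02x}".format(r, g, b)

def plotClusters(pixels, labels, colors):
    """3D scatter of RGB pixels, each drawn in the color of its cluster centroid."""
    pixels = np.asarray(pixels, dtype=float)
    # Every pixel takes the hex color of the centroid it was assigned to.
    point_colors = [rgb_to_hex(colors[label]) for label in labels]
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(pixels[:, 0], pixels[:, 1], pixels[:, 2],
               c=point_colors, marker=".")
    ax.set_xlabel("Red")
    ax.set_ylabel("Green")
    ax.set_zlabel("Blue")
    return fig
```

Inside the class, the same logic would read the pixels, labels and centroid colors from the instance instead of taking them as arguments.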

cluster visualization

One more add-on: we can also display the order of dominance, that is, which color is most dominant (the cluster with the most data points) followed by the lesser ones, by using a histogram and the cv2.rectangle() method.

Now follow the same steps as for “plotClusters()” and call the “plotHistogram()” method. Yes, we are done!
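
The idea behind the dominance order can be sketched as follows: count how many pixels fall into each cluster, sort the clusters by that count, and paint a bar whose segments are proportional to each cluster's share. This is not the article's “plotHistogram()” code. The function name `dominance_bar` and its parameters are made up, and the segments are filled with NumPy slicing, which stands in for the cv2.rectangle() calls mentioned above.

```python
import numpy as np

def dominance_bar(labels, colors, width=500, height=50):
    """Return (ordered_colors, shares, bar) with clusters sorted by pixel count."""
    colors = np.rint(np.asarray(colors)).astype(np.uint8)
    # Histogram of cluster labels: number of pixels per cluster.
    counts = np.bincount(np.asarray(labels), minlength=len(colors))
    shares = counts / counts.sum()
    order = np.argsort(shares)[::-1]            # most dominant cluster first
    bar = np.zeros((height, width, 3), dtype=np.uint8)
    # Right edge of each segment, proportional to the cumulative share.
    edges = np.rint(np.cumsum(shares[order]) * width).astype(int)
    start = 0
    for idx, end in zip(order, edges):
        bar[:, start:end] = colors[idx]         # fill the segment with the cluster color
        start = end
    return colors[order], shares[order], bar
```

The returned `bar` is an RGB image array, so it can be displayed with matplotlib's `plt.imshow(bar)`.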

dominance order

References:

https://www.pyimagesearch.com/2014/05/26/opencv-python-k-means-color-clustering/

https://zeevgilovitz.com/detecting-dominant-colours-in-python
