Image Color Extraction using K-Means Clustering

Everything About Color Extraction

May 23, 2022

Introduction

This article explains basic methods for image color extraction and visualization using Python.

Vector vs Raster

There are two general types of graphics: raster and vector. Vector graphics are built from geometric shapes such as points, curves, and polygons. Because vector data are drawn from precise geometric formulas, they convey information correctly and accurately: if you zoom in on a vector image on your device, the boundaries always stay sharp.

Raster graphics can be thought of as array-like data made of pixels, the smallest units of an image, each containing color information. The number of pixels in an image is referred to as its resolution. For the same image, a higher resolution (more pixels) usually means higher graphic quality and more detail. Most image-analysis research is raster-based.

A Photo of Hell’s Kitchen at Different Resolutions
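The sketch below makes the raster idea concrete in NumPy. The synthetic array and the downsample helper are illustrative and do not come from the original. A raster image is an array of shape height × width × 3, and lowering its resolution simply means keeping fewer pixels.

```python
import numpy as np

def downsample(image: np.ndarray, factor: int) -> np.ndarray:
    """Reduce resolution by keeping every `factor`-th pixel along each axis.

    Simple nearest-neighbour decimation (hypothetical helper); real resizing
    tools usually average neighbouring pixels first to reduce aliasing.
    """
    return image[::factor, ::factor]

# A synthetic 8 x 8 RGB raster: height x width x 3 channels, 8-bit values.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)

low_res = downsample(image, 2)
print(image.shape, image.shape[0] * image.shape[1])        # (8, 8, 3) 64
print(low_res.shape, low_res.shape[0] * low_res.shape[1])  # (4, 4, 3) 16
```

Each pixel is still one RGB triple. The lower-resolution version simply has a quarter as many of them.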

Color Spaces

Every pixel in a raster image is assigned a color, so an image can be treated as a dataset of colors. All we need is a way to describe colors with numbers. A colormap, for example, represents color information in one dimension: each point from one end to the other corresponds to one specific color. Palettes and color swatches, by contrast, can be seen as two-dimensional color indices.

Matplotlib Colormaps (left); Photoshop Color Swatch (right)
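A colormap is essentially a function from one number to one color. The sketch below shows that idea in NumPy with a hypothetical two-color linear colormap. Matplotlib's built-in colormaps work the same way conceptually: they are called with values in [0, 1] and return colors.

```python
import numpy as np

def linear_colormap(t, start_rgb, end_rgb):
    """Map scalar(s) t in [0, 1] to RGB colors by linear interpolation.

    Hypothetical two-color colormap: t = 0 gives start_rgb, t = 1 gives end_rgb.
    """
    # Clip to [0, 1] and add a trailing axis so t broadcasts against RGB triples.
    t = np.clip(np.asarray(t, dtype=float), 0.0, 1.0)[..., None]
    start = np.asarray(start_rgb, dtype=float)
    end = np.asarray(end_rgb, dtype=float)
    return np.rint((1 - t) * start + t * end).astype(np.uint8)

# Five evenly spaced samples from black to white: a 1-D strip of colors.
strip = linear_colormap(np.linspace(0, 1, 5), (0, 0, 0), (255, 255, 255))
print(strip)
```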

While 1D and 2D representations are easy to understand, 3D color spaces are the most broadly applicable in practice. Typical 3D color spaces include RGB, HSV, and Lab. RGB is the most commonly used color space in machine learning. Photoshop’s Color Picker is a good illustration of how one color is expressed by different codes in different color spaces.

Color Botticelli (left) in RGB Color Space (RGB 145, 179, 188; HEX #91B3BC)
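The RGB and HEX codes in the caption above are two notations for the same three numbers: each HEX pair is one channel written in base 16. Here is a small sketch that uses only the Python standard library. The helper names are hypothetical, and colorsys converts RGB to HSV.

```python
import colorsys

def hex_to_rgb(hex_code: str) -> tuple:
    """Convert '#RRGGBB' to an (r, g, b) tuple of 0-255 integers."""
    h = hex_code.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def rgb_to_hex(rgb) -> str:
    """Convert an (r, g, b) tuple of 0-255 integers to '#RRGGBB'."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)

botticelli = hex_to_rgb("#91B3BC")
print(botticelli)              # (145, 179, 188)
print(rgb_to_hex(botticelli))  # #91B3BC

# colorsys works on floats in [0, 1]; hue comes back as a fraction of a full turn.
h, s, v = colorsys.rgb_to_hsv(*(c / 255 for c in botticelli))
print(f"H={h * 360:.0f} deg, S={s:.0%}, V={v:.0%}")
```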

The following two articles I found on Medium explain the definition and application of different color spaces.

Understand and Visualize Color Spaces to Improve Your Machine Learning and Deep Learning Models

Explain, analyze and experiment 14 popular color spaces and their consequences on the accuracy of our models.

towardsdatascience.com

OpenCV: Different Color Spaces in Image Processing with Python

Preprocessing techniques for feature extraction and object detection

pub.towardsai.net

Extracting Colors From an Image

Now let’s see what we can do with this color knowledge. Here I used a photo I posted on WeChat Moments in March 2022. To the computer, the image is a 1080 × 1440 × 3 array: one value for each of the three color channels at every pixel.
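Below is a minimal NumPy sketch of this step. It assumes the photo was loaded with OpenCV's cv2.imread, which returns pixels in B, G, R order. The original photo isn't available, so random pixels of the same shape stand in for it. The names bgr_to_rgb and to_pixel_points are hypothetical helpers.

```python
import numpy as np

def bgr_to_rgb(image_bgr: np.ndarray) -> np.ndarray:
    """Reverse the channel axis (B, G, R -> R, G, B).

    For a 3-channel image this gives the same result as
    cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB).
    """
    return image_bgr[..., ::-1]

def to_pixel_points(image_rgb: np.ndarray) -> np.ndarray:
    """Flatten an (H, W, 3) image into an (H*W, 3) table of RGB points."""
    return image_rgb.reshape(-1, 3).astype(float)

# Stand-in for cv2.imread("photo.jpg"): random pixels with the shape
# mentioned in the text (the real photo is not available here).
rng = np.random.default_rng(0)
photo_bgr = rng.integers(0, 256, size=(1080, 1440, 3), dtype=np.uint8)

photo_rgb = bgr_to_rgb(photo_bgr)
points = to_pixel_points(photo_rgb)
print(photo_rgb.shape)  # (1080, 1440, 3)
print(points.shape)     # (1555200, 3)
```

To plot the resulting point cloud, draw a random subset of points, because drawing over a million markers is slow. Pass that subset to Matplotlib's 3D scatter on axes created with fig.add_subplot(projection="3d"), and use points / 255 as the marker colors.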

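I then split the RGB color channels with OpenCV (cv2). Each channel is a single-channel 2D array holding one value per pixel, so when displayed on its own it looks like a grayscale image: a pixel’s [r, g, b] data appears as [r, r, r], [g, g, g], and [b, b, b] in the three channel images, respectively. (OpenCV loads images in B, G, R order, so the channels come out as blue, green, red unless the image is converted first.)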
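The same split can be sketched in plain NumPy; OpenCV's cv2.split returns the equivalent single-channel arrays. The helper names below are hypothetical, and a tiny 2 × 2 image keeps the output readable.

```python
import numpy as np

def split_channels(image_rgb: np.ndarray):
    """Return the R, G and B planes as three (H, W) arrays."""
    return image_rgb[..., 0], image_rgb[..., 1], image_rgb[..., 2]

def as_gray_preview(channel: np.ndarray) -> np.ndarray:
    """Stack one channel three times so each pixel becomes [c, c, c]."""
    return np.stack([channel] * 3, axis=-1)

# A tiny 2 x 2 RGB image: red, green, blue, and Botticelli (145, 179, 188).
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [145, 179, 188]]], dtype=np.uint8)

r, g, b = split_channels(img)
print(r.shape)                   # (2, 2)
print(as_gray_preview(r)[1, 1])  # [145 145 145]
```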

The photo is thus deconstructed into pixels, each of which is a 3D RGB data point, so the whole image can be visualized on 3D axes. See below:

K-Means Clustering

How do we extract a few dominant colors from this large dataset? One intuitive approach is to group the pixels into clusters and take the geometric center of each cluster as a representative color. This can be done with machine learning clustering methods.

Clustering partitions data into groups, assigning each data point to one group so that points in the same group are similar to each other. In our case, similarity is measured simply as the Euclidean distance between RGB values. This is visually reasonable for our purpose, although RGB distance is only a rough proxy for perceived color difference.
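This similarity measure takes only a few lines of NumPy. Each pixel is compared with a set of candidate colors and assigned to the closest one by Euclidean distance in RGB. The colors and the nearest_center helper are illustrative only.

```python
import numpy as np

def nearest_center(points: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the closest center (Euclidean distance in RGB) for each point."""
    # (N, 1, 3) - (1, K, 3) -> (N, K, 3) differences, then the norm over RGB.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

centers = np.array([[255, 0, 0], [0, 0, 255], [145, 179, 188]], dtype=float)
pixels = np.array([[240, 10, 20], [20, 30, 200], [150, 175, 190]], dtype=float)
print(nearest_center(pixels, centers))  # [0 1 2]
```

The intermediate (N, K, 3) array grows with the number of pixels, so a full photo is usually processed in chunks or subsampled first.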

I used K-Means clustering for this case. In short, K-Means is an iterative process: it starts with K centroids, assigns each point to its nearest centroid, moves each centroid to the mean of the points assigned to it, and repeats until the assignments stop changing. In principle, other clustering methods (e.g. DBSCAN, agglomerative clustering) could also be applied. K-Means is convenient here because the number of clusters K is set directly, so it can be adjusted until the extracted colors are consistent with our intuition. Note that K-Means is nondeterministic: its random initialization means different runs can produce slightly different colors unless the random seed is fixed.
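Here is a minimal from-scratch NumPy sketch of the K-Means loop just described (Lloyd's algorithm). The article does not show its own code, so this is not the author's implementation. In practice, scikit-learn's KMeans (n_clusters, random_state, and the fitted cluster_centers_) is a common ready-made alternative, and by default it also uses smarter k-means++ initialization. Synthetic two-blob data stands in for a real photo.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's K-Means on an (N, 3) array of RGB points.

    Returns (centers, labels). Minimal sketch: centers start at k randomly
    chosen pixels; a fixed seed makes runs repeatable.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center (Euclidean).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing, so the centers are final
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers, labels

# Synthetic "pixels": two well-separated color blobs.
rng = np.random.default_rng(42)
reds = rng.normal((220, 40, 40), 5, size=(200, 3))
blues = rng.normal((40, 60, 210), 5, size=(200, 3))
pixels = np.clip(np.vstack([reds, blues]), 0, 255)

centers, labels = kmeans(pixels, k=2)
print(np.rint(centers).astype(int))  # one reddish and one bluish center
```

For a real photo, you can pass in its pixels flattened into an (N, 3) array. The (N, K, 3) distance array grows with N, though, so subsampling a few tens of thousands of pixels first keeps memory in check.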

Extracted Colors (left); RGB Color Space Visualization, Colored by Cluster Centroid (right)

Original Image vs Extracted Colors (k=8)
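To turn the clustering result into a palette like the one above, round each centroid to a color code and weight it by the share of pixels assigned to it. Those shares are what pie-chart views of a palette display. This sketch uses hypothetical centroids and labels; in practice they would come from the K-Means fit.

```python
import numpy as np

def palette_with_shares(centers, labels):
    """Return [(hex_code, share), ...] sorted by how many pixels each cluster holds."""
    counts = np.bincount(labels, minlength=len(centers))
    shares = counts / counts.sum()
    result = []
    for j in np.argsort(shares)[::-1]:  # largest cluster first
        r, g, b = (int(c) for c in np.clip(np.rint(centers[j]), 0, 255))
        result.append((f"#{r:02X}{g:02X}{b:02X}", float(shares[j])))
    return result

centers = np.array([[145, 179, 188], [30, 30, 30], [240, 240, 235]], dtype=float)
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2])
for hex_code, share in palette_with_shares(centers, labels):
    print(hex_code, f"{share:.0%}")
```

You can pass the (hex_code, share) pairs straight to Matplotlib's plt.pie, using the shares as wedge sizes and the hex codes as the colors argument.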

Below is another example of color extraction.

The Number of Colors To Extract

How many colors should we extract? Typically, data scientists use the Silhouette or Elbow test to find the optimal K for K-Means clustering, but neither is necessarily helpful for our purpose. For example, the Elbow test suggests an optimal K of 2 for this image (the location of the bend), yet two colors are clearly too few to describe the photo.

Elbow Test Result (left); Extracted Colors when K=2 (right)
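For reference, here is a sketch of the Elbow test itself. Run K-Means for several K values and record the inertia, which is the sum of squared distances from each point to its assigned centroid. The "elbow" is where adding clusters stops reducing it sharply. The sketch assumes k-means++ seeding (scikit-learn's default, which exposes the value as inertia_) with a few restarts. It uses synthetic data with three color blobs, so the bend should appear around K = 3 here; this says nothing about the photo in the article.

```python
import numpy as np

def assign(points, centers):
    """Label each point with the index of its nearest center (Euclidean)."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans_pp(points, k, rng, n_iter=50):
    """K-Means with k-means++ seeding; returns (centers, labels, inertia)."""
    # k-means++: each new seed is drawn with probability proportional to its
    # squared distance from the nearest seed chosen so far.
    centers = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        d2 = ((points[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(points[rng.choice(len(points), p=d2 / d2.sum())])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    labels = assign(points, centers)
    inertia = float(((points - centers[labels]) ** 2).sum())
    return centers, labels, inertia

def elbow_curve(points, k_values, n_init=5, seed=0):
    """Best-of-n_init inertia for each K (lower means tighter clusters)."""
    rng = np.random.default_rng(seed)
    return {k: min(kmeans_pp(points, k, rng)[2] for _ in range(n_init)) for k in k_values}

# Synthetic pixels: three well-separated color blobs.
rng = np.random.default_rng(7)
blobs = [(220, 40, 40), (40, 60, 210), (60, 180, 70)]
points = np.vstack([rng.normal(c, 8, size=(150, 3)) for c in blobs])

curve = elbow_curve(points, range(1, 7))
for k, value in curve.items():
    print(k, round(value))
```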

The diagram below shows the results for K = 2, 4, 8, 12, and 16. The colored clusters can help us choose K (in layman’s terms, how many colors to extract). When K is larger than eight, the differences between clusters are no longer visually apparent, and the pie charts start to show some visually identical colors.

Applying the Model with Different K values
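The observation that colors become "visually identical" at larger K can also be approximated numerically. Measure the distance between every pair of extracted colors and flag pairs that are very close. In this sketch, the 20-unit RGB threshold is a hypothetical value chosen for illustration, not something from the article.

```python
from itertools import combinations

import numpy as np

def near_duplicate_pairs(palette, threshold=20.0):
    """Return index pairs of palette colors closer than `threshold` in RGB.

    The default threshold is a hypothetical illustration value; Euclidean RGB
    distance is only a rough proxy for perceived difference.
    """
    palette = np.asarray(palette, dtype=float)
    return [(i, j) for i, j in combinations(range(len(palette)), 2)
            if np.linalg.norm(palette[i] - palette[j]) < threshold]

palette = [(145, 179, 188), (150, 182, 190), (30, 30, 30), (240, 240, 235)]
print(near_duplicate_pairs(palette))  # [(0, 1)]
```

One possible heuristic is to raise K until such pairs start to appear. Euclidean RGB distance does not track human perception evenly, so comparing the colors in a more perceptually uniform space such as Lab would be a more faithful refinement.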

Key Takeaways

Treat images as mathematical data
Understand why the RGB color space can be used in machine learning
Extract dominant colors with K-Means clustering
Decide how many colors (K) to extract

The next article will cover color calculation and thresholds.

References

3D Visualization of K-means Clustering

In the previous post, I explained how to choose the optimal K value for K-Means Clustering. Since the main purpose of…

medium.com

Color palette extraction with K-means clustering | Machine Learning from Scratch (Part IV)

Find dominant colors in mobile UI screenshots using K-Means clustering in Python

towardsdatascience.com

Skin Segmentation and Dominant Tone/Color Extraction

Hello World!!! Have you ever looked at your skin and wondered why there are different shades in different parts of the…

medium.datadriveninvestor.com

OpenCV: Different Color Spaces in Image Processing with Python

Preprocessing techniques for feature extraction and object detection

pub.towardsai.net

Understand and Visualize Color Spaces to Improve Your Machine Learning and Deep Learning Models

Explain, analyze and experiment 14 popular color spaces and their consequences on the accuracy of our models.

towardsdatascience.com

Written by Yi Shen