Image Color Extraction using K-Means Clustering

Everything About Color Extraction

Yi Shen
May 23, 2022

Introduction

This article explains basic methods for image color extraction and visualization using Python.

Vector vs Raster

There are two general types of graphics: raster and vector. Vector graphics are built from geometric shapes such as points, curves, and polygons. Because vector data is drawn from explicit geometric formulas, it conveys information precisely. If you zoom in on a vector image on your device, the boundaries always stay sharp.

Raster graphics can be thought of as array-like data made of pixels, the smallest units of an image, each holding color information. The number of pixels in an image is referred to as its resolution. For the same image, a higher resolution (more pixels) usually means higher graphic quality and more detail. Most image analysis research is raster-based.

A Photo of Hells Kitchen at Different Resolutions
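To make the idea of a pixel grid and its resolution concrete, here is a minimal sketch in plain Python. Nested lists stand in for the arrays an imaging library would return, and the helper names `resolution` and `downsample` are made up for this illustration.

```python
# A tiny 4x4 raster image: each pixel is an (r, g, b) tuple.
# Nested lists are an assumption of this sketch; real libraries use arrays.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
image = [
    [WHITE, WHITE, BLACK, BLACK],
    [WHITE, WHITE, BLACK, BLACK],
    [BLACK, BLACK, WHITE, WHITE],
    [BLACK, BLACK, WHITE, WHITE],
]


def resolution(img):
    """Return (rows, columns) of a rectangular pixel grid."""
    return len(img), len(img[0])


def downsample(img, step):
    """Keep every `step`-th pixel in both directions (nearest-neighbor)."""
    return [row[::step] for row in img[::step]]


small = downsample(image, 2)
rows, cols = resolution(image)
print(rows * cols, "pixels at full resolution")
print(resolution(small), "rows and columns after downsampling")
```

Downsampling keeps the overall pattern but throws away pixels, which is why the lower-resolution versions of a photo look blockier.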

Color Spaces

Every pixel in a raster image is assigned a color, so an image can be treated as a dataset of colors. All we have to do is define colors with numbers. For example, a colormap presents color information in one dimension: each point from one end to the other represents one specific color. Palettes and color swatches can be seen as a two-dimensional color index.

Matplotlib Colormaps(left); Photoshop Color Swatch(right)

While 1D and 2D representations are easy to understand, 3D color spaces are the ones most widely used in practice. Typical 3D color spaces include RGB, HSV, and Lab. RGB is a common default for Machine Learning work, since images are usually stored and loaded as red, green, and blue channel values. Photoshop’s Color Picker is a good illustration of how one color is coded in different color spaces.

Color Botticelli(left) in RGB Color Space(RGB 145, 179, 188; HEX #91B3BC)
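The RGB triple and the HEX code in the caption are two spellings of the same three numbers: each channel value from 0 to 255 becomes a two-digit hexadecimal pair. A short sketch of the conversion (the function names are my own, not from any library):

```python
def rgb_to_hex(r, g, b):
    """Format three 0-255 channel values as a #RRGGBB string."""
    return "#{:02X}{:02X}{:02X}".format(r, g, b)


def hex_to_rgb(code):
    """Parse a #RRGGBB string back into an (r, g, b) tuple."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


print(rgb_to_hex(145, 179, 188))   # #91B3BC
print(hex_to_rgb("#91B3BC"))       # (145, 179, 188)
```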

The following two articles I found on Medium explain the definition and application of different color spaces.

Extracting Colors From an Image

Now let’s see what we can do with this knowledge of color. Here I used a photo I posted on WeChat Moments in March 2022. The computer reads the image as a 1080 × 1440 × 3 array: two spatial dimensions plus three color channels.

Original Image(1080x1440 jpeg)

I then split the image into its color channels with a function in cv2 (for example, cv2.split). For each pixel carrying [r, g, b] data, this yields the single values r, g, and b as three separate channels. Displayed as grayscale images, these channels correspond to [r, r, r], [g, g, g], and [b, b, b] respectively.

The photo is thus deconstructed into pixels, each a 3D data point in RGB space, so it can be visualized on three axes. See below:
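The channel split can be sketched without OpenCV. The example below is an assumption-laden stand-in: it uses nested lists of (r, g, b) tuples instead of a cv2 image, and the helpers `split_channels` and `as_gray` are hypothetical names. Keep in mind that OpenCV itself loads images in B, G, R order by default, so real code usually reorders the channels first.

```python
# A 2x2 image with pixels stored as (r, g, b) tuples (illustrative values).
image = [
    [(145, 179, 188), (200, 30, 40)],
    [(10, 20, 30), (255, 255, 255)],
]


def split_channels(img):
    """Return three grids holding only the r, g, or b value of each pixel."""
    return tuple([[px[c] for px in row] for row in img] for c in range(3))


def as_gray(channel):
    """Repeat each single-channel value three times so it displays as gray."""
    return [[(v, v, v) for v in row] for row in channel]


red, green, blue = split_channels(image)
print(red)                  # the r value of every pixel
print(as_gray(red)[0][0])   # the top-left pixel as [r, r, r]
```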

RGB Color Space Visualization
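Before plotting or clustering, the pixel grid is flattened into one long list of RGB points; for a 1080 × 1440 photo that is 1,555,200 points. A minimal sketch with a toy image (the data and the `to_points` helper are invented for illustration):

```python
from collections import Counter

# A 2x3 toy image; each pixel is an (r, g, b) tuple.
image = [
    [(250, 10, 10), (250, 10, 10), (10, 10, 250)],
    [(0, 0, 0), (250, 10, 10), (10, 10, 250)],
]


def to_points(img):
    """Flatten a pixel grid into a list of (r, g, b) points."""
    return [px for row in img for px in row]


points = to_points(image)
counts = Counter(points)   # how often each exact color occurs

print(len(points), "points,", len(counts), "distinct colors")
print(counts.most_common(1))
```

Each point's three values serve as its x, y, and z coordinates in the RGB scatter plot. Counting exact colors rarely works on a real photo, where nearly every pixel differs slightly, which is why the pixels need to be grouped.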

K-Means Clustering

How do we extract a few dominant colors from this large dataset? One intuitive approach is to group the pixels into clusters and find the geometric center of each cluster. This can be achieved with Machine Learning methods.

The principle of clustering is to partition data into groups so that each data point is assigned to one group and points in the same group are similar. In our case, similarity is measured simply as Euclidean distance in RGB space, which is visually reasonable.

I used K-Means clustering for this case. In short, K-Means is an iterative process: it repeatedly assigns each point to its nearest centroid and then moves each centroid to the mean of its assigned points. In theory, other clustering methods (e.g. DBSCAN, agglomerative) can also be applied. But K-Means takes the number of clusters K as an explicit parameter, so the output can be adjusted by changing K until the results are consistent with our intuition. Note that K-Means is usually initialized randomly, so repeated runs can give slightly different centroids unless the random seed is fixed.
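The assign-then-update loop can be sketched in plain Python. This is not the code behind the figures in this article, and it departs from common practice in one labeled way: K-Means is usually initialized at random, while this sketch uses a deterministic farthest-point start so the output is reproducible. The pixel values are made up.

```python
import math


def init_centroids(points, k):
    """Farthest-point start (an assumption of this sketch, not random seeding)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=50):
    """Return (centroids, labels) after iterating until the centroids settle."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Three reddish and three bluish pixels (illustrative data).
pixels = [
    (250, 10, 10), (240, 20, 15), (255, 0, 5),
    (10, 10, 250), (20, 15, 240), (0, 5, 255),
]
centroids, labels = kmeans(pixels, k=2)
dominant = [tuple(round(v) for v in c) for c in centroids]
print(dominant, labels)
```

The rounded centroids are the extracted colors, and the labels say which pixels each color represents. On a full photo a vectorized library implementation would be far faster than these Python loops.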

Extracted Colors(left); RGB Color Space Visualization, Colored by Cluster Centroid(right)
Original Image vs Extracted Colors (k=8)

Below is another example of color extraction.

Chinatown(纽约华埠), Manhattan, NY

The Number of Colors To Extract

How many colors should we extract? Typically, data scientists use the Silhouette or Elbow test to find the optimal K for K-Means clustering, but that is not necessary for our purpose. For example, when applying the Elbow test here, the optimal K is 2 (the location of the bend), and two colors are clearly too few to be helpful.

Elbow Test Result(left); Extracted Colors when K=2(right)
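For readers unfamiliar with the Elbow test: it runs K-Means for several K values and plots the inertia, the sum of squared distances from each point to its assigned centroid. The bend in that curve is the suggested K. A minimal sketch under the same assumptions as any toy example (made-up pixels, a compact hand-written K-Means with a deterministic farthest-point start rather than a library call):

```python
import math


def kmeans(points, k, max_iter=50):
    """Compact K-Means; farthest-point initialization is this sketch's choice."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(
                tuple(sum(ch) / len(members) for ch in zip(*members))
                if members else centroids[j]
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def inertia(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Three reddish and three bluish pixels (illustrative data).
pixels = [
    (250, 10, 10), (240, 20, 15), (255, 0, 5),
    (10, 10, 250), (20, 15, 240), (0, 5, 255),
]

inertias = {}
for k in range(1, 5):
    centroids, labels = kmeans(pixels, k)
    inertias[k] = inertia(pixels, centroids, labels)
    print(k, round(inertias[k], 1))
```

With this data the inertia collapses between K=1 and K=2 and barely changes afterwards, so the bend sits at 2. That is a sound statistical answer, yet two colors would still make a poor palette.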

The diagram below shows the results of applying the model with different K values (2, 4, 8, 12, 16). The colored clusters help us decide on a K value (in layman’s terms, how many colors to extract). When K is larger than eight, the differences between clusters are no longer visually apparent, and the pie charts start to show visually identical colors.

Applying the Model with Different K values
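Two quantities sit behind charts like these: the share of pixels in each cluster (the pie slices) and how close the extracted colors are to one another. The sketch below computes both from hypothetical clustering output. The centroids, labels, and the distance threshold of 30 are all invented for illustration, and a suitable threshold would have to be chosen by eye.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical K-Means output: three centroids and one label per pixel.
centroids = [(248, 10, 10), (10, 10, 248), (240, 20, 15)]
labels = [0, 0, 1, 1, 1, 2, 0, 1]


def cluster_shares(labels):
    """Fraction of pixels assigned to each cluster (the pie-chart slices)."""
    counts = Counter(labels)
    return {j: counts[j] / len(labels) for j in sorted(counts)}


def similar_pairs(centroids, threshold):
    """Index pairs of centroids closer than `threshold` in RGB distance."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(centroids), 2)
        if math.dist(a, b) < threshold
    ]


print(cluster_shares(labels))
print(similar_pairs(centroids, threshold=30))
```

When a pair is flagged, the two colors are close enough in RGB space that K is probably larger than the image needs.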

Key Takeaways

  • Perceive images as mathematical data
  • Understand why RGB color space can be used in Machine Learning
  • K-Means Clustering
  • How to decide on K-value

The next article will be about color calculation and thresholds.


Yi Shen

MSUP@Columbia | Revit Plugin Developer | UE5 Project Maker | Program Developer