An interesting use case for the ML clustering algorithm

Johny Jose
Analytics Vidhya

Clustering is an unsupervised machine learning technique used for making sense of unstructured datasets. For instance, it is used for document classification, customer segmentation, and similar tasks. Clustering can also be used for detecting anomalies or outliers, and as a preprocessing step for supervised learning.

These are some common applications of clustering. Another interesting use case is finding the main colors, or the color palette, of an image. Let us see how that can be done.

Finding a color palette for images

An image can contain a very large number of distinct colors, but most images follow a color palette or color scheme. With clustering, similar colors (pixels whose RGB values are close to each other) are grouped together. After that, it's just a matter of finding each cluster's center and applying that color value to every pixel in the cluster.
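
To make the idea concrete, here is a minimal sketch of K-Means written directly in NumPy and run on six hand-picked colors. The function name `kmeans_colors`, its parameters, and the sample pixels are all assumptions made for illustration; the article itself relies on Scikit-learn rather than this toy loop.

```python
import numpy as np

def kmeans_colors(pixels, k, n_iter=10, seed=0):
    """Toy K-Means over the rows of `pixels`, an array of shape (n, 3)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct pixels chosen at random.
    start = rng.choice(len(pixels), size=k, replace=False)
    centers = pixels[start].astype(float)
    for _ in range(n_iter):
        # Squared distance from every pixel to every center: shape (n, k).
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    # Final assignment so that labels match the returned centers.
    d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

# Three reddish and three bluish colors, already scaled to [0, 1].
pixels = np.array([[0.9, 0.1, 0.1], [1.0, 0.0, 0.0], [0.8, 0.2, 0.1],
                   [0.1, 0.1, 0.9], [0.0, 0.0, 1.0], [0.2, 0.1, 0.8]])
centers, labels = kmeans_colors(pixels, k=2)
print(centers)  # one averaged color per cluster
print(labels)   # cluster index of each input color
```

Each row of `centers` is the average of the colors assigned to that cluster, which is the color that would replace them in the recolored image.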

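I used the K-Means clustering algorithm. To process the image and find the cluster centroids, the image is first loaded into a NumPy array, and the color values, which range from 0 to 255, are then scaled to the range 0 to 1 by dividing by 255. Note that cv2.imread returns the channels in BGR order rather than RGB; this does not change the clustering, only the order in which each color's components are stored.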

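import cv2
import numpy as np
im = cv2.imread("test_image.jpeg")  # uint8 array, channels in BGR order
im_flat = np.divide(im, 255)  # scale color values to the range [0, 1]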
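
If the colors are later shown with a tool that expects RGB, the channel order returned by OpenCV has to be reversed. The sketch below illustrates this on a small synthetic array that stands in for the loaded image; the array contents and the names `im_bgr`, `im_rgb` and `im_scaled` are assumptions for the example, not part of the original notebook.

```python
import numpy as np

# Hypothetical stand-in for the result of cv2.imread: 2x2 pixels, BGR, uint8.
im_bgr = np.array([[[255, 0, 0], [0, 255, 0]],
                   [[0, 0, 255], [10, 20, 30]]], dtype=np.uint8)

# Reverse the last axis to go from BGR to RGB.
im_rgb = im_bgr[..., ::-1]

# Scale the 0-255 integers to floats in [0, 1].
im_scaled = im_rgb / 255.0

print(im_rgb[0, 0])     # the pure-blue BGR pixel, now in RGB order
print(im_scaled.dtype)  # float64
```

It is also worth checking that the loaded array is not None, since cv2.imread returns None instead of raising an error when the file cannot be read.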

Now that the image is loaded into a NumPy array with shape (3204, 4271, 3), we need to reshape it into a two-dimensional array, since we care about the pixel color values and not about the pixel positions. After reshaping, the array has shape (13684284, 3), with one row per pixel.

X = im_flat.reshape(-1, 3)
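
The effect of the reshape is easier to see on a tiny array. The following sketch uses a made-up 2 by 3 "image" (the names `tiny`, `X_tiny` and `restored` are hypothetical) to show that the pixels become rows and that the original layout can be restored afterwards.

```python
import numpy as np

# Hypothetical tiny "image": 2 rows, 3 columns, 3 color channels.
tiny = np.arange(18, dtype=float).reshape(2, 3, 3)

# One row per pixel, one column per color channel.
X_tiny = tiny.reshape(-1, 3)

# Reshaping back with the original shape restores the pixel positions.
restored = X_tiny.reshape(tiny.shape)

print(X_tiny.shape)                     # (6, 3)
print(np.array_equal(restored, tiny))   # True
```

Because the reshape keeps the pixels in row-major order, row 3 of `X_tiny` is the first pixel of the second image row.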

Now running the clustering algorithm on the input array should get us the required cluster centers. The K-Means clustering API in Scikit-learn is used here.

from sklearn.cluster import KMeans

K = 8  # number of colors in the palette

kmeans = KMeans(n_clusters=K, random_state=0, verbose=0, max_iter=5).fit(X)
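
With more than 13 million pixels, fitting can take a while. One possible variation, which is not what the original notebook does, is to fit on a random sample of pixels and then assign every pixel to its nearest center with `predict`. The sketch below assumes a random stand-in for the pixel matrix; `X_demo`, the sample size of 500 and `n_init=10` are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical stand-in for the scaled pixel matrix X: 2000 random colors.
X_demo = rng.random((2000, 3))

# Fit on a random subset of the pixels only.
sample_idx = rng.choice(len(X_demo), size=500, replace=False)
km = KMeans(n_clusters=8, random_state=0, n_init=10).fit(X_demo[sample_idx])

# Assign every pixel, sampled or not, to its nearest learned center.
labels_all = km.predict(X_demo)

print(km.cluster_centers_.shape)  # (8, 3)
print(labels_all.shape)           # (2000,)
```

Whether the sampled centers are close enough to those from a full fit depends on the image, so the result is worth comparing visually.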

The number of clusters is set to 8 in this case. After running the algorithm, every pixel belonging to a cluster is assigned the color of that cluster's centroid, which is the mean color of the cluster and not necessarily a color that occurs in the image.
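
This assignment boils down to indexing the array of centroids with the array of labels. A small sketch with made-up values shows the mechanics; `centers` and `labels` here are hypothetical stand-ins for `kmeans.cluster_centers_` and `kmeans.labels_`.

```python
import numpy as np

# Hypothetical stand-ins for kmeans.cluster_centers_ and kmeans.labels_
# on a 2x2 image clustered into two colors.
centers = np.array([[0.1, 0.1, 0.1], [0.9, 0.8, 0.7]])
labels = np.array([0, 1, 1, 0])

# Indexing the centers with the label array yields one centroid per pixel.
recolored = centers[labels, :].reshape(2, 2, 3)

print(recolored.shape)  # (2, 2, 3)
print(recolored[0, 1])  # the centroid of cluster 1
```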

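# Replace every pixel with its cluster centroid, then restore the image shape.
X_recovered = kmeans.cluster_centers_[kmeans.labels_, :].reshape(im.shape)
X_recovered = X_recovered * 255  # back to the 0-255 range

The resulting image is plotted side by side with the original to show the difference.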
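
One simple way to prepare such a comparison, sketched here under the assumption that both images have the same height, is to cast the recolored floats back to 8-bit integers and join the two arrays along the width axis. The arrays below are tiny hypothetical stand-ins for the original image and for X_recovered.

```python
import numpy as np

# Hypothetical stand-ins: a 2x2 original (uint8) and its recolored version
# as floats in the 0-255 range, as X_recovered is after multiplying by 255.
original = np.array([[[10, 20, 30], [200, 210, 220]],
                     [[12, 22, 32], [198, 208, 218]]], dtype=np.uint8)
recovered = np.array([[[11.0, 21.0, 31.0], [199.0, 209.0, 219.0]],
                      [[11.0, 21.0, 31.0], [199.0, 209.0, 219.0]]])

# Round and clip before casting so the floats become valid 8-bit colors.
recovered_u8 = np.clip(np.rint(recovered), 0, 255).astype(np.uint8)

# Place the two images next to each other along the width axis.
side_by_side = np.hstack([original, recovered_u8])

print(side_by_side.shape)  # (2, 4, 3)
```

The combined array can then be passed to whichever display or file-writing function is in use.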

This is an interesting use of clustering that I came across recently. It can be used to find the dominant colors of an image, or to extract its palette and use it to color other images.
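
As a rough sketch of how the dominant colors could be read off, the centroids can be converted back to 8-bit values and sorted by how many pixels fall in each cluster. The `centers` and `labels` arrays below are hypothetical stand-ins for `kmeans.cluster_centers_` and `kmeans.labels_`, and ordering by pixel count is just one possible definition of "dominant".

```python
import numpy as np

# Hypothetical stand-ins for kmeans.cluster_centers_ (floats in [0, 1])
# and kmeans.labels_ (one cluster index per pixel).
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.6, 0.2], [0.2, 0.4, 0.8]])
labels = np.array([1, 1, 0, 2, 1, 2])

# Convert the centroids back to 8-bit color values.
palette = np.rint(centers * 255).astype(np.uint8)

# Count the pixels in each cluster and sort the palette by that count.
counts = np.bincount(labels, minlength=len(centers))
order = np.argsort(counts)[::-1]
dominant_first = palette[order]

print(counts)          # pixels per cluster
print(dominant_first)  # palette, most common color first
```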

The Jupyter notebook where I tried this out is available here.

Thank you.
