X-Means — A Complement to the K-Means Clustering Algorithm

Aman Gupta · Published in Geek Culture · May 27, 2021

Introduction

Clustering is the process of grouping data into several clusters or groups so that data within a cluster are maximally similar and data in different clusters are minimally similar. K-means (Duda & Hart, 1973; Bishop, 1995) has long been the workhorse for metric data. Its attractiveness lies in its simplicity and in its local-minimum convergence properties. In K-Means, each cluster grows from a center point, the centroid: every data point is assigned to a group by finding the centroid closest to it, and the centroids are then updated to fit the resulting groups.
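To make that assign-then-update cycle concrete, here is a minimal sketch of the classic Lloyd's loop in plain NumPy (the function name is mine, and it assumes no cluster ever goes empty):

```python
import numpy as np

def k_means(data, centers, n_iter=100):
    """Plain Lloyd's iteration: assign each point to its nearest
    centroid, then move each centroid to the mean of its points.
    Assumes no cluster ever becomes empty."""
    for _ in range(n_iter):
        # squared distance from every point to every center
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        new_centers = np.array([data[labels == j].mean(axis=0)
                                for j in range(len(centers))])
        if np.allclose(new_centers, centers):
            break  # converged to a local minimum
        centers = new_centers
    return centers, labels
```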

Weakness of K-Means and the Need for X-Means

The determination of these initial cluster center points, however, is the weakness of the K-Means algorithm: there is no principled approach for selecting them, so the center points are chosen arbitrarily or at random from the data set. As a result, the clustering produced by K-Means is often suboptimal, and it varies from one run to the next. The quality of the final result therefore depends heavily on the initial centroids.

Holistically, K-means suffers from the following limitations:

  1. K-means is slow, and the time it takes to complete each iteration scales poorly with the size of the data set.
  2. The number of clusters ‘K’ has to be pre-determined and supplied by the user.
  3. When confined to run with a fixed value of K, it empirically finds worse local optima than when it can dynamically alter K.

The solution to the first two problems, and a partial remedy for the third, is X-means. This method for efficiently estimating the number of clusters was proposed by Dan Pelleg and Andrew Moore of Carnegie Mellon University, Pittsburgh, USA.

Explanation of X-Means

X-means goes into action after each run of K-means, making local decisions about which subset of the current centroids should split themselves in order to better fit the data. The splitting decision is made by computing the Bayesian Information Criterion (BIC).
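As a rough illustration, the split test compares the BIC of a parent cluster against the BIC of its two children on the same points, and keeps whichever model scores higher. The sketch below is a simplified version of the score used by Pelleg and Moore, assuming identical spherical Gaussians; the function name is mine:

```python
import numpy as np

def bic_score(points, centers, labels):
    """Simplified BIC in the spirit of Pelleg & Moore: log-likelihood
    of the data under identical spherical Gaussians, minus a penalty
    of (p / 2) * log(n) for the p free parameters (k*d centroid
    coordinates, k-1 mixing weights, one shared variance)."""
    n, d = points.shape
    k = len(centers)
    # pooled maximum-likelihood estimate of the shared variance
    sq_dist = sum(np.sum((points[labels == j] - centers[j]) ** 2)
                  for j in range(k))
    var = sq_dist / (d * max(n - k, 1))
    if var <= 0:
        return np.inf  # degenerate fit: every point sits on a center
    # Gaussian term of the log-likelihood
    ll = -0.5 * n * d * np.log(2 * np.pi * var) - 0.5 * (n - k) * d
    for j in range(k):
        m = np.sum(labels == j)
        if m > 0:
            ll += m * np.log(m / n)  # mixing-weight term
    p = k * (d + 1)  # free parameters of the model
    return ll - 0.5 * p * np.log(n)
```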

X-Means works by alternately applying two operations: the K-Means algorithm (Improve-params), which optimally fits the clusters for the current value of k, and cluster splitting (Improve-structure), which adjusts the value of k according to the information criterion. In this method, the actual value of k is estimated in an unsupervised way, based only on the data set. The user supplies Kmin and Kmax as the lower and upper limits on the possible values of k. In the first step, X-means starts with k = Kmin and runs K-means to find an initial structure and centroids. In the next step, each cluster in the estimated structure is treated as a parent cluster that may be split into two children.
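A minimal sketch of this Improve-params / Improve-structure loop is shown below. It uses scikit-learn's KMeans for the inner runs (a stand-in, not the paper's implementation), reuses the bic_score helper from the previous sketch, expects data as a NumPy array, and simplifies the paper's split test and stopping rule:

```python
import numpy as np
from sklearn.cluster import KMeans

def x_means(data, k_min=2, k_max=20, seed=0):
    k = k_min
    while k < k_max:
        # Improve-params: ordinary K-means for the current k
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(data)
        new_centers = []
        # Improve-structure: test each parent cluster for a 2-way split
        for j in range(k):
            members = data[km.labels_ == j]
            if len(members) < 4:  # too few points to split
                new_centers.append(km.cluster_centers_[j])
                continue
            parent = bic_score(members, km.cluster_centers_[j:j + 1],
                               np.zeros(len(members), dtype=int))
            child = KMeans(n_clusters=2, n_init=10,
                           random_state=seed).fit(members)
            children = bic_score(members, child.cluster_centers_,
                                 child.labels_)
            if children > parent:  # the split improves the BIC
                new_centers.extend(child.cluster_centers_)
            else:
                new_centers.append(km.cluster_centers_[j])
        if len(new_centers) == k:  # no split accepted: structure is stable
            break
        k = min(len(new_centers), k_max)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(data)
```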

Implementation in Python

A robust and amazing library exists for implementing X-means and determining the optimal number of clusters — pyclustering.cluster.xmeans

The PyClustering library implements the K-Means++ initialization algorithm, which is known to produce a clustering whose expected cost is within a provable O(log k) factor of the optimum. The documentation for PyClustering shows how to call K-Means++ initialization in that package. K-Means++ (and its more scalable counterpart K-Means||) picks initial points stochastically, far apart from each other: the first center is chosen uniformly at random, and each subsequent center is selected with probability proportional to its squared distance from the nearest already chosen center.
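That D²-weighted sampling fits in a few lines of NumPy. This is a sketch of the idea, not PyClustering's actual implementation, and the function name is mine:

```python
import numpy as np

def kpp_init(data, k, rng=None):
    """K-Means++ seeding: the first center is uniform at random;
    each later center is drawn with probability proportional to
    its squared distance from the nearest center chosen so far."""
    if rng is None:
        rng = np.random.default_rng(0)
    centers = [data[rng.integers(len(data))]]
    for _ in range(k - 1):
        # squared distance of every point to its nearest chosen center
        d2 = np.min([((data - c) ** 2).sum(axis=1) for c in centers],
                    axis=0)
        centers.append(data[rng.choice(len(data), p=d2 / d2.sum())])
    return np.array(centers)
```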

While implementing, we need to give it an initial list of centers. We can select the initial centers in one of the following ways (a usage sketch follows the list):

  1. Two random objects.
  2. The two farthest objects.
  3. A random object and its farthest neighbor.
  4. The first and last objects along the first eigenvector of PCA.
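Putting it together, here is a minimal sketch using PyClustering's xmeans with K-Means++ seeding via its kmeans_plusplus_initializer; the two-blob toy data are made up for illustration:

```python
import numpy as np
from pyclustering.cluster.xmeans import xmeans
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer

# toy data: two well-separated Gaussian blobs
rng = np.random.default_rng(0)
sample = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
                    rng.normal(5.0, 0.5, size=(100, 2))]).tolist()

# seed two initial centers with K-Means++, then let X-means grow k
initial_centers = kmeans_plusplus_initializer(sample, 2).initialize()
instance = xmeans(sample, initial_centers, kmax=20)
instance.process()

print(len(instance.get_clusters()), "clusters found")
print(instance.get_centers())
```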

Scope for Improvement

One of the major problems with X-means is that it assumes the data follow identical spherical Gaussian distributions. Because of this, it tends to over-fit data with elliptical clusters, or input sets whose clusters vary in size. The G-Means and PG-Means algorithms try to solve this problem by projecting the data onto one dimension and running a statistical goodness-of-fit test. This approach performs better on non-spherical distributions; however, projections may not work well for all data sets, since a single projection can collapse data from many clusters together and obscure differences in density. Accuracy therefore requires multiple projections.

I hope this article gives you an insight into the X-Means algorithm, which serves as an enhancement to the K-Means algorithm.

Thanks for Reading! Stay Safe!
