Simple Competitive Learning

I've recently spent some time learning about and implementing competitive learning algorithms. To start off, in this post I'm writing about simple competitive learning.

Competitive Learning

Competitive learning (CL) algorithms are a class of unsupervised learning algorithms that are typically used to cluster high-dimensional data. Given a dataset, the goal of CL is to compute a set of cluster centers that can serve as a compact representation of the original data. In mathematical terms, given a set of data {x}, find a set of cluster centers {c} that satisfy the relationship:

$$\| x_p - c_w \| \le \| x_p - c_j \| \quad \text{for all } j$$

i.e., the distance of the p-th vector in {x} from its closest cluster center must be less than (or equal to) its distance from any of the other cluster centers.
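As a concrete illustration, here is a minimal NumPy sketch of the winner-selection step. The function name find_bmu and the array shapes are my own choices for this post, not taken from any particular library:

```python
import numpy as np

def find_bmu(x_p, centers):
    """Return the index of the cluster center closest to x_p.

    x_p     : array of shape (d,)   -- one data vector
    centers : array of shape (k, d) -- the k cluster centers
    """
    # Euclidean distance from x_p to every center; the argmin is the BMU.
    distances = np.linalg.norm(centers - x_p, axis=1)
    return int(np.argmin(distances))
```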

Simple Cluster Center Update Algorithm

The following simple example with two-dimensional data is considered so that the concepts can be followed relatively easily. To begin with, the cluster centers (or neurons) are initialized randomly. Each data vector in {x} is then evaluated against each cluster center. The cluster center that is closest to the (p-th) vector being evaluated is termed the winning center (or Best Matching Unit, BMU). This neuron is said to be activated when it is the BMU. Once the winning neuron is found, it is updated (moved closer) to the p-th vector using the relationship:

$$c_w \leftarrow c_w + \epsilon \,( x_p - c_w )$$
It is important to note that only the winning center is updated. Here ϵ is the learning rate, a “hyper-parameter” that dictates the rate at which cluster centers get updated, i.e., the distance by which they move in the two-dimensional output space. For this simple algorithm, I’m going to use a constant, user-specified learning rate ϵ.
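In code, the update step is a one-liner applied only to the winner. This sketch reuses the find_bmu helper defined above; the name update_winner is my own:

```python
def update_winner(x_p, centers, eps=0.01):
    """Move the winning center a fraction eps of the way toward x_p."""
    w = find_bmu(x_p, centers)               # index of the winning neuron
    centers[w] += eps * (x_p - centers[w])   # only the winner moves
    return w
```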

The distribution of the two-dimensional data is shown below. It is simply a set of randomly generated points in the range [-2.0, 2.0] in both the x and y directions. Since this is an introductory example, I will be using only 4 randomly selected points as cluster centers, so we expect to see 4 clusters as the output. The cluster centers are shown as black triangles in the figure. The initial positions of these centers are close to the origin.
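A sketch of how such data and initial centers might be generated (the sample count, seed, and exact initialization spread are assumptions on my part; the post only specifies the data range and that the centers start near the origin):

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed for reproducibility

# two-dimensional data: random points in [-2.0, 2.0] along both axes
data = rng.uniform(-2.0, 2.0, size=(500, 2))

# 4 cluster centers (neurons) initialized close to the origin
centers = rng.uniform(-0.1, 0.1, size=(4, 2))
```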

Python Implementation

The programmatic implementation of this algorithm in Python is straightforward and the code is posted here. With a learning rate ϵ = 0.01, the algorithm was run for 1000 iterations to update the cluster centers. Note that a convergence criterion was not specified in this case. The resulting cluster centers are shown below with the yellow triangle markers. It can be seen that each cluster now has a corresponding center. Data is colored by the ID of each winning neuron, and you can see that all the data has segregated into 4 neat clusters. However, since there is no convergence criterion specified, the winning centers are not at the geometric centers of their regions. The update equation (above) is influenced by the order of the points that are passed through it.
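The linked code is the authoritative version; the following is just a condensed sketch of the training loop using the pieces defined above, and it assumes the "1000 times" refers to full passes over the dataset:

```python
eps = 0.01
n_iterations = 1000

for _ in range(n_iterations):
    for x_p in data:
        w = find_bmu(x_p, centers)
        centers[w] += eps * (x_p - centers[w])   # winner-take-all update

# color each point by the ID of its winning neuron for plotting
labels = np.array([find_bmu(x_p, centers) for x_p in data])
```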

As an experiment, to see what other clusterings the algorithm could come up with, I removed the pseudo-random number seed and let the algorithm initialize with completely different values on each run. One run was particularly interesting (below): the blue cluster aligned itself along the boundary of three major clusters.

I ran some more “standard” two-dimensional clustering scenarios using datasets from sklearn. Below are the results for the noisy two-moons, blobs, and two-circles datasets.
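These toy datasets come straight from sklearn.datasets; the parameter values below are illustrative choices, not necessarily the ones used for the figures:

```python
from sklearn.datasets import make_moons, make_blobs, make_circles

moons, _   = make_moons(n_samples=500, noise=0.05)
blobs, _   = make_blobs(n_samples=500, centers=3)
circles, _ = make_circles(n_samples=500, noise=0.05, factor=0.5)
```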

Final thoughts and improvements

This is a simple center-update competitive learning algorithm, but it still produces very reasonable results. It is important to note that the center update equation is influenced more by the data that it has seen later as opposed to earlier; for example, c²¹ (the center vector at iteration 21) is influenced more by c¹⁹ than by, say, c⁵. This can be proved mathematically (Kirby). This is important because the order in which the data is presented can influence a center's final location, especially with a fixed learning rate.
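To see where the recency effect comes from, the update rule can be unrolled for a single center. The unrolling below assumes the same center wins on every step, which is a simplification of the actual algorithm:

$$c^{t} = (1-\epsilon)\,c^{t-1} + \epsilon\, x_{t}
\quad\Longrightarrow\quad
c^{t} = (1-\epsilon)^{t}\, c^{0} + \epsilon \sum_{k=1}^{t} (1-\epsilon)^{t-k}\, x_{k}$$

Since 0 < ϵ < 1, the weight (1-ϵ)^(t-k) is largest for the most recent samples, so later data dominates the final position of the center.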

Another significant drawback of this algorithm is its inability to take regions of influence into consideration. Each cluster center has some proximity to other centers (in the output mapping space), and this is not considered in the update equation above. In other words, if the cluster centers are considered neurons, only one of them is active for a given input vector. By considering a region of influence, the winning neuron's neighbors can also be activated, albeit to a lower magnitude, while neurons far away from the winning neuron have a very low magnitude and hence a negligible influence.
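As a preview of that idea, one common way to encode a region of influence is a Gaussian neighborhood function, as used by self-organizing maps; the kernel and the width parameter sigma here are standard choices, not part of the simple algorithm above:

```python
import numpy as np

def neighborhood_weight(dist_to_winner, sigma=1.0):
    """Gaussian falloff: the winner (distance 0) gets weight 1, nearby
    neurons get smaller weights, and distant neurons get roughly 0."""
    return np.exp(-(dist_to_winner ** 2) / (2.0 * sigma ** 2))
```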

These drawbacks are addressed by the Self-Organizing Feature Map (SOFM) algorithm, which I will be writing about in my next post on competitive learning.

References

  1. Geometric Data Analysis, Michael Kirby.
  2. Neural Networks & Learning Machines, Simon Haykin.