Types of Clustering — Definitions, Formations and Limitations!!
This article gives you a high-level understanding of different clustering techniques and how their clusters are formed.
What is Cluster Analysis?
Cluster analysis groups data objects based on the information found in the data that describes the objects and their relationships. The goal of clustering is to create groups such that the objects within a group are similar to one another and different from the objects in other groups. The greater the similarity within a group and the greater the difference between groups, the better the clustering quality.
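As a rough illustration of that quality idea, the sketch below compares the average distance within groups to the average distance between groups for two labelings of the same points. The function name `mean_within_and_between`, the sample points and both labelings are made up for this example, and Euclidean distance is an assumption; this is not a standard quality measure from any library.

```python
import math
from itertools import combinations

def mean_within_and_between(points, labels):
    """Return (mean distance within groups, mean distance between groups)."""
    within, between = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        # Pairs sharing a label count as "within", all others as "between".
        (within if lp == lq else between).append(math.dist(p, q))
    return sum(within) / len(within), sum(between) / len(between)

# Hypothetical data: two small, compact groups that lie far apart.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
good_labels = [0, 0, 0, 1, 1, 1]   # follows the natural groups
poor_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_within, good_between = mean_within_and_between(points, good_labels)
poor_within, poor_between = mean_within_and_between(points, poor_labels)
print(good_within, good_between)
print(poor_within, poor_between)
```

With the labeling that follows the natural groups, the within-group average is small and the between-group average is large. The mixed labeling pushes the within-group average up, which is what a poorer clustering looks like under this simple comparison.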
Different types of clustering techniques:
Well separated clusters:
· Distance between any two points in different groups is larger than the distance between any two points in the same group.
· These clusters need not be globular; they can have any shape.
· Sometimes a threshold is used to specify that all the objects in a cluster must be sufficiently close to one another. This definition of a cluster is satisfied only when the data contains natural clusters.
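The well-separated condition can be checked directly on a labeled data set. The following is a minimal sketch, assuming Euclidean distance; the helper name `is_well_separated` and the sample points are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations

def is_well_separated(points, labels):
    """True if every between-cluster distance exceeds every within-cluster distance."""
    max_within = 0.0
    min_between = math.inf
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)
        if lp == lq:
            max_within = max(max_within, d)
        else:
            min_between = min(min_between, d)
    return min_between > max_within

# Hypothetical data: two tight groups that are far from each other.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
natural_labels = [0, 0, 0, 1, 1, 1]
shifted_labels = [0, 0, 1, 1, 1, 1]   # (1, 0) is moved into the far group

print(is_well_separated(points, natural_labels))   # True
print(is_well_separated(points, shifted_labels))   # False
```

The check compares every pair of points, so it is only practical for small data sets. It is meant to make the definition concrete, not to serve as a clustering algorithm.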
Prototype Based Cluster:
· If the data is numerical, the prototype of the cluster is often a centroid, i.e., the average of all the points in the cluster.
· If the data has categorical attributes, a centroid is not meaningful, so the prototype of the cluster is often a medoid, i.e., the most representative point of the cluster.
· Objects in the cluster are closer to the prototype of the cluster than to the prototype of any other cluster.
· Prototype based clusters can also be referred to as “Center-Based” Clusters.
· These clusters tend to be globular.
· K-Means and K-Medoids are examples of prototype-based clustering algorithms.
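To make the two kinds of prototype concrete, here is a small sketch that computes a centroid, picks a medoid, and assigns a point to its nearest prototype. It is not an implementation of K-Means or K-Medoids; the function names, the sample data and the Hamming-style distance used for the categorical records are all assumptions made for this example.

```python
import math

def centroid(points):
    """Average of all points, coordinate by coordinate (numerical data only)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def medoid(points, distance=math.dist):
    """The actual data point with the smallest total distance to all the others."""
    return min(points, key=lambda p: sum(distance(p, q) for q in points))

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_prototype(point, prototypes, distance=math.dist):
    """Index of the prototype closest to the given point."""
    return min(range(len(prototypes)), key=lambda i: distance(point, prototypes[i]))

# Hypothetical numerical clusters.
cluster_a = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
cluster_b = [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
prototypes = [centroid(cluster_a), centroid(cluster_b)]

print(prototypes[0])                              # (1.5, 1.5)
print(medoid(cluster_b))                          # (8.0, 8.0)
print(nearest_prototype((3.0, 3.0), prototypes))  # 0

# Hypothetical categorical records: averaging is meaningless, so use a medoid.
records = [("red", "small"), ("red", "large"), ("blue", "small")]
print(medoid(records, distance=hamming))          # ('red', 'small')
```

Note that the centroid (1.5, 1.5) is not one of the original points, while a medoid always is. That is why the medoid still works when the attributes cannot be averaged.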
Graph Based Clusters (Contiguity-Based Clusters):
· Two objects are connected only if they are within a specified distance of each other.
· Each point in a cluster is closer to at least one point in the same cluster than to any point in a different cluster.
· Useful when clusters are irregular and intertwined.
· This does not work well when there is noise in the data. As shown in the above picture, a small bridge of points can merge two distinct clusters into one.
· A clique is another type of graph-based cluster (to be explained in detail in my future articles).
· Agglomerative hierarchical clustering is closely related to the graph-based clustering technique.
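The contiguity idea can be sketched as finding connected components in a graph that links any two points within a given distance. The code below is a minimal illustration, assuming Euclidean distance; the function name `contiguity_clusters`, the distance threshold `eps` and the sample points are hypothetical. It also reproduces the bridge problem: adding three points between two groups merges them into a single cluster.

```python
import math
from collections import deque

def contiguity_clusters(points, eps):
    """Label each point with the connected component it belongs to.

    Two points are linked when their distance is at most eps.
    """
    labels = [None] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        # Breadth-first search over the "within eps" links.
        labels[start] = current
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(len(points)):
                if labels[j] is None and math.dist(points[i], points[j]) <= eps:
                    labels[j] = current
                    queue.append(j)
        current += 1
    return labels

# Hypothetical data: two dense 3x3 groups and a thin bridge between them.
left = [(x * 0.5, y * 0.5) for x in range(3) for y in range(3)]
right = [(5 + x * 0.5, y * 0.5) for x in range(3) for y in range(3)]
bridge = [(2, 0.5), (3, 0.5), (4, 0.5)]

without_bridge = contiguity_clusters(left + right, eps=1.1)
with_bridge = contiguity_clusters(left + right + bridge, eps=1.1)

print(len(set(without_bridge)))   # 2
print(len(set(with_bridge)))      # 1
```

Each bridge point is within `eps` of its neighbor, so the chain of links runs from one group to the other and everything ends up in one component.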
Density Based Clusters:
· A cluster is a dense region of objects that is surrounded by a region of low density.
· Density-based clusters are employed when the clusters are irregular or intertwined, and when noise and outliers are present.
· Points in low-density regions are classified as noise and omitted. The above picture can be compared with the picture under “Graph Based Clusters” for better understanding: the bridge between the two circles and another small curve are eliminated.
· DBSCAN is an example of a density-based clustering algorithm.
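The sketch below is a simplified, unoptimized version of the DBSCAN idea, written only to show how low-density points get left out. The parameter names `eps` and `min_pts`, the `NOISE` marker and the sample points are assumptions for this example, and a real project would normally use a library implementation instead. The data are two dense groups joined by a thin bridge, the situation that causes trouble for contiguity-based clusters.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Simplified DBSCAN: one label per point, NOISE (-1) for noise points."""
    def neighbors(i):
        # All points within eps of point i, including point i itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be reclaimed as a border point
            continue
        labels[i] = cluster            # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)   # j is also a core point: keep growing
        cluster += 1
    return labels

# Hypothetical data: two dense 3x3 groups and a thin bridge between them.
left = [(x * 0.5, y * 0.5) for x in range(3) for y in range(3)]
right = [(5 + x * 0.5, y * 0.5) for x in range(3) for y in range(3)]
bridge = [(2, 0.5), (3, 0.5), (4, 0.5)]

labels = dbscan(left + right + bridge, eps=1.1, min_pts=5)
clusters_found = {label for label in labels if label != NOISE}
print(len(clusters_found))   # 2
print(labels[-2])            # -1, the middle bridge point is noise
```

The bridge points have too few neighbors to be core points, so the cluster cannot grow across them. The two outer bridge points are attached to the nearest group as border points, the middle one is marked as noise, and the two groups stay separate.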
The above-mentioned techniques are the foundation for understanding the different ways in which clusters are formed. To build further knowledge of clustering, what should be learnt next?
- Hierarchical vs Partitional clustering. Exclusive, Overlapping and Fuzzy Clustering.
- Different clustering algorithms such as K-Means, DBSCAN, Fuzzy Clustering, SOM (Self-Organizing Maps) and EM (Expectation Maximization).
- Cluster quality measures (intra-cluster quality and inter-cluster quality).
- Strengths and limitations
Source: Introduction to Data Mining by Tan, Steinbach and Kumar (Pearson Education)