Basics of the K-Means Clustering Algorithm

Mahesh Singh Dasila
Jul 12, 2019


[Figure: Data points being clustered]

In this article we will go through the following topics:

  1. What is K-means clustering?
  2. Why is it used?
  3. How does it work?
  4. What are its applications?

So let's get started.

1) What is K-means clustering?

K-means clustering is an unsupervised machine learning algorithm, which means it is used when the dataset has no predefined labels.

2) Why is it used?

This algorithm is typically used when the data is unlabeled and we want to group the data points, for example to make predictions based on which cluster a point falls into. The main aim is to separate the data into groups whose members share similar traits.

The purpose of this algorithm is to find groups (clusters) in a given dataset, where the number of groups is given by the variable K. It works iteratively, assigning each data point to one of the K clusters based on the given features, so that data points are grouped by feature similarity. The end results of the K-means clustering algorithm are:

  1. The centroids of the K clusters that were identified.
  2. A label for each point in the training data, giving the cluster it belongs to.
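Formally (a standard formulation, added here as a supplement rather than taken from this article): given data points $x_1, \dots, x_n$, K-means looks for labels $c_i \in \{1, \dots, K\}$ and centroids $\mu_1, \dots, \mu_K$, the two outputs listed above, that make the within-cluster sum of squared Euclidean distances small:

$$
J(c, \mu) = \sum_{i=1}^{n} \lVert x_i - \mu_{c_i} \rVert^2
$$

For fixed labels, $J$ is minimized by setting each centroid to the mean of its cluster $C_k = \{\, i : c_i = k \,\}$,

$$
\mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i ,
$$

and for fixed centroids it is minimized by giving each point the label of its nearest centroid. Alternating these two updates never increases $J$, so the procedure settles down, though possibly at a local rather than a global minimum.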

3) How does it work?

  1. Choose the number of groups, K, into which the data will be divided (K is predefined).
  2. Choose K points arbitrarily as the initial cluster centers.
  3. Assign each point to its nearest cluster center, according to the Euclidean distance.
  4. Recompute each cluster center as the mean (centroid) of all the points assigned to that cluster.
  5. Repeat steps 3 and 4 until the assignment of points to clusters no longer changes between consecutive rounds.
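The steps above can be sketched in plain Python. This is a minimal, illustrative sketch rather than a production implementation (libraries such as scikit-learn provide `sklearn.cluster.KMeans` for real use). It assumes each data point is a tuple of numbers and implements step 2 by picking K of the data points at random with a fixed seed; the function name `kmeans`, its `max_iter` and `seed` parameters, and the example data are hypothetical choices made for illustration.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal K-means sketch (Lloyd-style iteration) on a list of tuples.

    Returns (centroids, labels): the K cluster centers, and for each input
    point the index (0..k-1) of the cluster it was assigned to.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    # Step 2: pick K of the data points at random as the initial centers.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 5: stop once no point changes cluster between two rounds.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # if a cluster ends up empty, keep its old center
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Two well-separated groups of 2-D points (made-up example data).
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

The cluster numbers in `labels` are arbitrary, since they depend on which points were picked as starting centers; only the grouping itself is meaningful. On less clearly separated data, different starting centers can lead to different final clusters, which is why K-means is commonly run several times and the best result kept.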

4) What are its applications?

  1. Clustering Algorithms in Identifying Cancerous Data

Clustering algorithms can be used to identify cancerous data. Experiments have found that cancer datasets give the best results with unsupervised non-linear clustering algorithms, which suggests that such data is non-linear in nature.

  2. Clustering Algorithms in Academics

Monitoring the progress of students' academic performance is a critical issue for institutions of higher learning, and clustering can help by grouping students with similar performance patterns.

  3. Clustering Algorithms in Search Engines

Clustering is one of the core techniques behind search engines. Search engines try to group similar objects into the same cluster and keep dissimilar objects far from each other. Results for a search are then drawn from the objects clustered most closely around the searched data. The better the clustering algorithm, the better the chances of getting the required result on the first page.
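The retrieval idea can be illustrated with a toy sketch, assuming documents have already been turned into numeric vectors and grouped into clusters. Everything here (the `cluster_search` and `mean` helpers, the document ids and the 2-D vectors) is hypothetical and only meant to show the principle of searching the nearest cluster instead of the whole collection; it is not how any particular search engine works.

```python
import math


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cluster_search(query, centroids, clusters, top_n=3):
    """Return up to top_n document ids from the cluster nearest to query.

    centroids: list of K centroid vectors.
    clusters:  list of K lists of (doc_id, vector) pairs, aligned with centroids.
    """
    # Find the cluster whose centroid is closest to the query (Euclidean distance).
    nearest = min(range(len(centroids)), key=lambda j: math.dist(query, centroids[j]))
    # Rank only that cluster's documents by their distance to the query.
    ranked = sorted(clusters[nearest], key=lambda doc: math.dist(query, doc[1]))
    return [doc_id for doc_id, _ in ranked[:top_n]]


# Hypothetical 2-D document vectors, already grouped into two clusters.
clusters = [
    [("doc-a", (0.9, 1.2)), ("doc-b", (1.3, 0.8)), ("doc-c", (0.5, 0.5))],
    [("doc-d", (7.5, 8.2)), ("doc-e", (8.4, 7.9))],
]
# Each centroid is the mean of its cluster's vectors.
centroids = [mean([vec for _, vec in members]) for members in clusters]

result = cluster_search((1.0, 1.1), centroids, clusters, top_n=2)
print(result)  # ['doc-a', 'doc-b']
```

Only the documents in the nearest cluster are compared with the query, which is the trade-off this kind of scheme makes: less work per search, at the risk of missing a relevant document that landed in a neighboring cluster.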

Thanks for reading. I hope you found it useful.
