Introducing K-means clustering in 200 words or less

Published in 8bitDS, Jan 29, 2022 (2 min read)

K-Means clustering is one of the most popular unsupervised machine learning algorithms. It is used to find intrinsic groups within an unlabelled dataset and to draw inferences from them.

The algorithm follows a simple procedure to partition a given data set into a certain number of clusters, k, fixed a priori.

It alternates between two steps:

Assigning each data point to the closest cluster center, and then
setting each cluster center to the mean of the data points that are assigned to it.

The algorithm is finished when the assignment of instances to clusters no longer changes.
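The two alternating steps and the stopping rule can be sketched in plain NumPy. This is only an illustrative sketch, not scikit-learn's implementation: the function name `kmeans_sketch`, the random choice of initial centers, and the `max_iter` and `seed` parameters are assumptions made for the example.

```python
import numpy as np

def kmeans_sketch(X, k, max_iter=100, seed=0):
    """Illustrative k-means: alternate assignment and mean-update steps."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialise the centers as k distinct data points picked at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each point to its closest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Finished when the assignment of points to clusters no longer changes.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 2: set each center to the mean of the points assigned to it.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels, centers
```

Calling `kmeans_sketch(X, k=3)` on a two-dimensional array `X` returns one label per row and an array of three centers. Because the initial centers are random, different seeds can lead to different final clusterings.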

K-Means Algorithm

We describe the algorithm with respect to the Euclidean distance function d(x,y) = ||x − y||.
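As a small worked example of this distance function, the sketch below computes d(x, y) for one point against two candidate cluster centers and picks the closest. The helper name `euclidean` and the coordinates are hypothetical, chosen only to make the arithmetic easy to check.

```python
import numpy as np

def euclidean(x, y):
    """d(x, y) = ||x - y||, the Euclidean distance between two points."""
    return np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))

# Hypothetical point and cluster centers, chosen only for illustration.
point = np.array([3.0, 4.0])
centers = np.array([[0.0, 0.0], [3.0, 5.0]])

distances = [euclidean(point, c) for c in centers]  # [5.0, 1.0]
closest = int(np.argmin(distances))                 # 1, the second center is closer
```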

The following is how you apply k-means with scikit-learn:
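A minimal sketch of such a call is given here. The synthetic data from `make_blobs` and the values of `n_samples`, `random_state` and `n_init` are assumptions made for the example rather than the article's own settings.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: synthetic two-dimensional data drawn from three blobs,
# standing in for whatever dataset is actually being clustered.
X, y = make_blobs(n_samples=100, centers=3, random_state=1)

# Fit k-means with three clusters. n_init and random_state are set
# explicitly so the result is reproducible across scikit-learn versions.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)

print(kmeans.labels_)           # one cluster label (0, 1 or 2) per data point
print(kmeans.cluster_centers_)  # array of shape (3, 2), one row per center
```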

The output assigns a cluster label to each data point. As n_clusters=3, the clusters are numbered 0 to 2.

The following code plots the data points and the cluster centers found by k-means with three clusters:

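import mglearn

# Plot the data points, colored by their assigned cluster label
mglearn.discrete_scatter(X[:, 0], X[:, 1], kmeans.labels_, markers='o')
# Overlay the three cluster centers as triangles
mglearn.discrete_scatter(
    kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], [0, 1, 2],
    markers='^', markeredgewidth=2)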

Read other articles about machine learning at: 8bitDS

Classification with XGBoost

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable…

medium.com

Written by Vicky