Introducing K-means clustering in 200 words or less
K-means clustering is one of the most popular unsupervised machine learning algorithms. It is used to find intrinsic groups in an unlabelled dataset and to draw inferences from them.
The algorithm partitions a given dataset into a fixed number of clusters, k, chosen a priori.
It alternates between two steps:
- assigning each data point to the closest cluster center, and
- setting each cluster center to the mean of the data points assigned to it.
The algorithm is finished when the assignment of instances to clusters no longer changes.
K-Means Algorithm
We describe the algorithm with respect to the Euclidean distance function d(x,y) = ||x − y||.
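The two alternating steps and the stopping rule can be sketched directly in NumPy. This is a minimal illustration, not a reference implementation: the function name `kmeans`, its parameters, the choice to initialise the centers from randomly chosen data points, and the tiny example dataset are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means with Euclidean distance (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialise the centers as k distinct points drawn from the data.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each point to its closest center, d(x, y) = ||x - y||.
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Stop once the assignment of points to clusters no longer changes.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 2: move each center to the mean of the points assigned to it.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, labels

# Hypothetical toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels = kmeans(X, k=2)
print(labels)
print(centers)
```

On data this well separated, the two groups should be recovered whichever starting points are drawn. On harder data the result depends on the initialisation, which is why library implementations usually run several restarts.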
The following is how you apply k-means with scikit-learn:
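A minimal sketch of that usage, assuming a synthetic two-dimensional dataset generated with `make_blobs`. The dataset, `n_init`, and the `random_state` values are assumptions chosen only to make the run reproducible.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: a synthetic 2-D dataset with three blobs stands in for real data.
X, y = make_blobs(n_samples=100, centers=3, random_state=1)

# Fit k-means with three clusters; the true labels y are never used.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)

print(kmeans.labels_[:10])      # cluster index assigned to the first ten points
print(kmeans.cluster_centers_)  # one row per cluster center
```

After fitting, `kmeans.labels_` holds the cluster index of every training point, and `kmeans.predict` can assign new points to the nearest learned center.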
The following plot shows the cluster assignments and the cluster centers found by k-means with three clusters:
import mglearn

# Plot the data points, marked by their assigned cluster
mglearn.discrete_scatter(X[:, 0], X[:, 1], kmeans.labels_, markers='o')
# Plot the three cluster centers as triangles
mglearn.discrete_scatter(
    kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], [0, 1, 2],
    markers='^', markeredgewidth=2)