Introduction to K-means Clustering

K-Means is applied in

  • Customer profiling
  • Market segmentation
  • Computer vision
  • Geostatistics
  • Astronomy

Algorithm

Data points being clustered
K-Means at work
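
K-Means alternates between two steps: assign every data point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The following is a minimal pure-Python sketch of that loop, not a production implementation. The function names (`kmeans`, `mean_point`, `squared_distance`), the random initialisation from existing data points, the fixed seed and the toy data set are all assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-Means loop; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: start from k data points picked at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = mean_point(members)
    return centroids, labels


# Toy data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On this toy data the first three points end up sharing one label and the last three the other. Which group is numbered 0 depends on the random starting centroids.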

How to select the best K…

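// Sum of squared errors (SSE) for each candidate K.
// kmeans(), clusterMean(), dataset and MaxK are assumed to be defined elsewhere.
var SSE = {};
for (var k = 1; k <= MaxK; ++k) {
  SSE[k] = 0;
  var clusters = kmeans(dataset, k);
  clusters.forEach(function (cluster) {
    var mean = clusterMean(cluster);
    cluster.forEach(function (datapoint) {
      // Squared distance from the point to its cluster mean (scalar data)
      SSE[k] += Math.pow(datapoint - mean, 2);
    });
  });
}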
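
The loop above records the sum of squared errors (SSE) for each K. The usual way to read the result is to look for the "elbow", the K beyond which SSE stops dropping sharply. As a rough Python sketch of the same calculation, the block below computes SSE for three hand-made partitions of a small one-dimensional data set. The partitions are hypothetical stand-ins for what a K-Means run might return, and the helper names `sse` and `mean_point` are assumptions.

```python
def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(clusters):
    """Sum of squared distances from every point to its own cluster mean."""
    total = 0.0
    for cluster in clusters:
        center = mean_point(cluster)
        for point in cluster:
            total += sum((a - b) ** 2 for a, b in zip(point, center))
    return total


# Hypothetical partitions of one toy data set for k = 1, 2 and 3.
partitions = {
    1: [[(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]],
    2: [[(1.0,), (2.0,), (3.0,)], [(10.0,), (11.0,), (12.0,)]],
    3: [[(1.0,), (2.0,)], [(3.0,)], [(10.0,), (11.0,), (12.0,)]],
}

sse_by_k = {k: sse(clusters) for k, clusters in partitions.items()}
print(sse_by_k)  # {1: 125.5, 2: 4.0, 3: 2.5}
```

SSE falls from 125.5 to 4.0 when going from one cluster to two, but only to 2.5 with a third cluster, so the elbow for this toy data is at K = 2.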

Let’s run K-Means

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "Cricket is one of the most famous sport in the world",
    "Football is the most famous sport",
    "Rugby is a popular sport among the young ",
    "Lionel Messi is an icon in the world of football",
    "Virat Kohli is one of the best batsmen to play the game",
    "Sporting makes you live healthy",
]

# Turn the documents into TF-IDF vectors, ignoring English stop words
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(documents)

no_of_clusters = 3
model = KMeans(n_clusters=no_of_clusters, init='k-means++', max_iter=100, n_init=1)
model.fit(X)

# Print the ten highest-weighted terms of each cluster centroid
order_centroids = model.cluster_centers_.argsort()[:, ::-1]
terms = vectorizer.get_feature_names_out()  # get_feature_names() before scikit-learn 1.0
for a in range(no_of_clusters):
    print("Cluster %d:" % a)
    for ind in order_centroids[a, :10]:
        print(' %s' % terms[ind])

# Assign unseen sentences to the learned clusters
Y = vectorizer.transform(["Steve Smith is one of the best captains in Cricket"])
prediction = model.predict(Y)
print(prediction)

Y = vectorizer.transform(["England does very well in both Cricket and Football"])
prediction = model.predict(Y)
print(prediction)

I am a Technical Lead with 6X AWS Certifications. I am passionate about microservices, machine learning, data analytics and algorithms.
Dileka Madushan