ON Coding K-Means in Vanilla Python

Published in

Hacking Analytics

4 min readNov 27, 2018

K-means is an algorithm to assign clusters to different points. One of its’ application is to provide an automated segmentation and micro-segmentation method based on behavioral data. These segments can then used to provide recommendation or targeted offers to the different users’ within the segments such as previously described in a previous blog post.

The algorithm takes two main inputs, a list of points and a fixed number of clusters and provides two different outputs, a cluster assignment for each point, a cluster center for each of the computed clusters. This blog post shows how to implement this algorithm from scratch using vanilla python.

Sklearn API

If we look at the Sklearn API as how it sets up the inputs for K-means:

K-Means is instantiated by providing the input number of clusters to compute, and the training is being computed when calling the fit method on a list of points, numpy array or pandas data-frame.

ON Coding K-Means in Vanilla Python

Sklearn API

Written by Julien Kervizic