ON Coding K-Means in Vanilla Python

Julien Kervizic
Hacking Analytics
Published in
4 min readNov 27, 2018

--

K-means is an algorithm to assign clusters to different points. One of its’ application is to provide an automated segmentation and micro-segmentation method based on behavioral data. These segments can then used to provide recommendation or targeted offers to the different users’ within the segments such as previously described in a previous blog post.

The algorithm takes two main inputs, a list of points and a fixed number of clusters and provides two different outputs, a cluster assignment for each point, a cluster center for each of the computed clusters. This blog post shows how to implement this algorithm from scratch using vanilla python.

Sklearn API

If we look at the Sklearn API as how it sets up the inputs for K-means:

K-Means is instantiated by providing the input number of clusters to compute, and the training is being computed when calling the fit method on a list of points, numpy array or pandas data-frame.

--

--

Julien Kervizic
Hacking Analytics

Living at the interstice of business, data and technology | Head of Data at iptiQ by SwissRe | previously at Facebook, Amazon | julienkervizic@gmail.com