Creating K-means Clustering from Scratch

Published in Hands-On Data Science

2 min read · Aug 22, 2020

K-means clustering is one of the most widely used unsupervised machine learning algorithms. It groups (clusters) similar data points together, which helps reveal the features they have in common.

The Code:

Step 1| Generate Clusters:

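import numpy as np

def generate_clusters():
    # Pick k distinct points from the global `data` (no repeats) as the starting centers.
    cluster_values = list(np.random.choice(data, size=k, replace=False))
    return cluster_values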

This randomly picks k points from the data to serve as the initial cluster centers. Both data and k are expected to be defined before the function is called.
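As a self-contained illustration of this initialization step, here is a minimal sketch that passes the data and k in explicitly instead of reading them from globals. The function name pick_initial_centers, the seed parameter, and the sample values are assumptions made for this example, not part of the original code.

```python
import numpy as np

def pick_initial_centers(data, k, seed=None):
    """Return k values drawn from distinct positions of a 1-D data array."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # replace=False means no position in `data` is picked twice.
    return rng.choice(data, size=k, replace=False)

# Hypothetical sample data: two loose groups on the number line.
data = np.array([1.0, 1.5, 2.0, 9.0, 9.5, 10.0])
centers = pick_initial_centers(data, k=2, seed=0)
print(centers)
```

Sampling without replacement avoids starting with two identical centers, which would leave one cluster empty after the first assignment. It does require k to be no larger than the number of data points.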

Step 2| Assign Clusters:

def assign_clusters(cluster_values):
    # One empty list per cluster to collect its points.
    clusters = [[] for _ in range(k)]
    for point in data:
        minimum_value = np.inf
        index = 0
        # Find the nearest center; on a number line, distance is the absolute difference.
        for i, cluster in enumerate(cluster_values):
            distance = abs(cluster - point)
            if distance < minimum_value:
                minimum_value = distance
                index = i
        clusters[index].append(point)
    return clusters

This function assigns each data point on the number line to a cluster by finding which cluster center is nearest to it. If you are applying this to 2- or 3-dimensional space, the absolute-difference formula in the function will not work; use Euclidean distance…
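To illustrate the Euclidean-distance variant mentioned above, here is a minimal sketch of the assignment step for points with more than one coordinate. The function name assign_clusters_nd and the sample points are assumptions for this example. It is one possible way to do it with NumPy broadcasting, not necessarily the approach the original article goes on to use.

```python
import numpy as np

def assign_clusters_nd(points, centers):
    """Group each row of `points` under the index of its nearest center."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise differences via broadcasting: shape (n_points, n_centers, n_dims).
    diffs = points[:, np.newaxis, :] - centers[np.newaxis, :, :]
    # Euclidean distance from every point to every center: shape (n_points, n_centers).
    distances = np.linalg.norm(diffs, axis=2)
    # Index of the closest center for each point.
    nearest = np.argmin(distances, axis=1)
    # Same output layout as the 1-D version: one list of points per cluster.
    return [points[nearest == i].tolist() for i in range(len(centers))]

# Hypothetical 2-D sample: two points near the origin, two near (5, 5).
points = [[0.0, 0.0], [0.5, 0.2], [5.0, 5.0], [5.5, 4.8]]
centers = [[0.0, 0.0], [5.0, 5.0]]
clusters = assign_clusters_nd(points, centers)
print(clusters)
```

With these sample values, the first two points fall into the first cluster and the last two into the second. The same function works unchanged for 3-dimensional points, since the distance is computed along the last axis.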

Written by Victor Sim