Creating K-means Clustering from Scratch

Victor Sim
Hands-On Data Science
2 min read · Aug 22, 2020


Photo by Robin Pierre on Unsplash

K-means clustering is one of the most widely used unsupervised machine learning algorithms. It is useful for grouping (clustering) data points together so that points in the same group share common features.
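To see how the pieces fit together, here is a minimal sketch of the full loop on 1-D data. It assumes the standard Lloyd iteration: assign each point to its nearest center, move each center to the mean of its points, and repeat until the centers stop moving. The function name `kmeans_1d` and the `n_iters`/`seed` parameters are hypothetical, and the update step is the textbook one rather than code from this article.

```python
import numpy as np

def kmeans_1d(data, k, n_iters=100, seed=None):
    """Minimal 1-D k-means (Lloyd's algorithm); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Initialise: pick k distinct points from the data as starting centers
    centroids = rng.choice(data, size=k, replace=False)
    for _ in range(n_iters):
        # Assign: index of the nearest center for every point, shape (n,)
        labels = np.argmin(np.abs(data[:, None] - centroids[None, :]), axis=1)
        # Update: move each center to the mean of its points (keep it if empty)
        new_centroids = np.array([
            data[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centers no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centers
    labels = np.argmin(np.abs(data[:, None] - centroids[None, :]), axis=1)
    return centroids, labels
```

For example, `kmeans_1d([1.0, 1.2, 0.8, 10.0, 10.5, 9.5], k=2)` settles on centers at 1.0 and 10.0; the order in which the two centers are returned depends on the random start.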

The Code:

Step 1| Generate Clusters:

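import numpy as np

def generate_clusters():
    # data (the 1-D points) and k (number of clusters) are module-level globals
    # replace=False ensures k distinct points are chosen as starting centers
    return list(np.random.choice(data, size=k, replace=False))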

This randomly picks k points from the data to serve as the initial cluster centers.
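One detail worth knowing: `np.random.choice` samples with replacement by default, so without `replace=False` the same point could be picked twice and two clusters would start on top of each other. Below is a minimal sketch using NumPy's newer `Generator` API, assuming a hypothetical helper name `init_centroids` and an optional `seed` for reproducible runs.

```python
import numpy as np

def init_centroids(data, k, seed=None):
    """Pick k distinct data points as starting centers (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # replace=False guarantees that no data point is drawn twice
    return rng.choice(np.asarray(data, dtype=float), size=k, replace=False)

data = [2.0, 4.0, 6.0, 8.0, 10.0]
centroids = init_centroids(data, k=3, seed=42)
print(centroids)  # three distinct values from data; which ones depends on the seed
```

Passing the same `seed` gives the same starting centers on every run, which makes results easier to compare while debugging.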

Step 2| Assign Clusters:

def assign_clusters(cluster_values):
    # one empty list of points per cluster (k is a module-level global)
    clusters = [[] for _ in range(k)]
    for point in data:
        minimum_value = np.inf
        index = 0
        # find the nearest center on the number line
        for i, cluster in enumerate(cluster_values):
            distance = abs(cluster - point)
            if distance < minimum_value:
                minimum_value = distance
                index = i
        clusters[index].append(point)
    return clusters

This function assigns each data point on the number line to the cluster whose center is nearest to it. If you are applying this in 2- or 3-dimensional space, the absolute difference used in the function will not work; use Euclidean distance…
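For points with more than one coordinate, here is a minimal sketch of the Euclidean version, assuming points and centers are stored as NumPy arrays of shape (n, d) and (k, d). The name `assign_clusters_nd` is hypothetical, and unlike `assign_clusters` it returns one cluster index per point instead of lists of points.

```python
import numpy as np

def assign_clusters_nd(points, centroids):
    """Return, for each point, the index of the nearest center (Euclidean)."""
    points = np.asarray(points, dtype=float)        # shape (n, d)
    centroids = np.asarray(centroids, dtype=float)  # shape (k, d)
    # Pairwise differences have shape (n, k, d); take the Euclidean norm over d
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    # Index of the closest center for every point, shape (n,)
    return np.argmin(distances, axis=1)

points = [[0, 0], [1, 1], [9, 9], [10, 10]]
centroids = [[0, 0], [10, 10]]
labels = assign_clusters_nd(points, centroids)
```

With `points` as a NumPy array, the per-cluster lists that `assign_clusters` builds can be recovered with `[points[labels == j] for j in range(len(centroids))]`. In one dimension the Euclidean distance reduces to the absolute difference, so on number-line data both functions pick the same nearest center.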
