Creating K-means Clustering from Scratch
2 min read · Aug 22, 2020
K-means clustering is one of the most widely used unsupervised machine learning algorithms. It is very useful for grouping (clustering) data points together to reveal features they have in common.
The Code:
Step 1| Generate Clusters:
import numpy as np

def generate_clusters():
    # Pick k distinct starting points from the data (no repeats)
    cluster_values = list(np.random.choice(data, size=k, replace=False))
    return cluster_values
This randomly picks k points from the data to serve as the starting centers of the clusters.
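The function above relies on `data` and `k` being defined elsewhere. Here is a minimal sketch of one possible setup, assuming a 1-D NumPy array and three clusters. The sample data, the seed, and the helper name `pick_initial_centers` are made up for illustration and are not from the original code:

```python
import numpy as np

# Hypothetical 1-D data set: three loose groups on the number line
rng = np.random.default_rng(0)
data = np.concatenate([
    rng.normal(0, 1, 20),
    rng.normal(10, 1, 20),
    rng.normal(20, 1, 20),
])
k = 3  # assumed number of clusters

def pick_initial_centers(data, k, rng):
    # Sample k distinct points from the data to serve as starting centers
    return list(rng.choice(data, size=k, replace=False))

centers = pick_initial_centers(data, k, rng)
print(centers)
```

Passing `data`, `k`, and the random generator in as arguments is just one design choice; it makes the step easier to test and repeat than reading them from globals.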
Step 2| Assign Clusters:
def assign_clusters(cluster_values):
    # One empty list per cluster to collect its points
    clusters = [[] for _ in range(k)]
    for point in data:
        minimum_value = np.inf
        index = 0
        # Find the cluster center closest to this point
        for i, cluster in enumerate(cluster_values):
            distance = abs(cluster - point)
            if distance < minimum_value:
                minimum_value = distance
                index = i
        clusters[index].append(point)
    return clusters
This function assigns each data point on the number line to a cluster by finding which cluster center is nearest to it. If you are applying this to 2- or 3-dimensional space, the absolute-difference formula in the function will not work; use Euclidean distance…
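For points with more than one coordinate, the same nearest-center idea could be sketched as follows. This is only an illustration: it assumes `data` is an array of shape (n, d) and `centers` an array of shape (k, d), and the name `assign_clusters_nd` is hypothetical rather than part of the original code:

```python
import numpy as np

def assign_clusters_nd(data, centers):
    # Hypothetical N-dimensional variant; data: (n, d), centers: (k, d)
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers, dtype=float)
    clusters = [[] for _ in range(len(centers))]
    for point in data:
        # Euclidean distance from this point to every center
        distances = np.linalg.norm(centers - point, axis=1)
        index = int(np.argmin(distances))
        clusters[index].append(point)
    return clusters

# Small made-up example: two points near each of two centers
points = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]])
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
clusters = assign_clusters_nd(points, centers)
print([len(c) for c in clusters])  # [2, 2]
```

In one dimension the Euclidean distance reduces to the absolute difference, so this sketch gives the same assignments as the number-line version.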