ML From Scratch Part 01: K-Nearest Neighbor (KNN)

Introduction

KNN is a supervised machine-learning algorithm, most commonly used for simple classification tasks.

Algorithm

The algorithm is simple and intuitive. To predict the class of a new data point, it calculates the distance from that point to each of the existing (training) data points, picks the k nearest ones, and assigns the class that is most common among those neighbors.
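
To make the steps concrete before the full class, here is a minimal sketch on a made-up toy dataset. The points, the labels, and the helper name `knn_predict` are assumptions chosen purely for illustration; it uses only the standard library and needs Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter

# Toy training set (made-up values): 2-D points with one class label each.
train_points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (5.5, 6.5)]
train_labels = ["a", "a", "b", "b", "b"]

def knn_predict(query, k=3):
    # Step 1: distance from the query to every training point.
    distances = [math.dist(query, p) for p in train_points]
    # Step 2: indices of the k smallest distances.
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    # Step 3: majority vote among the labels of those neighbors.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((2.0, 2.0)))  # prints "a"
print(knn_predict((5.0, 6.0)))  # prints "b"
```

The query `(2.0, 2.0)` has two "a" points and one "b" point among its three nearest neighbors, so the vote goes to "a".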

Implementation

import numpy as np
from collections import Counter

class KNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        return np.array([self._predict(x) for x in X])

    def _predict(self, x):
        # compute distances
        distances = [
            self._euclidean_distance(x, x_train) for x_train in self.X_train
        ]

        # get k nearest neighbors/samples, labels
        k_indices = np.argsort(distances)[:self.k]
        k_nearest_labels = [self.y_train[i] for i in k_indices]

        # majority vote [most common class label]
        most_common = Counter(k_nearest_labels).most_common(1)
        return most_common[0][0]

    def _euclidean_distance(self, x1, x2):
        return np.sqrt(np.sum((x1 - x2) ** 2))
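
The class above computes one distance at a time in a Python loop. As an optional variation, the sketch below computes all distances at once with NumPy broadcasting. The function name `predict_vectorized` is hypothetical and not part of the class above, and the approach builds an array of shape (n_query, n_train, n_features), so it is only meant for small datasets.

```python
import numpy as np
from collections import Counter

def predict_vectorized(X_train, y_train, X_query, k=3):
    """Hypothetical helper: classify each row of X_query by majority vote."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)

    # Pairwise differences via broadcasting: (n_query, n_train, n_features).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    # Euclidean distances: (n_query, n_train).
    distances = np.sqrt((diffs ** 2).sum(axis=2))

    # Indices of the k closest training points for each query row.
    k_indices = np.argsort(distances, axis=1)[:, :k]

    # Majority vote per query row.
    return np.array([
        Counter(y_train[row].tolist()).most_common(1)[0][0]
        for row in k_indices
    ])

# Made-up toy data for illustration.
X_toy = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5], [5.5, 6.5]])
y_toy = np.array([0, 0, 1, 1, 1])
print(predict_vectorized(X_toy, y_toy, [[2.0, 2.0], [5.0, 6.0]]))  # prints [0 1]
```

The trade-off is memory for speed: the loop version keeps memory use low, while the broadcast version avoids Python-level loops over the training set.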

How to use it?

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

knn = KNN(k=5)
knn.fit(X_train, y_train)
preds = knn.predict(X_test)

accuracy = np.sum(preds == y_test) / len(y_test)
print(accuracy)

GitHub Repo

Conclusion

We built this class for educational purposes. In practice, no one codes the entire algorithm from scratch every time they need it.

When working on real-world projects, you can use a package such as scikit-learn. Below is the code for KNN using the scikit-learn API in Python.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

iris = datasets.load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y
)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

preds = model.predict(X_test)
accuracy = sum(preds == y_test) / len(preds)

print(accuracy)

print(classification_report(y_test, preds))
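
The value `n_neighbors=5` above is just one possible choice. As a rough sketch of how different values of k could be compared, the snippet below uses scikit-learn's `cross_val_score` with 5-fold cross-validation. The candidate list `candidate_ks` is an arbitrary assumption for illustration, not a recommendation.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Candidate values of k (an arbitrary choice for illustration).
candidate_ks = [1, 3, 5, 7, 9]

mean_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds for this k.
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[k] = scores.mean()

best_k = max(mean_scores, key=mean_scores.get)
for k, score in mean_scores.items():
    print(f"k={k}: mean accuracy {score:.3f}")
print("best k:", best_k)
```

On a dataset as small as Iris the differences between these values may be slight, so treat the result as a starting point and not as a definitive answer.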

Have a good day! 👋
