ML From Scratch Part 01: K-Nearest Neighbor (KNN)

Rohit Krishna

Published in 𝐀𝐈 𝐦𝐨𝐧𝐤𝐬.𝐢𝐨, Jul 27, 2023

Introduction

K-Nearest Neighbors (KNN) is a supervised machine-learning algorithm commonly used for simple classification tasks.

Algorithm

The algorithm is simple and intuitive. To predict the class of a new data point, it calculates the distance from that point to every point in the training data, picks the k closest ones (the nearest neighbors), and assigns the class that is most common among them.
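To make these steps concrete, here is a minimal sketch on a tiny made-up dataset. The points, labels, query, and `k = 3` are illustrative assumptions and are not taken from the Iris data used later.

```python
import numpy as np
from collections import Counter

# Hypothetical toy training set: two features per point, two classes.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [6.0, 6.0], [7.0, 7.0], [6.5, 8.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

query = np.array([2.0, 2.0])  # the new point to classify
k = 3                         # number of neighbors that get a vote

# Step 1: Euclidean distance from the query to every training point.
distances = np.sqrt(np.sum((X_train - query) ** 2, axis=1))

# Step 2: indices of the k smallest distances.
nearest = np.argsort(distances)[:k]

# Step 3: majority vote among the labels of those neighbors.
predicted = Counter(y_train[nearest].tolist()).most_common(1)[0][0]
print(predicted)
```

Here the three closest training points all carry label 0, so the query is assigned to class 0.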

Implementation

import numpy as np
from collections import Counter

class KNN:
    def __init__(self, k=3):
        self.k = k

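    def fit(self, X, y):
        # KNN has no real training step: it only stores the data.
        # np.asarray lets plain Python lists work as well as arrays.
        self.X_train = np.asarray(X)
        self.y_train = np.asarray(y)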

    def predict(self, X):
        return np.array([self._predict(x) for x in X])

    def _predict(self, x):
        # compute distances
        distances = [
            self._euclidean_distance(x, x_train) for x_train in self.X_train
        ]

        # get the indices and labels of the k nearest training samples
        k_indices = np.argsort(distances)[:self.k]
        k_nearest_labels = [self.y_train[i] for i in k_indices]

        # majority vote [most common class label]
        most_common = Counter(k_nearest_labels).most_common(1)
        return most_common[0][0]

    def _euclidean_distance(self, x1, x2):
        return np.sqrt(np.sum((x1-x2)**2))
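The Python loop over training samples in `_predict` is easy to read but can be slow on larger datasets. As a sketch of one possible alternative (not part of the class above), all distances between a set of query points and the training points can be computed at once with NumPy broadcasting. The function name `pairwise_euclidean` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

def pairwise_euclidean(A, B):
    """Return a (len(A), len(B)) matrix of Euclidean distances."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # A[:, None, :] has shape (n_a, 1, d); B[None, :, :] has shape (1, n_b, d).
    # Broadcasting yields every pairwise difference, shape (n_a, n_b, d).
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))

# Small check with made-up points.
A = np.array([[0.0, 0.0], [3.0, 4.0]])
B = np.array([[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]])
D = pairwise_euclidean(A, B)
print(D)
```

Each row of `D` could then be passed to `np.argsort` in place of the `distances` list. The trade-off is memory: the intermediate array holds one value per (query, training point, feature) triple, so this approach suits small and medium datasets best.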

How to use it?

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# hold out 20% of the samples for testing; stratify keeps class proportions equal
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)

knn = KNN(k=5)
knn.fit(X_train, y_train)
preds = knn.predict(X_test)

accuracy = np.sum(preds == y_test) / len(y_test)
print(accuracy)

GitHub Repo

GitHub - rohit-krish/MLFromScratch-Medium-Article-Series

Conclusion

We built this class for educational purposes only. In practice, no one codes the entire algorithm by hand every time they need it.

When working on real-world projects, you can use a package such as scikit-learn. Below is the code for KNN with the scikit-learn API in Python.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

iris = datasets.load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y
)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

preds = model.predict(X_test)
accuracy = sum(preds == y_test) / len(preds)

print(accuracy)

print(classification_report(y_test, preds))