Let’s clearly explain: KNN
In this series I'm going to explain the most common Machine Learning algorithms, starting with the basics and working up to much more advanced ones.
The first candidate of the series is the well-known K-Nearest Neighbors (or KNN for short) model. It classifies new datapoints based on their distances to already-labeled datapoints. KNN is a supervised model: we feed it a lot of data with known labels to train it, then use it to predict labels for unlabeled data.
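The idea above can be sketched in a few lines: measure the distance from the new point to every labeled point, keep the k closest, and take a majority vote. This is a minimal from-scratch sketch (the tiny dataset and the helper name `knn_predict` are my own, for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance to every point
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny labeled dataset: two well-separated clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))  # → 0
```

A query near the first cluster gets label 0, because two of its three nearest neighbors carry that label.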
Data
We need a training dataset first. It can have any number of dimensions, but the model performs best with a small number of features (around 2–5), because in high dimensions pairwise distances tend to concentrate: they all become nearly the same, differing only by noise, so "nearest" loses its meaning. The number of datapoints can also be any size, but because this model is non-parametric and stores the whole dataset in memory for prediction (more on that later), it can be computationally expensive on large datasets.
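The distance-concentration effect is easy to demonstrate empirically. The sketch below (my own helper, `distance_spread`) measures the relative spread (std/mean) of distances from one query point to many random points, in low versus high dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_spread(n_points, n_dims):
    """Relative spread (std / mean) of distances from one query point to random points."""
    X = rng.random((n_points, n_dims))   # uniform random points in the unit cube
    q = rng.random(n_dims)               # one random query point
    d = np.linalg.norm(X - q, axis=1)
    return d.std() / d.mean()

print(distance_spread(1000, 2))     # low dimensions: distances vary a lot
print(distance_spread(1000, 1000))  # high dimensions: distances look almost identical
```

As the dimension grows, the spread shrinks toward zero, which is exactly why distance-based neighbors become uninformative.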
In this story I'm going to use the Iris dataset reduced to 2 dimensions (width × height, for easy visualization), with 150 datapoints across 3 classes. (Plotted above)
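Such a 2-dimensional Iris dataset can be loaded like this; which two of the four Iris features the article keeps is an assumption on my part (petal length and width are chosen here because they separate the classes well):

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, 2:4]   # keep only petal length and petal width (cm) — my choice of features
y = iris.target         # 3 classes: setosa, versicolor, virginica

print(X.shape)          # (150, 2)
print(len(set(y)))      # 3
```

From here, `X` and `y` can be plotted as a 2-D scatter colored by class, matching the figure the article refers to.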
Model
The KNN model is a very simple model: we only have to feed it a dataset with labels, then input an unlabeled…
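In practice, "feeding it the dataset" and querying it looks like the following sketch using scikit-learn's `KNeighborsClassifier` (the choice of k=5 and of petal features is my assumption, not stated in the article):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data[:, 2:4], iris.target     # petal length and width — my feature choice

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X, y)                             # "training" here just stores the dataset

pred = clf.predict([[1.4, 0.2]])          # petal measurements typical of a setosa
print(pred)                               # → [0]  (class 0 = setosa)
```

Note that `fit` is nearly free for KNN; all the work happens at prediction time, when distances to the stored points are computed.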