Let’s clearly explain: KNN
In this series I'm going to explain the most common Machine Learning algorithms, starting with the basics and working up to much more advanced ones.
The first candidate of the series is the well-known K-Nearest Neighbors (or KNN for short) model. It classifies new datapoints based on their distances to already-labeled datapoints. KNN is a supervised model: we feed it a lot of data with known labels to train it, then use it to predict labels for unlabeled data.
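The idea above can be sketched in a few lines: measure the distance from the new point to every labeled point, keep the k closest, and take a majority vote. This is a minimal from-scratch sketch (the tiny dataset and the helper name `knn_predict` are my own, for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance to every point
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny labeled dataset: two well-separated clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))  # → 0
```

A query near the first cluster gets label 0, because two of its three nearest neighbors carry that label.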
Data
We need a training dataset first. It can have any number of dimensions, but the model performs best with a small number of features (around 2–5), because in high dimensions pairwise distances tend to concentrate: they all become nearly the same, differing only by noise, so "nearest" loses its meaning. The number of datapoints can also be any size, but because this model is non-parametric and stores the whole dataset in memory for prediction (more on that later), it can be computationally expensive on large datasets.
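The distance-concentration effect is easy to demonstrate empirically. The sketch below (my own helper, `distance_spread`) measures the relative spread (std/mean) of distances from one query point to many random points, in low versus high dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_spread(n_points, n_dims):
    """Relative spread (std / mean) of distances from one query point to random points."""
    X = rng.random((n_points, n_dims))   # uniform random points in the unit cube
    q = rng.random(n_dims)               # one random query point
    d = np.linalg.norm(X - q, axis=1)
    return d.std() / d.mean()

print(distance_spread(1000, 2))     # low dimensions: distances vary a lot
print(distance_spread(1000, 1000))  # high dimensions: distances look almost identical
```

As the dimension grows, the spread shrinks toward zero, which is exactly why distance-based neighbors become uninformative.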
In this story I'm going to use the Iris dataset reduced to 2 dimensions (width × height, for easy visualization), with 150 datapoints across 3 classes. (Plotted above)
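Such a 2-dimensional Iris dataset can be loaded like this; which two of the four Iris features the article keeps is an assumption on my part (petal length and width are chosen here because they separate the classes well):

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, 2:4]   # keep only petal length and petal width (cm) — my choice of features
y = iris.target         # 3 classes: setosa, versicolor, virginica

print(X.shape)          # (150, 2)
print(len(set(y)))      # 3
```

From here, `X` and `y` can be plotted as a 2-D scatter colored by class, matching the figure the article refers to.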
Model
The KNN model is a very simple model: we only have to feed it a dataset with labels, then input an unlabeled…
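In practice, "feeding it the dataset" and querying it looks like the following sketch using scikit-learn's `KNeighborsClassifier` (the choice of k=5 and of petal features is my assumption, not stated in the article):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data[:, 2:4], iris.target     # petal length and width — my feature choice

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X, y)                             # "training" here just stores the dataset

pred = clf.predict([[1.4, 0.2]])          # petal measurements typical of a setosa
print(pred)                               # → [0]  (class 0 = setosa)
```

Note that `fit` is nearly free for KNN; all the work happens at prediction time, when distances to the stored points are computed.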