KNN: A Simple Explanation

mohd_raavi
2 min read · Jul 9, 2023


KNN as an algorithm is inspired by real life. People tend to be influenced by the people around them: our friends guide our behavior as we grow up, and our parents shape our personality in some ways. If you grow up with people who love sports, it is highly likely that you will end up loving sports too. KNN works in a similar fashion.

Introduction

K-nearest neighbors (KNN) is a supervised machine learning algorithm used for both classification and regression tasks. KNN is a non-parametric algorithm, meaning it does not make any underlying assumptions about the distribution of the data.

The basic idea behind KNN is to find the k closest data points in the training set to a given query point in the feature space. The distance metric most commonly used is Euclidean distance, although other distance metrics can also be employed.
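As a quick illustration, here is a minimal sketch of Euclidean distance between two feature vectors (the function name and the NumPy dependency are illustrative choices, not something from the original post):

```python
import numpy as np

def euclidean_distance(a, b):
    # Straight-line distance between two feature vectors
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))  # 5.0
```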

KNN is also known as a lazy learner since it defers the learning process until prediction time. It memorizes the training instances and uses them during inference, making the training phase computationally inexpensive but the prediction phase potentially slow.

How does KNN work?

  • step-1: Select the number of neighbors, K.
  • step-2: Calculate the distance (typically Euclidean) from the query point to every point in the training data.
  • step-3: Take the K training points closest to the query point as its nearest neighbors.
  • step-4: Among these K neighbors, count the number of data points in each category.
  • step-5: Assign the new data point to the category with the largest number of neighbors.
  • step-6: Our model is ready.

Suppose we have a new data point and we need to assign it to one of the existing categories; the sketch below walks through the steps.
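The following is a minimal from-scratch sketch of these steps for classification (the function name, toy data, and labels are illustrative assumptions, not part of the original post):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    query = np.asarray(query, dtype=float)
    # step-2: distance from the query point to every training point
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # step-3: indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # steps 4-5: count the categories among the neighbors and take the majority
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two small clusters, labelled "A" and "B"
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, query=[2, 2], k=3))  # "A"
```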

How to select the value of K in the KNN algorithm?

  • There is no particular way to determine the best value of K, so we need to try several values and pick the one that performs best (see the cross-validation sketch after this list). A commonly used default is K = 5.
  • A very low value of K, such as K = 1 or K = 2, can be noisy and make the model sensitive to outliers.
  • Large values of K smooth out the predictions, but they can blur the boundary between categories and increase the computation per query.
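One common way to try several values of K in practice is cross-validation. Here is a sketch assuming scikit-learn and its bundled Iris dataset are available (neither is mentioned in the original post):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate a few candidate values of K with 5-fold cross-validation
for k in [1, 3, 5, 7, 9, 11]:
    model = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(model, X, y, cv=5)
    print(f"K={k:2d}  mean accuracy={scores.mean():.3f}")
```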

Advantages of KNN Algorithm:

  • It is simple to implement.
  • It is robust to noisy training data.
  • It can be more effective when the training data is large.

Disadvantages of KNN Algorithm:

  • The value of K always needs to be chosen, which can sometimes be difficult.
  • The computation cost of prediction is high because the distance from the query point to every training sample must be calculated.
