KNN Algorithm

Ajay Kumar Maharana
4 min read · Dec 26, 2023


What is KNN Algorithm (K-Nearest Neighbour)?

· K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on Supervised Learning technique.

· The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.

· K-NN algorithm can be used for Regression as well as for Classification but mostly it is used for the Classification problems.

· K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data.

· It is also called a lazy learner algorithm because it does not learn from the training set immediately instead it stores the dataset and at the time of classification, it performs an action on the dataset.

· KNN algorithm at the training phase just stores the dataset and when it gets new data, then it classifies that data into a category that is much similar to the new data.
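The "lazy learner" behaviour described above can be sketched in a few lines of plain Python. This is an illustrative toy (class and method names are my own, not from a library): `fit` merely stores the data, and all the real work happens at prediction time.

```python
import math

class LazyKNN:
    """Toy KNN classifier illustrating lazy learning: no work at training time."""

    def fit(self, points, labels):
        # "Training" only stores the dataset; nothing is learned yet.
        self.points, self.labels = points, labels
        return self

    def predict(self, query, k=3):
        # All computation happens now: measure the distance to every stored point.
        dists = sorted(
            (math.dist(query, p), lbl)
            for p, lbl in zip(self.points, self.labels)
        )
        # Keep the k nearest labels and return the most frequent one.
        nearest = [lbl for _, lbl in dists[:k]]
        return max(set(nearest), key=nearest.count)

model = LazyKNN().fit([(1, 1), (1, 2), (5, 5), (6, 5)], ["A", "A", "B", "B"])
print(model.predict((2, 1)))  # lies near the "A" cluster, so prints: A
```

Note that `fit` is instantaneous no matter how large the dataset is; the cost is paid on every call to `predict`, which is exactly the trade-off discussed under "Cons" below.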

Why do we need a K-NN Algorithm?

Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1: which of these categories will the new point fall into? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point.

How Does KNN Work?

While KNN can be used for either regression or classification problems, it is typically used as a classification algorithm.

Classification

· For classification problems, a class label is assigned on the basis of a majority vote — i.e. the label that is most frequently represented around a given data point is used.

· While this is technically considered "plurality voting", the term "majority vote" is more commonly used in the literature.

· The distinction between these terminologies is that “majority voting” technically requires a majority of greater than 50%, which primarily works when there are only two categories.

· When you have multiple classes — e.g. four categories, you don’t necessarily need 50% of the vote to make a conclusion about a class; you could assign a class label with a vote of greater than 25%.
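The plurality-vote idea above can be made concrete with `collections.Counter`. This is a hedged sketch (the function name is illustrative): with ten neighbours spread over four categories, the winning label needs only more votes than any rival, not more than 50%.

```python
from collections import Counter

def plurality_vote(neighbour_labels):
    """Return the winning label and its share of the vote."""
    counts = Counter(neighbour_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(neighbour_labels)

# Ten neighbours across four categories: "cat" wins with only 40% of the vote.
label, share = plurality_vote(
    ["cat"] * 4 + ["dog"] * 3 + ["bird"] * 2 + ["fish"]
)
print(label, share)  # prints: cat 0.4
```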

Regression

· Regression problems use a similar concept to classification, but in this case the average of the k nearest neighbours' values is taken to make a prediction.

· The main distinction here is that classification is used for discrete values, whereas regression is used with continuous ones.

· However, before a prediction can be made, a distance metric must be defined. Euclidean distance is the most commonly used.
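A minimal sketch of KNN regression under Euclidean distance, assuming a simple averaging of the k nearest targets (the function name is illustrative, not from a library):

```python
import math

def knn_regress(points, targets, query, k=3):
    """Predict a continuous value as the mean target of the k nearest points."""
    dists = sorted(
        (math.dist(query, p), t) for p, t in zip(points, targets)
    )
    nearest_targets = [t for _, t in dists[:k]]
    return sum(nearest_targets) / k

points = [(0,), (1,), (2,), (10,)]
targets = [1.0, 2.0, 3.0, 50.0]
# The three neighbours nearest to 1.5 have targets 1.0, 2.0, 3.0; the outlier
# at x=10 is ignored, so the prediction is their mean.
print(knn_regress(points, targets, (1.5,), k=3))  # prints: 2.0
```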

K-NN Classification Algorithm

Step 1 — Select a value of K (an odd number helps avoid tied votes).

Step 2 — Take K number of Nearest Neighbours based on their distance.

Step 3 — Distance can be found using Euclidean distance, Manhattan distance, or cosine distance.

Step 4 — Among these k neighbours, count the number of the data points in each category.

Step 5 — Assign the new data point to the category with the maximum number of neighbours.
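The five steps above can be sketched as a single function with a selectable distance metric. This is an illustrative implementation, not a library API; the metric names and helper are assumptions of the sketch.

```python
import math
from collections import Counter

def manhattan(a, b):
    """Manhattan (L1) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_classify(points, labels, query, k=3, metric="euclidean"):
    # Step 1 happens at the call site: the caller chooses k.
    dist = math.dist if metric == "euclidean" else manhattan
    # Steps 2-3: rank every stored point by its distance to the query
    # and keep the k nearest.
    nearest = sorted(zip(points, labels), key=lambda pl: dist(query, pl[0]))[:k]
    # Steps 4-5: count the labels among the k neighbours; the most common wins.
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

points = [(1, 1), (2, 1), (1, 2), (6, 6), (7, 6)]
labels = ["A", "A", "A", "B", "B"]
print(knn_classify(points, labels, (2, 2), k=3))                      # prints: A
print(knn_classify(points, labels, (6, 7), k=3, metric="manhattan"))  # prints: B
```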

Example of KNN

Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will compare the features of the new image to the cat and dog images and, based on the most similar features, place it in either the cat or the dog category.

Applications of the KNN Algorithm

KNN is not confined to simple classification examples. It has a wide range of applications across various domains.

· Image Recognition — In computer vision, KNN can be used to recognize images based on similarities with existing labeled images.

· Recommendation Systems — E-commerce platforms leverage KNN to suggest products by analyzing the preferences of users with similar tastes.

· Anomaly Detection — KNN can be applied to detect anomalies or outliers in a dataset, making it valuable in fraud detection or quality control.

· Medical Diagnosis — Analyzing patient data and identifying similar medical cases aids in diagnosing diseases using KNN.

Pros and Cons

Like any algorithm, KNN has its strengths and weaknesses.

Pros:

· Simple to Implement — KNN is easy to understand and implement, making it a great choice for beginners.

· No Training Period — Unlike many machine learning algorithms, KNN doesn’t require a lengthy training period. It works as soon as the data is available.

· Versatility — Suitable for both classification and regression tasks, KNN is versatile and can adapt to various scenarios.

Cons:

· Computational Cost — The algorithm can be computationally expensive, especially with large datasets, as it needs to compute distances between the data points.

· Sensitivity to Outliers — Outliers can significantly impact KNN's performance, as the neighbour calculation is influenced by extreme values.

· Choosing the Right 'k' — Selecting the appropriate value for 'k' is crucial. Too small a value can lead to overfitting, while too large a value can result in underfitting (oversimplification).
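One simple way to pick 'k' is leave-one-out evaluation: predict each point from the rest of the data and count the mistakes. The sketch below is illustrative (the helper name is my own; real projects would typically use a library's cross-validation utilities). On this toy dataset, a too-large k swallows the minority cluster and every prediction goes wrong.

```python
import math
from collections import Counter

def loo_error(points, labels, k):
    """Leave-one-out error rate of a k-NN classifier on (points, labels)."""
    errors = 0
    for i, (q, true_label) in enumerate(zip(points, labels)):
        # Hold out point i, classify it from the remaining points.
        rest = [(p, l) for j, (p, l) in enumerate(zip(points, labels)) if j != i]
        nearest = sorted(rest, key=lambda pl: math.dist(q, pl[0]))[:k]
        pred = Counter(l for _, l in nearest).most_common(1)[0][0]
        errors += pred != true_label
    return errors / len(points)

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]
for k in (1, 3, 5):
    print(k, loo_error(points, labels, k))
# k=1 and k=3 classify every held-out point correctly; at k=5 each held-out
# point is outvoted by the opposite cluster, so every prediction is wrong.
```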

Conclusion

In conclusion, K-Nearest Neighbours is a valuable tool in the machine learning toolbox. From its simple yet intuitive approach to its diverse applications, KNN has proven its worth in various fields. However, understanding its limitations and being mindful of parameter tuning is essential for successful implementation.
