k-NN — Love thy neighbor

Bhanu Kiran
6 min read · Jan 9, 2023


This might be one of my shortest blogs ever. Something this short might not seem credible, and it may feel like some information is missing. But trust me, by the end of this blog you will understand k-NN and the theory behind the model.

k-Nearest Neighbors

k-Nearest Neighbors, also known as kNN, is a widely used model in the world of machine learning, mainly as a supervised technique. What does supervised mean? If you are not familiar with the term, it means there is a target variable, so your model can map a relationship between X — your features — and y — your output. So basically your data is just a bunch of rows and columns, with a target or output column. In other words, your data is just features and targets.

In such cases, grouping records together can help find hidden patterns in the data, and these hidden patterns carry a lot of value. From them, you can separate the data into their respective classes and do analysis and all that jazz.

Say, for instance, I have a bunch of scattered data, and I have already grouped it.

Fig 1. data in groups

Now, take a moment to analyze Fig 1. and try to answer this question: if I were to put in a new instance, or a new data point, how am I going to group it?

Fig 2. adding a new instance

If you answered something along the lines of checking the closest point beside the new instance, or checking the “neighbors” of the new instance, then you are absolutely right. When there is a new instance, kNN checks the neighbors of that instance on the plane, and whichever group holds the majority of those neighbors is the group or category the new instance falls into.

But how does the model know how many neighbors to check? This is done by assigning a value to our model, and this is what the k in kNN refers to. If I take the value k=1, then kNN becomes 1-nearest neighbor, as seen in Fig 3.

Fig 3. k = 1

If we assign our k value to be k=2, then our kNN becomes 2-nearest neighbors, as seen in Fig 4. below.

Fig 4. k = 2

Following the trend above, if I take k=6, then my kNN becomes 6-nearest neighbors, as seen in Fig 5. below.

Fig 5. k=6

How does the prediction happen after assigning my k value and telling my kNN how many neighbors to check? Well, this happens via majority voting.

For any new instance and a given k value, the model makes the prediction by taking the majority of the neighbors' categories/classes/groups.

Fig 6. majority voting
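To make this concrete, here is a minimal sketch using scikit-learn's KNeighborsClassifier. The data points and labels are made up purely for illustration; the thing to notice is the n_neighbors parameter, which is our k, and that the prediction is the majority vote of those neighbors.

```python
# A minimal sketch of kNN classification with scikit-learn.
# The data points and labels below are invented for illustration.
from sklearn.neighbors import KNeighborsClassifier

X = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0], [1.2, 0.9], [5.5, 7.5]]
y = ["A", "A", "B", "B", "A", "B"]          # two classes, like the two groups in Fig 1.

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3 neighbors to check
knn.fit(X, y)

new_instance = [[1.3, 1.5]]                 # the new point from Fig 2.
print(knn.predict(new_instance))            # majority vote of the 3 nearest neighbors -> ['A']
```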

Does this just happen by itself? No. As you know, machine learning is built upon statistics and a lot of math: the neighbors are sorted by a distance measure, generally Euclidean distance, and the first k points are selected.

Fig 7. distance measures

Now it becomes more intuitive: given a new instance, the kNN model takes a value for k, as seen in the figures above, and checks that many neighbors. By measuring the distance to the neighbors, we can identify the majority class closest to the new instance and assign that class to the new record. As simple as that!
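If you prefer to see the mechanics spelled out, here is a rough from-scratch sketch with NumPy, assuming Euclidean distance: compute the distance to every training point, sort, keep the first k, and take the majority vote. Again, the toy data is made up for illustration.

```python
# A from-scratch sketch of kNN prediction: Euclidean distance, sort, first k, majority vote.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, new_point, k=3):
    # Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - new_point, axis=1)
    # indices of the k closest neighbors
    nearest = np.argsort(distances)[:k]
    # majority vote among their classes
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0], [1.2, 0.9], [5.5, 7.5]])
y_train = np.array(["A", "A", "B", "B", "A", "B"])
print(knn_predict(X_train, y_train, np.array([1.3, 1.5]), k=3))  # -> 'A'
```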

If you followed along, you can observe that the above method works for classification. But kNN can also be used for regression: it is as simple as finding the k most similar records and predicting the average of their values for the new record!
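For example, a tiny sketch of kNN regression with scikit-learn's KNeighborsRegressor; the toy numbers are invented, and the prediction is simply the average of the k nearest targets.

```python
# A minimal sketch of kNN regression: predict the average of the k nearest targets.
from sklearn.neighbors import KNeighborsRegressor

X = [[1.0], [2.0], [3.0], [4.0], [5.0]]     # toy feature values, made up for illustration
y = [1.2, 1.9, 3.1, 3.9, 5.2]               # numeric targets instead of classes

reg = KNeighborsRegressor(n_neighbors=2)
reg.fit(X, y)
print(reg.predict([[2.5]]))                 # average of the 2 closest targets -> [2.5]
```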

Now that we understand kNN, there are a few things to keep in mind.

1. Standardization

When measuring, we are not interested in “how much” but in “how different from the average”. For models such as kNN, it is essential to standardize the data prior to applying the model (also referred to as normalization). Doing so puts all our variables on similar scales. This is important because it ensures that a variable does not overly influence the model simply due to the scale of its original measurement.

In other words, in Fig 6. you have different colors of data, which is done to visualize the different classes. You are not going to compare apples with oranges; you compare apples with apples. By applying standardization, you put the apples and the oranges on the same scale, so they can be compared fairly.
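As a sketch of what this looks like in practice, you could wrap standardization and kNN together with scikit-learn's StandardScaler and a Pipeline. The dollar/kilometer features here are invented just to show two wildly different scales.

```python
# A small sketch of standardizing features before kNN, keeping both steps in one pipeline.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# toy data: one feature in dollars, one in kilometers (very different scales)
X = [[30000, 1.2], [45000, 0.8], [32000, 5.5], [50000, 6.1]]
y = ["A", "A", "B", "B"]

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)                     # the scaler keeps the dollar feature from dominating the distance
print(model.predict([[40000, 1.0]]))
```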

2. Selecting the k value

Choosing a large k value will result in oversimplification. Observe Fig 6: if I take my k value to be 10 and the neighbors turn out to be mostly of class B, then my new instance will automatically be classified as class B, and we do not want this!

On the other hand, a small k value overfits the model and makes it very sensitive to noise. If my k = 1, then only one neighbor is compared, and if the model classifies an apple as an orange, then, as stupid as it sounds, the next time I throw an apple at my model, it says orange.

So what do we do? The most common method is to test a range of values and optimize the performance of the model. This is why, when you see examples or other blogs or code online, you find a curve with a bunch of k values. It is done to find the optimal k value for the model, as in the sketch below.
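Something like this, which scans k from 1 to 20 with 5-fold cross-validation; the dataset (iris) and the range of k are arbitrary choices for illustration.

```python
# A sketch of picking k by testing a range of values with cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()   # average accuracy over 5 folds

best_k = max(scores, key=scores.get)                      # the k with the best average accuracy
print(best_k, scores[best_k])
```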
