K-Nearest Neighbours

Himani Mogra
Oct 22, 2020 · 4 min read

K-nearest neighbours (K-NN) is a machine learning algorithm that falls under supervised learning and can be used for both classification and regression problems. In this article, we will unpack one such algorithm.

Let's understand this step by step, and by the end we will be able to connect all the dots. Before diving into K-NN, let's first get an idea of what supervised learning is.

Supervised learning is where you have input variables (X) as well as output variables (Y), i.e. a labelled dataset, and you use an algorithm to learn the mapping function from input to output: f(X) = Y.

Supervised learning is further divided into two subcategories:


1. Classification: In these types of problems the output is a discrete value.
E.g. given a dataset of positive and negative reviews, a new review arrives and we have to predict whether it is a positive review (1) or a negative review (0). Here the output will be either 1 or 0 (a two-class classification); this kind of problem falls under classification.

2. Regression: In these types of problems the output is a real value.
E.g. given a dataset of heights (X) and weights (Y) of students, if a new student arrives with a given height we can predict their weight, which can be a real value like 53.4 kg; this kind of problem falls under regression. Here we have two kinds of variables, dependent and independent. In this example, height is the independent variable and weight is the dependent variable.
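To make this concrete, here is a minimal sketch in plain Python (the heights, weights and the helper name predict_weight are made up purely for illustration) that predicts a new student's weight by averaging the weights of the k students with the closest heights. This is K-NN regression in miniature:

```python
# A minimal K-NN regression sketch: predict a weight from a height
# by averaging the weights of the k nearest students.
# The dataset below is made up purely for illustration.
heights = [150.0, 155.0, 160.0, 165.0, 170.0, 175.0]  # X (independent)
weights = [45.0, 48.5, 52.0, 56.5, 61.0, 66.0]        # Y (dependent)

def predict_weight(query_height, k=3):
    # Sort students by how close their height is to the query height.
    by_distance = sorted(zip(heights, weights),
                         key=lambda hw: abs(hw[0] - query_height))
    # Average the weights of the k nearest neighbours.
    return sum(w for _, w in by_distance[:k]) / k

print(predict_weight(163.0))  # -> 56.5 for this data: a real-valued output
```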

Now, back to K-NN again:
K-NN assumes that similar things exist in close proximity, i.e. similar data points are close to each other. The algorithm hinges on this assumption being true in order to work well.

Let's see with an example how it actually works.
Assume we are given a dataset which contains two types of points: negative (the red ones) and positive (the blue ones).


Task: Given a query point (the green one), decide whether it belongs to the positive class or the negative class.

The main steps to perform the algorithm are:
· Step 1: Calculate distances.
· Step 2: Get the k nearest neighbours.
· Step 3: Make a prediction based on the majority vote.

What K-NN does is take the k nearest neighbours of the query point according to some distance measure (we will talk about how to measure distance later), then take a majority vote among those neighbours: whichever class has the most votes is the class assigned to the new query point.
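Putting the three steps together, here is a minimal sketch in plain Python. The function names euclidean and knn_predict, and the choice of Euclidean distance, are illustrative assumptions on my part, since the article has not fixed a distance measure:

```python
from collections import Counter
import math

def euclidean(a, b):
    # Step 1: straight-line (Euclidean) distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(points, labels, query, k=3):
    # Step 2: sort the training points by distance to the query
    # and keep the k nearest neighbours.
    neighbours = sorted(zip(points, labels),
                        key=lambda pl: euclidean(pl[0], query))[:k]
    # Step 3: majority vote among the neighbours' class labels.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```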

Considering k = 3 (3 nearest neighbours) for the above example, let's say x1, x2 and x3 are the 3 nearest neighbours of our query point, and we find out that their class labels are:

x1 = -ve, x2 = -ve and x3 = +ve
Majority = -ve class, so our query point xq (the green point) will belong to the -ve class.

Now let's suppose x1 = +ve, x2 = -ve and x3 = +ve.
Majority = +ve class, so in this case our xq will belong to the +ve class.
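Running the sketch above on a tiny made-up dataset reproduces this kind of vote:

```python
# Toy data, purely for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0)]
labels = ["-ve", "-ve", "+ve", "+ve"]

# The 3 nearest neighbours of this query vote 2 x "-ve", 1 x "+ve".
print(knn_predict(points, labels, query=(1.2, 1.5), k=3))  # -> -ve
```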

NOTE:
We can't take an even value of k because we may come across a case where there are an equal number of +ve and -ve neighbours; our classifier would then be unable to decide which class to assign to the query point. To avoid this tie, we take k to be an odd value (at least for two-class problems).
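Continuing the toy example above (again, my own illustration): with k = 4 the vote can split two against two, and the Counter in the sketch then breaks the tie arbitrarily by counting order, which is exactly the ambiguity an odd k avoids:

```python
# With k=4 the votes split {-ve: 2, +ve: 2}; the returned label is
# decided by counting order, not by the data, so it is not meaningful.
print(knn_predict(points, labels, query=(1.2, 1.5), k=4))
```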

[Figure: Algorithm for KNN]

KNN be like: "I love my neighbours, but only if they are near to me."

FAILURE CASES OF K-NN

[Figure: (a) a jumbled dataset; (b) a query point far from the dataset]

The images above show the failure cases of K-NN.
Image a: Here our dataset is jumbled (the query point is a yellow cross), so if we apply K-NN we cannot be sure whether it will give the right prediction.

Image b: The query point is very far away from the dataset. In this case too, we cannot be sure the prediction will be right: the point is an outlier, so it could belong to either class, and we can't say whether a distance-based method will work well here.
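The outlier case can be seen concretely with the earlier sketch (still using the made-up toy data): a query far from every training point still receives a confident-looking label, even though none of its "neighbours" are actually near:

```python
# The query is far from all training points, yet K-NN still answers.
far_query = (100.0, 100.0)
print(knn_predict(points, labels, far_query, k=3))  # -> +ve, but unreliable
```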

Fun fact: K-NN is also known as a lazy learner, because it does no work during a training phase; it simply stores the data and defers all computation until a query arrives.

Hope you now understand what K-NN is and how it works!
