# K-Nearest Neighbours

K-nearest neighbours (K-NN) is a supervised machine learning algorithm that can be used for both classification and regression problems. In this article, we will unpack how it works.

Let’s understand this step by step, and by the end we will be able to connect all the dots. Before diving into K-NN, let’s first get an idea of what supervised learning is.

**Supervised learning** is where you have input variables (X) and output variables (Y), i.e. a labelled dataset, and you use an algorithm to learn the mapping function from input to output: **F(X) = Y**

**Supervised learning is further divided into two subcategories:**

**1. Classification:** In these problems the output is a discrete value. **E.g.** given a dataset of positive and negative reviews, when a new review arrives we have to predict whether it is a positive review (1) or a negative review (0). The output is either 1 or 0 (a two-class classification), so this kind of problem falls under classification.

**2. Regression:** In these problems the output is a real value. **E.g.** given a dataset of students’ heights (X) and weights (Y), when a new student comes with a given height we can predict their weight, which can be a real value like 53.4 kg. This kind of problem falls under regression. Here we have two kinds of variables: dependent and independent. In this example, height is the independent variable and weight is the dependent variable.
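As a rough sketch of the height/weight example (all numbers here are made up for illustration), a regression model takes a height and outputs a real-valued weight, e.g. by averaging the weights of the students with the most similar heights:

```python
# Toy regression sketch: predict a new student's weight from the weights
# of the students with the most similar heights (data is made up).
heights = [150, 155, 160, 165, 170, 175]        # X, independent variable (cm)
weights = [45.0, 48.5, 53.4, 57.0, 62.1, 66.8]  # Y, dependent variable (kg)

def predict_weight(height, k=2):
    # indices of the k students whose heights are closest to the query
    nearest = sorted(range(len(heights)),
                     key=lambda i: abs(heights[i] - height))[:k]
    # the prediction is a real value (an average), not a class label
    return sum(weights[i] for i in nearest) / k

print(predict_weight(162))  # averages the weights at heights 160 and 165
```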

**Now, back to K-NN:** K-NN assumes that similar things exist in close proximity (similar data points are close to each other). The algorithm hinges on this assumption being true in order to work well.

Let’s see how it actually works with an example.

Assume we are given a dataset which contains two types of points: negative (the red ones) and positive (the blue ones).

**Task: Given a query point (the green one), conclude whether it belongs to the positive class or the negative class.**

**The main steps of the algorithm are:**

**Step 1**: Calculate distances.

**Step 2**: Get the nearest neighbours.

**Step 3**: Make a prediction on the basis of the majority.

What K-NN does is take the k nearest neighbours of the query point according to some distance measure (we will talk about how to measure distance later), and then take a majority vote among those neighbours: whichever class has the most votes is the class assigned to the query point.
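The steps above can be sketched as a minimal from-scratch implementation (function and variable names are illustrative, not from any particular library; Euclidean distance is used here as one common choice of distance measure):

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    # Step 1: distance between two points (Euclidean, one common choice)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(points, labels, query, k=3):
    # Step 2: find the k training points nearest to the query
    order = sorted(range(len(points)),
                   key=lambda i: euclidean_distance(points[i], query))
    neighbour_labels = [labels[i] for i in order[:k]]
    # Step 3: majority vote among the k nearest neighbours
    return Counter(neighbour_labels).most_common(1)[0][0]

# Two clusters: -ve (red) points near the origin, +ve (blue) points further out
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["-ve", "-ve", "-ve", "+ve", "+ve", "+ve"]
print(knn_predict(points, labels, (0.5, 0.5), k=3))  # -ve
```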

Considering k = 3 (3 nearest neighbours) for the above example, let’s say x1, x2 and x3 are the 3 nearest neighbours of our query point, and the class labels of these points are:

**x1= -ve, x2 = -ve and x3 = +ve**

Majority = -ve class, so our xq (the green point) will belong to the -ve class.

Now let’s suppose **x1 = +ve, x2 = -ve and x3 = +ve**

Majority = +ve class, so in this case our xq will belong to the +ve class.
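The two vote counts above can be checked directly with Python’s `Counter` (the labels mirror the x1, x2, x3 examples):

```python
from collections import Counter

first = Counter(["-ve", "-ve", "+ve"])   # x1 = -ve, x2 = -ve, x3 = +ve
print(first.most_common(1)[0][0])        # majority: -ve

second = Counter(["+ve", "-ve", "+ve"])  # x1 = +ve, x2 = -ve, x3 = +ve
print(second.most_common(1)[0][0])       # majority: +ve
```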

**NOTE: We can’t take an even value of k because we may come across a case where there are equal numbers of +ve and -ve neighbours, in which case our classifier will not be able to decide which class to assign to the query point. To avoid this tie (in a two-class problem), we always take k to be an odd value.**
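With an even k the vote can tie, which is exactly the ambiguity the note warns about. A quick check with k = 4:

```python
from collections import Counter

# k = 4 neighbours split evenly between the two classes
votes = Counter(["+ve", "+ve", "-ve", "-ve"]).most_common()
print(votes)  # both classes get 2 votes, so there is no clear majority
```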

KNN be like I love my neighbours only if they are near to me.

**FAILURE CASES OF K-NN**

In the above images, we can see the failure cases of K-NN.

**Image a:** Here our dataset is jumbled together (the query point is the yellow cross), so if we apply K-NN we cannot be sure whether it will give the right prediction.

**Image b:** The query point is very far away from the dataset. In this case too we cannot be sure the prediction will be right: the point is an outlier, so it could belong to either class, and we can’t say whether our distance-based method will work well here.

**Fun fact:** K-NN is also known as a lazy learner, because it does no work at training time: it simply stores the dataset and defers all computation until a query arrives.

Hope you all now understand what K-NN is and how it works!