KNN: A Machine Learning Algorithm

Manoj Haygale
3 min read · Dec 5, 2021


Figure 1

1. INTRODUCTION

KNN is one of the simplest supervised machine learning algorithms, used for both classification and regression problems. It is commonly known as a non-parametric method. K is the number of nearest neighbors we take a vote from when predicting the class of a new observation. Because of this dual nature, KNN can perform classification as well as regression.

Accordingly, the sklearn library provides two estimators: KNeighborsClassifier, which does classification for us, and KNeighborsRegressor, which does regression.
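
As a quick sketch, here is how both estimators look on toy data (the data and the k value below are just for illustration):

```python
# A minimal sketch of both scikit-learn estimators on toy data.
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = [[0], [1], [2], [3]]              # toy 1-D feature matrix

# Classification: predict a discrete class by majority vote of the k neighbors
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, [0, 0, 1, 1])              # discrete class labels
print(clf.predict([[1.5]]))           # vote among the 3 nearest points

# Regression: predict a continuous value as the mean of the k neighbors
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X, [0.0, 0.5, 1.0, 1.5])      # continuous targets
print(reg.predict([[1.5]]))           # mean target of the 3 nearest points
```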

2. K-Nearest Neighbor

KNN is a non-parametric, instance-based, lazy learning machine learning approach.

1. What is non-parametric?

A non-parametric algorithm does not make any assumption about the form of the mapping function (the function that maps X to Y). It is free to learn any form from the training data.

2. What is instance-based learning?

The algorithm doesn’t learn explicitly. It memorizes the training data and uses it to make predictions on new data.

3. What is lazy learning?

No model is fit up front; all of the work happens at prediction time.
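
To make "instance-based" and "lazy" concrete, here is a minimal from-scratch sketch (a hypothetical LazyKNN class, not scikit-learn's implementation): fit only stores the data, and every distance computation is deferred to prediction time.

```python
import numpy as np
from collections import Counter

class LazyKNN:
    """Minimal illustration of lazy, instance-based learning."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just memorizing the data -- no model is built here.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict_one(self, x):
        # All the work happens now: compute distances to every stored
        # point, take the k closest, and return their majority class.
        dists = np.linalg.norm(self.X - np.asarray(x, dtype=float), axis=1)
        nearest = np.argsort(dists)[:self.k]
        return Counter(self.y[nearest]).most_common(1)[0][0]
```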

3. Understanding Pictorially

Figure 2

Let’s take an example, starting with k = 3.

Here, yellow circles and purple circles are the two classes of observations in the data.

If we get a new observation (denoted by the red star), what would its class be?

Yellow circle?

Purple circle?

We find that the encircled area contains the three nearest neighbors of the red star (k = 3).

Let’s gather votes from these 3 neighbors now:

Yellow circle = 1

Purple circle = 2

Hence, the KNN algorithm predicts the new observation to be a purple circle.

Now let’s take k = 6.

We find that the encircled area contains the six nearest neighbors of the red star (k = 6).

Let’s gather votes from these 6 neighbors now:

Yellow circle = 4

Purple circle = 2

Hence, the KNN algorithm predicts the new observation to be a yellow circle, because the majority of the votes are for the yellow circle.

Note: In KNN, we usually choose an odd value of K so that the vote cannot end in a tie.
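
A toy sketch of why odd K helps in a two-class problem (the vote lists below are hypothetical):

```python
from collections import Counter

# With an even k such as 6, a two-class vote can end in a tie:
votes = ["yellow", "yellow", "yellow", "purple", "purple", "purple"]
print(Counter(votes).most_common())         # [('yellow', 3), ('purple', 3)]

# With an odd k such as 3, a two-class vote can never tie:
votes = ["yellow", "purple", "purple"]
print(Counter(votes).most_common(1)[0][0])  # 'purple'
```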

4. Notion of Distance

KNN needs to know the k nearest neighbors in order to predict the class label of a new data point.

How does it know which data points are the nearest?

We need to define a distance criterion that measures the similarity of observations.

Some common distance metrics:

  1. Euclidean
  2. Manhattan
  3. Hamming

The most commonly used distance metric is Euclidean distance.
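
As a sketch, here is how the three metrics can be computed for two points a and b with NumPy (the vectors are just examples; Hamming is typically used for categorical or binary features):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

# Euclidean: straight-line distance, sqrt of the sum of squared differences
euclidean = np.sqrt(np.sum((a - b) ** 2))   # ~3.61

# Manhattan: sum of absolute differences (city-block distance)
manhattan = np.sum(np.abs(a - b))           # 5.0

# Hamming: count of positions where the values differ
hamming = np.sum(a != b)                    # 2

print(euclidean, manhattan, hamming)
```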

5. How to Choose K?

Choosing K is very tricky in the KNN algorithm, and finding an appropriate K can be a tedious process.

Let’s understand how K impacts the algorithm and the results.

  1. If K is chosen to be too low, the algorithm can be affected by noise in the data and the predictions can go awry.
  2. If K is chosen to be too high, the decision boundaries will be smooth, but the algorithm will take a lot of time to compute.

Based on the above, we see that the right balance must be struck in order to achieve the desired result without hampering the performance of the algorithm.

Choose K based on validation error.

The purpose of any machine learning algorithm is to have minimum validation error. We can run the algorithm for every candidate value of K and see which one gives the least validation misclassification.
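
A minimal sketch of that search with scikit-learn (the iris dataset and the 70/30 split here are just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Try each candidate k and keep the one with the lowest validation error.
best_k, best_acc = None, 0.0
for k in range(1, 31):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = clf.score(X_val, y_val)   # misclassification rate = 1 - accuracy
    if acc > best_acc:
        best_k, best_acc = k, acc

print(f"best k = {best_k}, validation accuracy = {best_acc:.3f}")
```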

Thank You!
