K Nearest Neighbors

Navjot Singh
Published in Analytics Vidhya · Jun 11, 2020

KNN belongs to the supervised learning family of algorithms. K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure. Classification is done by a majority vote among a point's neighbors: the new data point is assigned to the class most common among its nearest neighbors. As you increase the number of nearest neighbors, the value of K, accuracy might increase.

KNN is a non-parametric, lazy learning algorithm. When we say a technique is non-parametric, we mean that it does not make any assumptions about the underlying data distribution.

In the KNN algorithm, there is no explicit training phase, or it is very minimal.

Let’s see how the K Nearest Neighbor algorithm works:

Step 1: Choose the number K of neighbors to use in the algorithm. Here ‘K’ in KNN is the number of nearest neighbors used to classify (or predict, in the case of a continuous variable/regression) a test sample. One of the most common default values of K is 5.

Step 2: Take the K nearest neighbors of the new data point according to the Euclidean distance. We can also use any other distance measure, such as the Manhattan, Chebyshev, or Hamming distance.

Euclidean distance: Euclidean distance, or the Euclidean metric, is the “ordinary” straight-line distance between two points in Euclidean space.

Euclidean distance formula: d(p, q) = √((q₁ − p₁)² + (q₂ − p₂)² + … + (qₙ − pₙ)²)
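As a quick sketch, the Euclidean and Manhattan distances mentioned above can be computed in a few lines of NumPy (the function names here are my own, for illustration):

```python
import numpy as np

def euclidean_distance(p, q):
    # Straight-line distance: square root of the sum of squared coordinate differences.
    return np.sqrt(np.sum((np.asarray(p) - np.asarray(q)) ** 2))

def manhattan_distance(p, q):
    # City-block distance: sum of absolute coordinate differences.
    return np.sum(np.abs(np.asarray(p) - np.asarray(q)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
print(manhattan_distance([0, 0], [3, 4]))  # 7
```

The classic 3-4-5 right triangle makes the difference between the two metrics easy to see: the straight-line distance is 5, while walking along the axes costs 7.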

Step 3: Among these K neighbors, count the number of data points in each category, that is, how many data points fall into each category.

When the value of K is small, we force the model to classify based on very few neighbors, so a single noisy point can sway the prediction. A small value of K provides the most flexible fit. On the other hand, a larger value of K increases the number of voters in each prediction and is therefore more resilient to outliers.
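This sensitivity to K can be demonstrated with a tiny sketch (the data and labels below are made up for illustration): a single mislabeled point sitting right next to the query decides the vote when K = 1, but is outvoted when K = 5.

```python
import numpy as np
from collections import Counter

def vote(X, y, x_new, k):
    # Distances from x_new to every training point, then majority vote among the k nearest.
    d = np.sqrt(np.sum((np.asarray(X) - np.asarray(x_new)) ** 2, axis=1))
    return Counter(np.asarray(y)[np.argsort(d)[:k]]).most_common(1)[0][0]

# Four "A" points and one noisy "B" point placed right next to the query [1.5, 1.5].
X = [[1, 1], [1, 2], [2, 1], [2, 2], [1.4, 1.4]]
y = ["A", "A", "A", "A", "B"]

print(vote(X, y, [1.5, 1.5], k=1))  # "B" — the single noisy neighbor decides
print(vote(X, y, [1.5, 1.5], k=5))  # "A" — the majority overrides the outlier
```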

Step 4: Assign the new data point to the category that received the most votes among its K neighbors.
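The four steps above can be sketched as one small function (a minimal from-scratch version, assuming Euclidean distance and the illustrative data shown below):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from x_new to every training point.
    distances = np.sqrt(np.sum((np.asarray(X_train) - np.asarray(x_new)) ** 2, axis=1))
    # Indices of the k smallest distances (the k nearest neighbors).
    nearest = np.argsort(distances)[:k]
    # Steps 3-4: count the labels among the neighbors and return the most common one.
    votes = Counter(np.asarray(y_train)[nearest])
    return votes.most_common(1)[0][0]

# Two well-separated clusters used for illustration.
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(X_train, y_train, [1.5, 1.5], k=3))  # "A"
print(knn_predict(X_train, y_train, [8.5, 8.5], k=3))  # "B"
```

Note that all the work happens at prediction time: "training" is just storing the data, which is exactly what makes KNN a lazy learner.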

In KNN, the model structure is determined from the data. KNN is also a lazy algorithm: it does not use the training data points to build any generalization ahead of time.

Application

KNN is very simple to implement, as there is no explicit training phase (or it is very minimal). KNN can outperform more powerful classifiers and is used in a variety of applications such as economic forecasting, data compression and genetics.

One of the most common and widely used applications of the KNN algorithm is recommender systems.

That’s all for the K Nearest Neighbors machine learning algorithm. Stay tuned for further blogs.

Thank you.

Implementation of the KNN algorithm on the Breast Cancer dataset ~

dataset: Breast Cancer dataset

link: https://github.com/InternityFoundation/MachineLearning_Navu4/blob/master/KNN/KNN.ipyn
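The linked notebook is not reproduced here, but a minimal sketch of the same idea, assuming scikit-learn's built-in copy of the Breast Cancer dataset and the common default of K = 5, might look like this:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Load the dataset and hold out a test split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Scale features so distances aren't dominated by large-valued columns.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Fit (i.e., store) the training data and evaluate on the held-out split.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")
```

Feature scaling matters here: KNN votes by distance, so without scaling, the features with the largest numeric ranges would dominate the neighbor search.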

