K-Nearest Neighbors Classification Algorithm: A Gentle Introduction

A brief look at the K-Nearest Neighbors classification algorithm and how it works

Dhruv Sharma
2 min read · Aug 25, 2019

As John Hennessy, former President of Stanford, famously said, “Machine learning is the hot new thing!” So here is a brief look at another machine learning model, one that is used almost ubiquitously in industry today.


Brief Introduction

Machine Learning is a technique with which machines can predict the future by looking at (learning from) the past, and it includes various predictive models such as K-Nearest Neighbors, or the KNN model. This article provides an overview of the K-NN model and how it works. To define the algorithm, we can take a quick look at how Wikipedia describes it: “the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space.” So, let’s simplify this terse definition! What it means is that the algorithm takes in a new data point and assigns it to one of the already defined groups, based on the labels of the k training points that lie closest to it.
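
To make this concrete, here is a minimal sketch (my own illustration, not code from the article) of how a k-NN classifier can be used with scikit-learn’s KNeighborsClassifier; the Iris dataset, the train/test split, and k = 5 are all assumptions made purely for demonstration.

```python
# A minimal k-NN classification sketch with scikit-learn (illustrative setup,
# not the article's own code): fit on training data, then label new points.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

knn = KNeighborsClassifier(n_neighbors=5)  # k = 5 nearest neighbors
knn.fit(X_train, y_train)                  # "training" is just storing the data

print("Test accuracy:", knn.score(X_test, y_test))
print("Predicted class of first test point:", knn.predict(X_test[:1]))
```
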

Figure: Working of a K-NN model (Source: Machine Learning A-Z by Kirill Eremenko on Udemy)

So how does the algorithm decide which class a new data point belongs to? It follows the process below (a short code sketch of these steps appears after the list):

Working of KNN Algorithm

  1. Initialize the dataset, identify the new data point to classify, and choose a number k, which is how many nearest neighbors will be considered
  2. Calculate the distance between the new point and every point in the training set
  3. Sort the distances in increasing order
  4. Take the k nearest neighbors, using the value of k chosen in step 1
  5. Apply the rule of simple majority: the new point is assigned to the class that most of these k neighbors belong to
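
The sketch below is my own illustrative NumPy code (not the implementation referenced on GitHub at the end of this article) walking through these five steps for a single new point; the toy training data and the choice k = 3 are assumptions for demonstration only.

```python
import numpy as np

def knn_classify(X_train, y_train, new_point, k=3):
    """Classify one new point by majority vote among its k nearest neighbors."""
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - new_point, axis=1)
    # Steps 3-4: sort the distances and keep the indices of the k closest points
    nearest = np.argsort(distances)[:k]
    # Step 5: simple majority vote over the labels of those k neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Step 1: a toy two-class training set and a value of k (assumed for illustration)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_classify(X_train, y_train, new_point=np.array([1.1, 0.9]), k=3))  # -> 0
print(knn_classify(X_train, y_train, new_point=np.array([4.8, 5.0]), k=3))  # -> 1
```
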

Pros of KNN Algorithm:

  1. A simple model to start with for classification problems
  2. Given enough training data, it can perform well in practice

Cons of KNN Algorithm:

  1. The entire training dataset must be stored, so storage becomes an issue
  2. Prediction becomes computationally expensive as the data grows, since the distance to every training sample must be evaluated for each new point

For a full implementation and usage of the model, please see my GitHub.
