K-Nearest Neighbors (KNN) Algorithm for Placements Data

Chilakala Bala Mahesh
Analytics Vidhya
Published in
5 min readApr 28, 2021
k Nearest Neighbors(KNN)

Introduction

K-nearest neighbors (KNN) algorithm is a type of supervised ML algorithm which can be used for both classification as well as regression predictive problems. However, it is mainly used for classification predictive problems in industry. The following two properties would define KNN well −

  • Lazy learning algorithm − KNN is a lazy learning algorithm because it does not have a specialized training phase and uses all the data for training while classification.
  • Non-parametric learning algorithm − KNN is also a non-parametric learning algorithm because it doesn’t assume anything about the underlying data.

In these article we are creating a KNN classifier model on campus placements data…

Let import all the libraries and read the data.

Now let’s check all the unique values in all columns…

It displays all the unique values……

we can see that all columns has same number of missing values…..

Above charts shows about the count of gender, SSC board, HSC board, HSC Stream of the candidates.

Above charts shows the count of candidates having work experience, the specialization they have chosen, & the status if placed or not placed.

From the above correlation matric its clear that Placement & Salary has strong relationship with three factors

1)degree_ p i.e. Degree % , 2)HSC _ p, i.e. HSC % , 3)SSC _ p I. e SSC % marks.

Above chart shows the relation of degree % and the status of gender if placed or not placed.

Above graphs shows about the distance plot for Degree %, HSC%, SSC%, MBA% scored.

Above shows the histogram of the salary after placement.

Above chart shows you the relationship of different parameters e.g. % of HSC scored vs Degree % with the status whether placed or not place, or the gender, the specialization they chose.

KNN Classifier

Train data split:

Split train data into X_ train, X_ test, y_ train, y_ test.

Model

Now we can build our KNN model.

Predicting the unknown:

Comparing the actual and predicted values:

displays the data frame with actual and predicted values

Finally Calculating the accuracy score, classification report and confusion matrix……….

Advantages:

  1. KNN- is a supervised and non-parametric algorithm.
  2. K-NN is an instance-based learning algorithm.

Disadvantages:

Apart from some advantages, this easy-to-implement algorithm has some cons as well. Some are defined below:

  1. K-NN is a lazy learning algorithm.
  2. It doesn’t perform well for problems with high dimensionality problems. So the curse of dimensionality underlies here as well.
  3. This algorithm is slower to evaluate and needs to store the whole training data. Therefore, it might be computationally expensive as well.

--

--