k-Neighbors Classifier with GridSearchCV Basics

Salem O.
6 min read · Oct 21, 2018

This post is designed to provide a basic understanding of the k-Neighbors classifier and how to apply it using Python. It is by no means intended to be exhaustive.

k-Nearest Neighbors (kNN) is an algorithm that classifies an unlabeled data point based on its distance from known points. While it is most often used as a classifier, it can be used to solve regression problems as well. The following example will be used in this post:

Modified example, original image taken from A Data Analyst

In the rudimentary example above, our model is trying to predict whether a vehicle is an SUV or a sedan based on just two features: fuel efficiency (labeled MPG) on the x-axis and vehicle weight on the y-axis. The red star represents an unknown data point, the purple points are known sedans, and the yellow ones are known SUVs.

A k-Nearest Neighbors model decides how to classify the unknown point by drawing a circle centered on it. The circle's size is controlled by the hyperparameter k. This setting does not fix the circle's radius directly; rather, it specifies how many neighboring points must fall inside the circle.
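The mechanics above can be sketched by hand: compute the distance from the unknown point to every known point, then keep the k closest. The vehicle numbers below are made up purely for illustration and are not taken from the figure.

```python
import numpy as np

# Hypothetical training points: [MPG, weight in thousands of lbs]
# (made-up values purely for illustration)
X = np.array([
    [32.0, 2.8], [30.0, 3.0], [34.0, 2.6],   # sedans
    [18.0, 4.5], [20.0, 4.8], [16.0, 5.0],   # SUVs
])
y = np.array(["sedan", "sedan", "sedan", "SUV", "SUV", "SUV"])

unknown = np.array([28.0, 3.2])  # the "red star"

# Euclidean distance from the unknown point to every known point
distances = np.linalg.norm(X - unknown, axis=1)

# The k nearest points define the "circle": its radius is simply the
# distance to the k-th closest point
k = 3
nearest = np.argsort(distances)[:k]
print(y[nearest])  # labels of the 3 closest vehicles
```

Here all three nearest neighbors happen to be sedans, so any vote among them is unanimous.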

With no other hyperparameters set, the number of neighbors from each class is counted, and the classification is made by a “vote”: the class that appears most often among the k neighbors becomes the prediction.
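In scikit-learn, this voting classifier is `KNeighborsClassifier`, and the title's `GridSearchCV` can pick the best k by cross-validated accuracy. A minimal sketch, reusing made-up vehicle data (the feature values and the candidate k values are assumptions, not from the original figure):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical vehicle data: [MPG, weight in thousands of lbs]
X = np.array([[32.0, 2.8], [30.0, 3.0], [34.0, 2.6], [29.0, 3.1],
              [18.0, 4.5], [20.0, 4.8], [16.0, 5.0], [21.0, 4.4]])
y = np.array(["sedan"] * 4 + ["SUV"] * 4)

# Try several values of k; GridSearchCV keeps the one with the best
# cross-validated score
param_grid = {"n_neighbors": [1, 3, 5]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=2)
grid.fit(X, y)

print(grid.best_params_)            # e.g. {'n_neighbors': 1}
print(grid.predict([[28.0, 3.2]]))  # classify the "red star"
```

After fitting, `grid` behaves like the best-found classifier, so `predict` uses the winning k automatically.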
