Data Science (Python) :: K-NN (K - Nearest Neighbors)

The intention of this post is to give a quick refresher on the concept of “K-NN” using Python (it’s assumed that you are already familiar with the material). You can treat this as an FAQ as well.

What kind of problems does the K-NN model generally fit?
Classification problems. For example, predicting whether a person will buy a product or not (dependent variable) based on his/her age and salary (independent variables).

*******************************************

What is the basic working principle of K-NN?
Let’s say we are predicting whether a person will buy a car or not based on his/her age (X-axis) and salary (Y-axis). First, we select the number of neighbors we want to consider (this is denoted by K). For example, let’s say we take K as 5. Then, for the point we want a prediction for, we find its 5 nearest neighbors in the training set (nearest by distance, typically Euclidean). We count how many of these 5 neighbors fall into each category. It may be that 3 fall into the TRUE category and 2 into FALSE (or it may be 4 TRUE and 1 FALSE, or any other combination). As the TRUE count is higher, we predict TRUE for the point.
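
The voting steps above can be sketched in plain Python. This is a minimal illustration rather than how scikit-learn implements it: the function name knn_predict and the (age, salary in thousands) data are made up for the example, and it needs Python 3.8+ for math.dist.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours (sort on distance only)
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Count the labels among those neighbours and return the most common one
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (age, salary in thousands) data; True = bought a car
train_points = [(25, 30), (30, 40), (35, 60), (40, 80),
                (45, 90), (50, 100), (23, 20), (48, 85)]
train_labels = [False, False, True, True, True, True, False, True]

prediction = knn_predict(train_points, train_labels, query=(42, 75), k=5)
print(prediction)
```

In practice age and salary are on very different scales, so the features are usually scaled before computing distances. Otherwise the larger-valued feature dominates the result.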

*******************************************

Sample code for fitting K-NN to the training set?
from sklearn.neighbors import KNeighborsClassifier
# metric = 'minkowski' with p = 2 is the Euclidean distance
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
# X_train is the training set for independent variables and y_train is the training set for the dependent variable
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)

******************************************

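What does a plot of K-NN look like?
The decision boundary of K-NN is not a straight line. It looks more or less like the border of a country! This is because K-NN is a non-linear classifier: each point is labelled by the votes of its nearest neighbors, so the boundary bends around the training data.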
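
To get a feel for this without a plotting library, here is a rough, dependency-free sketch that prints a text map of the predicted regions. The XOR-style data and the predict helper are made up for the illustration. A real plot would more likely use matplotlib with a fitted KNeighborsClassifier.

```python
import math
from collections import Counter

# Hypothetical XOR-style training data: no single straight line separates the classes
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9),   # class "o"
          (1, 8), (2, 8), (1, 9), (8, 1), (9, 1), (8, 2)]   # class "x"
labels = ["o"] * 6 + ["x"] * 6

def predict(query, k=3):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(zip(points, labels),
                     key=lambda pl: math.dist(pl[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Classify every cell of a 10 x 10 grid and print the resulting regions
for y in range(9, -1, -1):  # print the top row first, so y grows upwards
    print("".join(predict((x, y)) for x in range(10)))
```

The printed map shows "o" regions in two opposite corners and "x" regions in the other two, which is a split that a linear model cannot produce.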