# ML algorithms 1.03: K Nearest Neighbors

# Introduction

K Nearest Neighbors is a lazy learner: it builds no explicit model and simply stores the training data. Because predictions depend on distances, features with different ranges must be scaled before distances are computed.
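A minimal sketch of why scaling matters before distance calculations, using scikit-learn's `StandardScaler` on made-up data (the feature values here are purely illustrative):

```python
from sklearn.preprocessing import StandardScaler

# Features with very different ranges: income in thousands vs. age in years.
# Without scaling, income would dominate any distance calculation.
X = [[50_000, 25], [80_000, 60], [45_000, 30]]

# StandardScaler rescales each column to zero mean and unit variance,
# so both features contribute comparably to the distance.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0))  # each column's mean is ~0 after scaling
```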

# Advantages

- No training required.
- Very easy to interpret.

# Disadvantages

- Need to find the optimum K (the number of neighbors), which may change as new data is added.
- Prediction becomes expensive when the dataset is very large or the number of dimensions is high. This can be mitigated by using kd-tree and ball-tree algorithms.
- It is sensitive to noisy data.
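The kd-tree speed-up mentioned above can be sketched with scikit-learn's `algorithm` parameter on synthetic data; both search strategies return the same answer, only the query cost differs:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 3))           # 1,000 points in 3 dimensions
y = (X[:, 0] > 0.5).astype(int)     # simple synthetic labels

# brute force computes every pairwise distance at query time; kd-tree
# partitions the space so queries can skip most points in low dimensions.
brute = KNeighborsClassifier(n_neighbors=5, algorithm="brute").fit(X, y)
tree = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree").fit(X, y)

query = rng.random((1, 3))
# Identical predictions; only the neighbor-search strategy changed.
print(brute.predict(query), tree.predict(query))
```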

# Distance metrics
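The common metrics can be computed directly with `scipy.spatial.distance` (the vectors here are arbitrary examples):

```python
from scipy.spatial import distance

a, b = [1, 2, 3], [4, 6, 3]

# Euclidean: straight-line distance, sqrt of the sum of squared differences.
print(distance.euclidean(a, b))       # 5.0
# Manhattan (cityblock): sum of absolute differences.
print(distance.cityblock(a, b))       # 7
# Minkowski generalizes both: p=2 is Euclidean, p=1 is Manhattan.
print(distance.minkowski(a, b, p=2))  # 5.0

# Hamming: fraction of positions that differ, suited to categorical
# features once they are encoded (here two encoded category vectors).
print(distance.hamming([0, 1], [0, 0]))  # 0.5
```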

# Model building

Important points to note:

- We need to normalize/standardize the numerical features and one-hot encode the categorical features into dummies.
- If the predictor data is a mix of numeric and categorical features, use MinMax scaling for the numeric columns. This keeps their values bounded within [0, 1], the same range as the dummy-encoded variables.
- If the predictors are all numeric, we can use Euclidean, Manhattan, Minkowski, or Mahalanobis distance.
- If the predictors are only categorical, we can use Hamming distance.
- If the predictors are a mix of numeric and categorical, we have to choose the distance metric very carefully.
- When the data has very many dimensions, we suffer from the curse of dimensionality. In this case we restrict ourselves to lower powers p in the Minkowski distance. We can also resort to dimensionality reduction techniques and feature selection.
- The best value of K can be found by plotting the error against K and looking for the elbow.
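The error-vs-K search described above can be sketched with cross-validation; the iris dataset is used here purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate a range of K values; the elbow of the error curve marks a
# good trade-off between overfitting (small K) and underfitting (large K).
errors = {}
for k in range(1, 21):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    errors[k] = 1 - scores.mean()

best_k = min(errors, key=errors.get)
print(best_k, errors[best_k])
```

Plotting `errors` against K gives the elbow plot; in practice the smallest K past the elbow is usually preferred.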

# Prediction

The new observation is placed in the feature space of the training data.

## Classification

The predicted class is the modal (majority) class among the K nearest neighbors.

## Regression

The predicted value is the mean of the target values of the K nearest neighbors.

To reduce the chance of tied votes, keep the number of neighbors odd when the number of classes is even (e.g., in binary classification) and vice versa.
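Both prediction rules can be sketched from scratch; `knn_predict` below is a hypothetical helper, not part of any library:

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3, classify=True):
    # Distance from the new observation to every training point.
    dists = np.linalg.norm(np.asarray(X_train) - np.asarray(x_new), axis=1)
    nearest = np.argsort(dists)[:k]              # indices of the K closest
    neighbor_targets = np.asarray(y_train)[nearest]
    if classify:
        # Classification: the modal (majority) class among the neighbors.
        return Counter(neighbor_targets).most_common(1)[0][0]
    # Regression: the mean of the neighbors' target values.
    return neighbor_targets.mean()

X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]]
y_cls = ["a", "a", "a", "b", "b"]
print(knn_predict(X_train, y_cls, [0.2, 0.2], k=3))                 # a
y_reg = [1.0, 2.0, 3.0, 10.0, 12.0]
print(knn_predict(X_train, y_reg, [5, 5.5], k=2, classify=False))   # 11.0
```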

# Hyperparameter tuning

**n_neighbors**: the number of neighbors K.

**weights**: how the neighbors are weighted.

- uniform: all neighbors get equal weight.
- distance: nearer neighbors get more weight, inversely proportional to their distance.

**algorithm**: how the neighbor search is constructed.

- kd_tree: a K-dimensional tree is formed (axis-parallel hyperplane splits).
- ball_tree: the space is divided into nested hyperspheres.
- brute: all pairwise distances are calculated.

**leaf_size**: the number of points in a kd-tree or ball-tree leaf.

**p**: power parameter for the Minkowski metric (p=1 is Manhattan, p=2 is Euclidean).

**metric**: the distance metric used by the algorithm.

**metric_params**: additional metric parameters, such as the covariance matrix for the Mahalanobis distance.
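These hyperparameters can be tuned together with a grid search; a minimal sketch using `GridSearchCV` on the iris dataset (the grid values are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Search over the main hyperparameters described above.
param_grid = {
    "n_neighbors": [3, 5, 7, 9],
    "weights": ["uniform", "distance"],
    "p": [1, 2],  # 1 = Manhattan, 2 = Euclidean
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```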

Example code:

```python
from sklearn.neighbors import KNeighborsRegressor  # or KNeighborsClassifier

knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X_train, y_train)
y_hat = knn.predict(X_test)
```