# K-Neighbors Regression Analysis in Python

Published Apr 20, 2019 · 3 min read

K-nearest neighbors (KNN) is a simple algorithm that stores all available cases and predicts the numerical target based on a similarity measure (e.g., a distance function). KNN has been used in statistical estimation and pattern recognition as a non-parametric technique since the early 1970s.

## Algorithm

A simple implementation of KNN regression calculates the average of the numerical targets of the K nearest neighbors. Another approach uses an inverse-distance weighted average of the K nearest neighbors. KNN regression uses the same distance functions as KNN classification.
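The two approaches can be sketched in a few lines of NumPy. This is an illustrative implementation only, with hypothetical names (`knn_predict`, the toy training data); scikit-learn's `KNeighborsRegressor` exposes the same choice through its `weights='uniform'` / `weights='distance'` parameter:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3, weighted=False):
    """Predict a numeric target from the k nearest training points.

    With weighted=False, return the plain average of the k nearest targets;
    with weighted=True, return an inverse-distance weighted average.
    """
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # indices of k closest points
    if weighted:
        w = 1.0 / (dists[nearest] + 1e-12)             # guard against zero distance
        return np.average(y_train[nearest], weights=w)
    return y_train[nearest].mean()

# Toy 1-D data: targets equal the feature values
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.0])

plain = knn_predict(X_train, y_train, np.array([1.1]), k=2)
weighted = knn_predict(X_train, y_train, np.array([1.1]), k=2, weighted=True)
```

For the query point 1.1 the two nearest targets are 1.0 and 2.0; the plain average is 1.5, while the inverse-distance weighting pulls the prediction toward the much closer neighbor.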

Distance measures such as Euclidean, Manhattan, and Minkowski are valid only for continuous variables. For categorical variables you must instead use the Hamming distance, which measures the number of positions at which corresponding symbols differ in two strings of equal length.
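As a quick illustration, the Hamming distance between two equal-length strings can be computed directly (a hypothetical helper, not part of the original article):

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

hamming_distance("karolin", "kathrin")  # → 3 (positions 3, 4, and 5 differ)
```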

The prediction using a single neighbor is just the target value of the nearest neighbor.

Let’s get hands-on. In this article I use a dataset from mglearn. If you don’t have the package in your notebook, the first step is to install it from cmd/Anaconda prompt:

`pip install mglearn`

After that, you can plot k-neighbors regression with n_neighbors = 1.

```python
import mglearn
import matplotlib.pyplot as plt

mglearn.plots.plot_knn_regression(n_neighbors=1)
```

This k-neighbors regression uses just a single neighbor (n_neighbors=1). You can also use more than the single closest neighbor for regression; the prediction is then the average (mean) of the relevant neighbors. Let us see:

`mglearn.plots.plot_knn_regression(n_neighbors=3)`

Now we can make predictions on the test data using KNN regression with n_neighbors = 3:

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = mglearn.datasets.make_wave(n_samples=40)

# split the wave dataset into a training and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# instantiate the model and set the number of neighbors to consider to 3
reg = KNeighborsRegressor(n_neighbors=3)

# fit the model using the training data and training targets
reg.fit(X_train, y_train)
```

If you have done the above, you can evaluate your model on the test data:

`print(reg.score(X_test, y_test))`

out : 0.83

## Analyzing KNeighborsRegressor

We can analyze how accuracy is affected by n_neighbors. Using three different values of n_neighbors, we can see which values give a good model:

```python
import numpy as np

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# create 1,000 data points, evenly spaced between -3 and 3
line = np.linspace(-3, 3, 1000).reshape(-1, 1)

for n_neighbors, ax in zip([1, 3, 9], axes):
    # make predictions using 1, 3, or 9 neighbors
    reg = KNeighborsRegressor(n_neighbors=n_neighbors)
    reg.fit(X_train, y_train)
    ax.plot(line, reg.predict(line))
    ax.plot(X_train, y_train, '^', c=mglearn.cm2(0), markersize=8)
    ax.plot(X_test, y_test, 'v', c=mglearn.cm2(1), markersize=8)
    ax.set_title(
        "{} neighbor(s)\n train score: {:.2f} test score: {:.2f}".format(
            n_neighbors,
            reg.score(X_train, y_train),
            reg.score(X_test, y_test)))
    ax.set_xlabel("Feature")
    ax.set_ylabel("Target")

axes[0].legend(["Model predictions", "Training data/target",
                "Test data/target"], loc="best")
```

As we can see from the plot, using only a single neighbor, each point in the training set has an obvious influence on the predictions, and the predicted values go through all of the data points. This leads to a very unsteady prediction. Considering more neighbors leads to smoother predictions, but these do not fit the training data as well.
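This tradeoff suggests choosing n_neighbors by cross-validation rather than by eye. Here is a minimal sketch of that idea; note the synthetic 1-D data below is a stand-in I generate here (an assumption, not the book's wave dataset), and the candidate values of k are arbitrary:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic 1-D regression data standing in for the wave dataset
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(4 * X[:, 0]) + X[:, 0] + rng.normal(scale=0.3, size=40)

# Score each candidate n_neighbors with 5-fold cross-validation
scores = {}
for k in [1, 3, 5, 9, 15]:
    reg = KNeighborsRegressor(n_neighbors=k)
    scores[k] = cross_val_score(reg, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
```

The value with the highest mean cross-validated score balances the unsteady fit of small k against the over-smoothing of large k.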

Ref: Andreas C. Müller and Sarah Guido. 2017. *Introduction to Machine Learning with Python*. O'Reilly Media.

Data Scientist at KECILIN.ID || Physicist || Writer about Data Analysis, Big Data, Machine Learning, and AI. LinkedIn: https://www.linkedin.com/in/imammuhajir92/