Localized Regression (KNN with Local Regression)

İbrahim Halil Kaplan
Machine Learning Turkiye
3 min read · Jan 6, 2022

Can we blend the KNN regression algorithm with other algorithms? Can we achieve more efficient results this way?

First of all, let me briefly explain how KNN regression and linear regression work.

K-nearest neighbors (KNN) is a simple supervised machine learning algorithm that memorizes the available training data and estimates the numerical target based on a similarity measure (e.g., distance functions). It arrives at a value by averaging the targets of the nearest neighbors. KNN regression uses the same distance functions as KNN classification: Euclidean distance is the usual choice, but a wide variety of distance metrics can be used. You can see them in the image below. You can also find more detailed information about the KNN classification algorithm and distance measurements here.

Distance Measurements
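To make the averaging behavior concrete, here is a minimal sketch (the toy data and n_neighbors value are made up for illustration, not from the original notebook):

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Tiny 1-D toy data: y roughly follows x.
X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

knn = KNeighborsRegressor(n_neighbors=3)  # Euclidean distance by default
knn.fit(X_toy, y_toy)

# The prediction for x = 3.1 is simply the mean of the 3 nearest targets.
knn.predict([[3.1]])  # array([3.]), i.e. the mean of 1.9, 3.2 and 3.9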

Linear regression is a supervised learning algorithm that models a target value as a function of independent variables. It is mostly used to quantify the relationship between variables and to make predictions: given a new data point, it estimates the target from the relationships learned on the training dataset.
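Again as a quick sketch with made-up numbers, a linear model learns coefficients relating the features to the target and applies them to new points:

import numpy as np
from sklearn.linear_model import LinearRegression

X_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
y_toy = np.array([3.0, 5.0, 7.0, 9.0])  # exactly y = 2x + 1

lin = LinearRegression().fit(X_toy, y_toy)
lin.coef_, lin.intercept_  # (array([2.]), ~1.0)
lin.predict([[10.0]])      # array([21.])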

With the algorithms briefly covered, let's come to what we want to do. What if we let KNN store the dataset and find the neighbors of the data point we want to predict, but then follow a different path instead of simply averaging those neighbors?

Let’s try it.

For KNN regression, we created a dataset with 100 samples, split it into training and test sets, and fit a KNN regressor, as sketched below.
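The setup code appears only as an image in the original post; a minimal sketch that reproduces it could look like this (the split ratio and random_state are my assumptions, while n_neighbors=25 matches the class example later in the post):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression

# 100 samples with 2 informative features, as described above.
X, y = make_regression(n_samples=100, n_features=2, n_informative=2)

# Assumed split parameters; the post does not show them.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = KNeighborsRegressor(n_neighbors=25)
model.fit(X_train, y_train)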

Next, we set up the linear model: we locate the neighbors of the test point and build x and y data from those nearest neighbors.

nearestneighbor = model.kneighbors(np.array(X_test[0]).reshape(-1, 2))[1]  # We located the neighbors' indices.

# x and y data are collected from the close neighbors here.
neighbors_x = []
neighbors_y = []
for i in nearestneighbor:
    neighbors_x.append(X_train[i])
    neighbors_y.append(y_train[i])

neighbors_x and neighbors_y are converted to np.array so they can be reshaped, and then the linear model is fitted:

neighbors_x = np.array(neighbors_x)  # The lists are converted to np.array to be able to reshape.
neighbors_y = np.array(neighbors_y)

linermodel = LinearRegression()
linermodel.fit(neighbors_x.reshape(-1, 2), neighbors_y.reshape(-1, 1))

The prediction matches the real value almost exactly:

linermodel.predict(np.array(X_test[0]).reshape(-1, 2))  # linear model prediction for the point X_test[0]
array([[18.16595607]])

y_test[0]  # the real data point
18.16595607153592

What would the result be if we made the prediction with the KNN regressor model itself?

y_predknn = model.predict(np.array(X_test[0]).reshape(-1, 2))
y_predknn  # the KNN model's prediction
array([13.12238492])

When we examine the results, we can say that the method we applied is useful: the local linear fit recovers the target almost exactly, while plain KNN averaging is noticeably off. (A near-perfect match is expected here, since make_regression generates a linear target that a linear model can fit exactly; the nonlinear test below is more telling.)

If we wrap the method we just followed into a simple class, it looks like this:

class LocalizedRegressor(KNeighborsRegressor):
    def predict(self, X):
        y_preds = []
        # Indices (and distances) of each query point's neighbors.
        dist, inds = self.kneighbors(X)
        features = self._fit_X[inds]  # neighbors' features (sklearn internal attributes)
        labels = self._y[inds]        # neighbors' targets
        for i in range(X.shape[0]):
            X_lin = features[i]
            y_lin = labels[i]
            # Fit a small linear model on this point's neighborhood only.
            model_for_this_point = LinearRegression()
            model_for_this_point.fit(X_lin, y_lin)
            prediction = model_for_this_point.predict(X[i:i+1])
            y_preds.append(prediction[0])
        y_preds = np.array(y_preds)
        return y_preds
m = LocalizedRegressor(n_neighbors=25)
m.fit(X_train, y_train)
LocalizedRegressor(n_neighbors=25)
m.predict(X_test[0:1])
array([18.16595607])

Let's see how localized regression performs when we make the dataset a bit harder with a nonlinear target:

X, _ = make_regression(n_samples=100, n_features=2, n_informative=2)
y = 2. * X[:, 0] ** 2 - 0.7 * X[:, 1] ** 3 - 0.54
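The snippet above only defines the new nonlinear target; Knn_model and LocalReg are presumably refit on a fresh split of this data before scoring. A sketch of that step (the split parameters are assumptions; the model names match the score calls below):

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Knn_model = KNeighborsRegressor(n_neighbors=25).fit(X_train, y_train)
LocalReg = LocalizedRegressor(n_neighbors=25).fit(X_train, y_train)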
Knn_model.score(X_test, y_test)
0.6495531202875833

LocalReg.score(X_test, y_test)
0.9428817200363483

Looks like it’s working :)

You can find the whole code here.

Thank you for reading; I hope it was useful.
