# Feature Engineering Experiment: Weighted KNN

Using KNN to learn feature weights

One thing that I have learned the hard way about machine learning problems is that most of them require a large share of the hours to be dedicated not to data wrangling but to EDA and feature exploration.

This crucial step is often underplayed in most courses and discussions, primarily because it’s as much an art as a science. And when I say art, it’s primarily because the possible approaches to feature engineering, like hyperparameter tuning, are limited only by computational resources and time.

These methods range from wrapper approaches for subset selection to feature extraction and feature construction, and leverage everything from KNNs to Random Forests to variance thresholds.

This article discusses using KNNs to learn weights for available features and then selecting the features with largest weights.

Intuitively, the approach works as follows:

1. For every observation in the dataset, calculate the similarity between that observation and all other observations. The difference from plain vanilla KNN is that the similarity is weighted.

In vanilla KNN with Euclidean distance, the distance between two observations $p$ and $q$ is

$$d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$$

But in the weighted form we will have something like

$$d_w(p, q) = \sqrt{\sum_i w_i \,(p_i - q_i)^2}$$

where for each $i$-th feature there is a corresponding weight $w_i$.
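As a quick numeric check of the weighted distance (the feature values and weights here are chosen arbitrarily):

```python
import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 2.0, 1.0])
w = np.array([0.5, 1.0, 0.0])   # a zero weight effectively drops a feature

# vanilla Euclidean distance
d_plain = np.sqrt(np.sum((p - q) ** 2))

# weighted Euclidean distance
d_weighted = np.sqrt(np.sum(w * (p - q) ** 2))

print(d_plain)     # sqrt(9 + 0 + 4) ~ 3.606
print(d_weighted)  # sqrt(0.5*9 + 0 + 0) ~ 2.121
```

Note how the zero weight on the third feature removes its contribution entirely, which is exactly the behavior the learning procedure will exploit for noise features.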

2. This similarity/distance measure is then used to select the k nearest neighbors of a particular observation.

3. Then, as in KNN, we use the neighbors to predict the label of the observation.

4. Based on the predictions we can calculate performance metrics like accuracy, AUC, or log loss.

This in turn will act as the loss function for our overall Feature Learning Algorithm.

i.e., we keep adjusting the weights with an optimization algorithm (gradient descent or, since accuracy is not smooth in the weights, a derivative-free method) until the loss function is minimized.

Let’s run some experiments in Python. The code is here: #github

Import the required libraries.
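A likely set of imports for what follows, assumed from the libraries the post mentions (numpy, sklearn, scipy, and pandas for collecting results):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from scipy.optimize import minimize
```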

Then, using sklearn’s make_classification, we create a dataset of 300 observations with only 5 informative features.

Then, using np.random.normal, we add 10 features that are essentially just noise.

Then we initialize the weights to all zeros and set k for KNN to 5.
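Putting that setup together might look like this (the parameter names follow sklearn’s make_classification; the original code may differ in details):

```python
import numpy as np
from sklearn.datasets import make_classification

# 300 observations, all 5 generated features informative
X, y = make_classification(n_samples=300, n_features=5, n_informative=5,
                           n_redundant=0, n_repeated=0, random_state=42)

# append 10 pure-noise features
noise = np.random.normal(size=(300, 10))
X = np.hstack([X, noise])

# one weight per feature, initialized to zero, and k for KNN
w0 = np.zeros(X.shape[1])
k = 5
```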

Then we wrap the KNN algorithm in a function. Let’s go over the overall approach:

1. We iterate over all pairs of observations,

2. calculating the weighted absolute distance between them.

3. Then we pick the k nearest neighbors and use them to predict, for each observation, the correct label.

4. Then we use the predicted probabilities to calculate the accuracy score.

The knnWeights function wraps these steps; the complete code is in the linked repo.
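A minimal sketch of what such a function can look like, assuming a weighted absolute (L1) distance and negative accuracy as the loss to minimize (the exact implementation may differ):

```python
import numpy as np

def knn_weights(weights, X, y, k=5):
    """Return the negative accuracy of a weighted-distance KNN classifier,
    so an optimizer can minimize it directly."""
    n = X.shape[0]
    preds = np.zeros(n)
    for i in range(n):
        # weighted absolute distance from observation i to every other one
        dists = np.abs(X - X[i]) @ np.abs(weights)
        dists[i] = np.inf                  # never count the point itself
        neighbors = np.argsort(dists)[:k]  # indices of the k nearest points
        preds[i] = y[neighbors].mean()     # predicted probability of class 1
    accuracy = np.mean((preds > 0.5) == y)
    return -accuracy
```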

Then we use scipy’s minimize to learn the weights.
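The call pattern is roughly the following. The toy loss here is a stand-in for knnWeights (whose exact signature is in the linked repo); Powell is used in this sketch because accuracy is piecewise constant in the weights, so gradient-based methods get no useful signal:

```python
import numpy as np
from scipy.optimize import minimize

def toy_loss(w):
    # stand-in for knnWeights(w, X, y): any scalar loss of the weights works
    return np.sum((w - 1.0) ** 2)

w0 = np.zeros(15)                      # initial weights, all zeros
result = minimize(toy_loss, w0, method="Powell")
learned_weights = result.x             # near 1.0 for every weight here
```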

To get a sense of how good the learned weights are, we can run KNN using all features and compare it with KNN using only the 5 features with the highest weights (recall that while creating the dataset we set n_informative to 5). We also randomly select a subset of features and compare results for that as well.

• Results using the KNN classifier with all features
• Results using the KNN classifier with the highest-weight features
• Results using a random subset of features
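A sketch of that comparison using sklearn’s KNeighborsClassifier. The weights w here are random stand-ins for the learned ones, and the helper name is illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate(X, y, cols, k=5):
    """Fit KNN on a subset of columns and return (accuracy, AUC) on a holdout."""
    Xtr, Xte, ytr, yte = train_test_split(X[:, cols], y, random_state=0)
    clf = KNeighborsClassifier(n_neighbors=k).fit(Xtr, ytr)
    acc = accuracy_score(yte, clf.predict(Xte))
    auc = roc_auc_score(yte, clf.predict_proba(Xte)[:, 1])
    return acc, auc

# stand-in data and weights, shaped like the article's setup
X, y = make_classification(n_samples=300, n_features=15, n_informative=5,
                           n_redundant=0, random_state=0)
w = np.random.rand(15)                               # pretend these were learned

top5 = np.argsort(np.abs(w))[-5:]                    # 5 highest-weight features
rand5 = np.random.choice(15, size=5, replace=False)  # random 5-feature subset

for name, cols in [("all", np.arange(15)), ("top-5", top5), ("random", rand5)]:
    print(name, evaluate(X, y, cols))
```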

We can combine these metrics — accuracy and ROC AUC per experiment — into a dataframe and run the experiment 10 times to get an overall sense of how the three approaches compare.
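One way to collect the per-run metrics; the column names mirror the abbreviations used for the results, and run_experiment is a hypothetical stand-in for the real experiment code:

```python
import pandas as pd

def run_experiment(seed):
    # Placeholder: the real code would run the three KNN variants for this
    # seed and return their accuracy / AUC scores.
    return {"rfa": 0.0, "rfauc": 0.0,   # random feature subset
            "woa": 0.0, "woauc": 0.0,   # no feature selection (all features)
            "wa":  0.0, "wauc":  0.0}   # KNN-weight feature selection

# one row per run, one column per metric
results = pd.DataFrame([run_experiment(s) for s in range(10)])
```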

From 10 experiments the results were:

where rfa = Random Feature subset Accuracy

rfauc = Random Feature subset AUC

woa = No feature selection Accuracy

woauc = No feature selection AUC

wa = Feature selection Accuracy

wauc = Feature selection AUC

To get a better sense, we can aggregate the results.

What we see is that random feature subset selection has the worst mean performance, while the features selected using the KNN-learned weights have the highest; no feature selection lands in between the two.
