Finding the most impactful features in a dataset using Mutual Information criteria

Gaurang Mehra
2 min read · Nov 7, 2022


Feature selection using Mutual Info regression

Objective- Identify important features in a dataset that act as predictors for the target variable, in this case math scores.

Dataset- The dataset contains math scores and a host of possible predictor variables, like family size, internet access at home, father's job, etc. There are 31 variables in total.

Approach- Here we will use a slightly different approach. Instead of doing EDA on every feature, we will use mutual information regression to measure the strength of the relationship between the test scores and each feature.

Mutual Information Metric- Mutual information is a lot like correlation: it measures the strength of the relationship between two variables. The key difference is that while correlation only captures the strength of a linear relationship, mutual information captures the strength of any relationship between the two variables, including a strong non-linear relationship that correlation would miss. It is a positive number, with higher values indicating a stronger relationship; values above 2 are very rare. If the target variable is continuous, use mutual information regression; if it is discrete, use mutual information classification.
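To make the difference concrete, here is a minimal sketch (assuming scikit-learn and NumPy are installed) of a strongly non-linear relationship where correlation is close to zero but mutual information is clearly positive:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 1000)
y = x ** 2 + rng.normal(0, 0.1, 1000)  # strong but non-linear relationship

# Pearson correlation is near 0 because the relationship is not linear.
print(np.corrcoef(x, y)[0, 1])

# Mutual information picks the relationship up; note X must be 2-D.
print(mutual_info_regression(x.reshape(-1, 1), y))
```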

Step 1- Load the data and prepare the features

We import the necessary libraries and then do the following (a sketch follows the list):

  • Load the data and examine the dataset.
  • Encode all object (string) columns to integers.
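A minimal sketch of both steps, assuming the data sits in a local CSV named student-mat.csv (the file name and separator are assumptions; adjust them to your copy of the dataset):

```python
import pandas as pd

# Load the data and examine the dataset.
df = pd.read_csv("student-mat.csv", sep=";")
print(df.shape)
df.info()

# Encode all object (string) columns to integers.
for col in df.select_dtypes(include="object").columns:
    df[col], _ = pd.factorize(df[col])
```

pd.factorize() gives each distinct string its own integer code, which is all the mutual information estimator needs from the categorical columns.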

Step 2- Run the mutual information regression

Now that our dataset is set up, we run the mutual information regression.

  1. Split the dataset into two arrays: X, the feature array, and y, the math test score array (column G3).
  2. Find the discrete columns (integer-typed columns in the feature array X); the mutual_info_regression() function needs these as an input.
  3. Run the mutual information regression and visualize the results (see the sketch after this list).
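A sketch of all three steps, continuing from the encoded DataFrame df built above (matplotlib for the plot is an assumption):

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_selection import mutual_info_regression

# 1. Split into the feature array X and the target y (column G3).
X = df.drop(columns="G3")
y = df["G3"]

# 2. Mark integer-typed columns as discrete features.
discrete = [pd.api.types.is_integer_dtype(X[col]) for col in X.columns]

# 3. Run the regression and visualize the top 10 scores.
mi = mutual_info_regression(X, y, discrete_features=discrete, random_state=0)
mi_scores = pd.Series(mi, index=X.columns).sort_values(ascending=False)

mi_scores.head(10).sort_values().plot.barh()
plt.xlabel("Mutual information score")
plt.title("Top 10 feature importances")
plt.tight_layout()
plt.show()
```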
Fig 1.1: Top 10 feature importances
  • The mutual information score decays rapidly after the most important feature, which in this case is the number of past failures in math tests.
  • Most of the features beyond the top 10 have very low mutual information scores.
  • We have gone from a dataset with 31 possible features down to the 10 most important ones.
