How to use sklearn’s feature importance attribute to select less important features

Tracyrenee · Published in Python’s Gurus · Jun 15, 2024 · 5 min read

In my most recent post I reviewed a research paper that explained why tree-based models tend to outperform deep learning models when making predictions on tabular datasets. That blog post can be found here: https://tracyrenee61.medium.com/research-paper-review-why-do-tree-based-models-still-outperform-deep-learning-on-tabular-data-3bb9e9ff0846

In one part of the research paper the authors discussed how using the feature_importances_ attribute of tree-based models can affect the accuracy of a prediction, so I decided to test this for myself and see what effect selecting features would have on a prediction.

I decided to use Kaggle’s playground competition 4.6 because I had previously submitted predictions for this competition without any feature selection. The only difference in this newer piece of code is that I used the feature_importances_ attribute to select features.
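
As a rough sketch of this technique (the synthetic data, column names, and mean-importance threshold below are my own illustrative assumptions, not the competition data or the exact code from my notebook), selecting features with feature_importances_ can look like this:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the competition's training data (assumption)
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=42)
X = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(X.shape[1])])

# Fit a tree-based model and read off its importance scores
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)

# Drop the less important features: keep only columns whose importance
# exceeds the mean score (an illustrative threshold, not from the article)
keep = importances[importances > importances.mean()].index
X_selected = X[keep]
print(f"Kept {len(keep)} of {X.shape[1]} features:", list(keep))
```

The reduced frame, X_selected, can then be passed to the model in place of the full feature set.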

I created a new Jupyter Notebook and stored it in my Kaggle account, then imported the libraries that I would need to execute the program (the import statements are shown after this list), being:-

  1. NumPy to create numpy arrays and perform numerical computations,
  2. Pandas to create dataframes and process data,
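
In the notebook these first two imports are simply (standard aliases assumed):

```python
import numpy as np   # numpy arrays and numerical computations
import pandas as pd  # dataframes and data processing
```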

