

How to Get Feature Importances from Any Sklearn Pipeline

Pipelines can be hard to navigate; here's some code that works in general.

Nicolas Bertagnolli
Towards Data Science
8 min read · Oct 12, 2020


Photo by Quinten de Graaf on Unsplash

Introduction

Pipelines are amazing! I use them in basically every data science project I work on. But getting feature importances out of a pipeline is more difficult than it needs to be. In this tutorial, I'll walk through how to access individual feature names and their coefficients from a Pipeline. After that, I'll show a generalized solution for getting feature importances from just about any pipeline.

Pipelines

Let’s start with a super simple pipeline that applies a single featurization step followed by a classifier.

from datasets import load_dataset
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import svm

# Load the IMDB dataset from the Hugging Face Hub
imdb_data = load_dataset("imdb")

classifier = svm.LinearSVC(C=1.0, class_weight="balanced")
model = Pipeline(
    [
        ("vectorizer", TfidfVectorizer()),
        ("classifier", classifier),
    ]
)

# Pull the raw text and labels out of the training split
x_train = [x["text"] for x in imdb_data["train"]]
y_train = [x["label"] for x in imdb_data["train"]]
model.fit(x_train, y_train)
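
Once the model is fit, the TF-IDF vocabulary lines up index-for-index with the classifier's coefficient vector, so pairing feature names with their weights is straightforward. Here's a minimal sketch (assuming scikit-learn ≥ 1.0, where get_feature_names_out is available; older versions use get_feature_names instead):

import pandas as pd

# Each vocabulary term's index matches its column in coef_
feature_names = model.named_steps["vectorizer"].get_feature_names_out()
coefs = model.named_steps["classifier"].coef_.flatten()

# Rank terms by learned weight to surface the strongest signals
importance = pd.DataFrame({"feature": feature_names, "coef": coefs})
print(importance.sort_values("coef", ascending=False).head(10))

That works for this two-step pipeline, but it breaks down as soon as the featurization side gets more complicated. For a generalized solution, one approach is a small recursive helper that walks Pipelines and FeatureUnions down to their leaf transformers. This is a sketch, not production code: get_feature_names here is a hypothetical helper name, and it assumes every leaf transformer exposes get_feature_names_out:

from sklearn.pipeline import FeatureUnion, Pipeline

def get_feature_names(transformer):
    # A Pipeline's output features are whatever its last step produces
    if isinstance(transformer, Pipeline):
        return get_feature_names(transformer.steps[-1][1])
    # A FeatureUnion concatenates the features of its sub-transformers
    if isinstance(transformer, FeatureUnion):
        names = []
        for name, sub in transformer.transformer_list:
            names.extend(f"{name}__{n}" for n in get_feature_names(sub))
        return names
    # Base case: a plain transformer that knows its own output names
    return list(transformer.get_feature_names_out())

# Everything before the final classifier defines the feature space
feature_names = get_feature_names(model[:-1])

Note that newer scikit-learn releases (1.1+) give Pipelines a get_feature_names_out of their own, so model[:-1].get_feature_names_out() often does this walk for you out of the box.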
