Pipeline

Eugine Kang
1 min read · Sep 16, 2017


Python's scikit-learn library provides a Pipeline utility that chains preprocessing and modeling steps into a single estimator, which helps automate machine learning workflows.

It is very useful for cleanly streamlining:

  1. Data Transformation
  2. Model Selection (can we pass an algorithm as a parameter?)
  3. Hyperparameter Tuning
  4. Save Pipeline
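
On the question in item 2: as far as I can tell, yes. A whole pipeline step can be listed in the GridSearchCV parameter grid, so the search compares algorithms as well as their settings. Here is a minimal sketch; the synthetic data, the step names "scale" and "clf", and the candidate models are my own assumptions, not taken from the original examples.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: the post does not name a dataset).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=0)

# "clf" is a placeholder step that the grid below swaps out.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])

# Each dict is one sub-grid: the estimator itself plus its own parameters.
param_grid = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [SVC()], "clf__C": [0.1, 1.0, 10.0]},
]

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

# Name of the algorithm that won the search.
best_algorithm = type(search.best_estimator_.named_steps["clf"]).__name__
print(best_algorithm, search.best_params_["clf__C"])
```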

Example #1

  1. Standardize Data
  2. Learn a Linear Discriminant Analysis model

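#keynotes: cross_val_score works directly on a pipeline and refits every step inside each fold, so the preprocessing never sees the held-out data. To use your own metric, wrap it with make_scorer() and pass it as the scoring argument.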
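
A minimal sketch of Example #1, under a few assumptions: the data is synthetic (the post does not name a dataset), the step names are arbitrary, and f1_score stands in for whatever metric you actually care about.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=0)

# Step 1: standardize the data. Step 2: learn an LDA model.
pipeline = Pipeline([
    ("standardize", StandardScaler()),
    ("lda", LinearDiscriminantAnalysis()),
])

# Wrap a plain metric function so it can be used as a scorer.
scorer = make_scorer(f1_score)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring=scorer)
print(scores.mean())
```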

Example #2

  1. Feature Extraction with Principal Component Analysis (3 features)
  2. Feature Extraction with Statistical Selection (6 features)
  3. Feature Union
  4. Learn a Logistic Regression Model

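#keynotes: a pipeline can also be used purely to transform data, with pipeline.fit followed by pipeline.transform. This only works when the final step is a transformer rather than a predictor.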
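
Example #2 might look roughly like the sketch below. The synthetic 8-feature dataset and the step names are assumptions. SelectKBest is used for the "statistical selection" step with its default scoring function, which is my guess at what the original used.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# Synthetic stand-in data with 8 input features (assumption).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=0)

# Steps 1-3: two feature extractors, concatenated side by side.
features = FeatureUnion([
    ("pca", PCA(n_components=3)),
    ("select_best", SelectKBest(k=6)),
])

# Step 4: logistic regression on the combined features.
model = Pipeline([
    ("features", features),
    ("logistic", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)

# The fitted union is itself a transformer: 3 PCA + 6 selected columns.
X_combined = model.named_steps["features"].transform(X)
print(X_combined.shape)  # (300, 9)
```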

Example #3

  1. Select Features
  2. Impute missing values with 0
  3. Impute missing values with the column mean
  4. Feature Union
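
One way to read Example #3 is as two branches: each selects some columns and fills their gaps differently, then a FeatureUnion glues the results back together. The sketch below follows that reading. The toy array, the choice of which columns go to which branch, and the take_columns helper are all hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Toy data with missing values (made up for illustration).
X = np.array([
    [1.0, np.nan, 10.0, 4.0],
    [2.0, 5.0, np.nan, 6.0],
    [np.nan, 7.0, 30.0, np.nan],
])


def take_columns(data, columns):
    """Hypothetical helper: keep only the given column indices."""
    return data[:, columns]


# Branch A: columns 0 and 1, missing values replaced by 0.
zero_branch = Pipeline([
    ("select", FunctionTransformer(take_columns, kw_args={"columns": [0, 1]})),
    ("impute_zero", SimpleImputer(strategy="constant", fill_value=0.0)),
])

# Branch B: columns 2 and 3, missing values replaced by the column mean.
mean_branch = Pipeline([
    ("select", FunctionTransformer(take_columns, kw_args={"columns": [2, 3]})),
    ("impute_mean", SimpleImputer(strategy="mean")),
])

union = FeatureUnion([("zero", zero_branch), ("mean", mean_branch)])
X_filled = union.fit_transform(X)
print(X_filled)
```

In the result, the gaps in columns 0 and 1 become 0, while the gaps in columns 2 and 3 become 20 and 5, the means of the observed values in those columns.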

Example #4

  1. Repeat Example #2
  2. Grid Search on “n_components” and “k”
  3. Use “best_estimator_” for prediction
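
A sketch of Example #4, reusing the Example #2 setup. Nested parameters are addressed as step name, double underscore, parameter name. The dataset, step names and grid values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=0)

model = Pipeline([
    ("features", FeatureUnion([
        ("pca", PCA(n_components=3)),
        ("select_best", SelectKBest(k=6)),
    ])),
    ("logistic", LogisticRegression(max_iter=1000)),
])

# Grid over PCA's n_components and SelectKBest's k (values are examples).
param_grid = {
    "features__pca__n_components": [1, 2, 3],
    "features__select_best__k": [2, 4, 6],
}

search = GridSearchCV(model, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)

# best_estimator_ is the full pipeline refit on all data with the best grid point.
predictions = search.best_estimator_.predict(X)
```

With the default refit=True, calling search.predict(X) directly gives the same predictions as going through best_estimator_.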

Example #5

  1. Save pipeline/GridSearchCV
  2. Load pipeline/GridSearchCV
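
The post does not say which tool was used for saving. The sketch below assumes joblib, which is installed alongside scikit-learn. The small pipeline, the grid, and the temporary file name "search.joblib" are placeholders.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("logistic", LogisticRegression(max_iter=1000)),
])
search = GridSearchCV(pipeline, {"logistic__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X, y)

# Step 1: save the fitted GridSearchCV (a fitted Pipeline works the same way).
path = os.path.join(tempfile.mkdtemp(), "search.joblib")
joblib.dump(search, path)

# Step 2: load it back and check that it predicts identically.
loaded = joblib.load(path)
same_predictions = np.array_equal(loaded.predict(X), search.predict(X))
print(same_predictions)
```

Only load files you trust, and load them with the same scikit-learn version that wrote them, since these files are pickles under the hood.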
