Pipeline
Sep 16, 2017
Python scikit-learn provides a Pipeline utility to help automate machine learning workflows.
It is very useful for cleanly streamlining:
- Data Transformation
- Model Selection (can we pass an algorithm as a parameter?)
- Hyperparameter Tuning
- Save Pipeline
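On the model selection question: a pipeline step can itself be listed as a grid-search parameter, so the algorithm is searched like any other setting. The sketch below is only an illustration under assumptions that are not part of the original post: the data is synthetic (make_classification), and the step names "scale" and "clf" and the two candidate classifiers are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: the original dataset is not known).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5, random_state=7)

# The "clf" step is a placeholder that the grid search replaces.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# One sub-grid per algorithm, each with its own hyperparameters.
param_grid = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [KNeighborsClassifier()], "clf__n_neighbors": [3, 5, 7]},
]

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

best_algorithm = type(search.best_estimator_.named_steps["clf"]).__name__
print(best_algorithm, search.best_score_)
```

Using a list of sub-grids keeps each algorithm paired with its own hyperparameters, so no invalid combination is ever tried.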
Example #1
- Standardize Data
- Learn a Linear Discriminant Analysis model
#keynotes: running cross_val_score on a pipeline is convenient, because every step is refit inside each fold; to use a custom metric, wrap it as a scorer with make_scorer().
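A minimal sketch of Example #1, assuming synthetic data from make_classification in place of the original dataset, and using f1_score purely as a sample metric to wrap with make_scorer(). The step names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: the original dataset is not known).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5, random_state=7)

# Step 1: standardize the data. Step 2: learn an LDA model.
pipeline = Pipeline([
    ("standardize", StandardScaler()),
    ("lda", LinearDiscriminantAnalysis()),
])

# Wrap a metric function as a scorer so cross_val_score can use it.
scorer = make_scorer(f1_score)

# The scaler is refit on the training part of every fold.
kfold = KFold(n_splits=5, shuffle=True, random_state=7)
scores = cross_val_score(pipeline, X, y, cv=kfold, scoring=scorer)
print(scores.mean())
```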
Example #2
- Feature Extraction with Principal Component Analysis (3 features)
- Feature Extraction with Statistical Selection (6 features)
- Feature Union
- Learn a Logistic Regression Model
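#keynotes: a pipeline made only of transformers (such as the feature union here) can also transform data with fit and transform; a pipeline that ends in a model is used with fit and predict instead.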
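Example #2 could look roughly like the sketch below. The data is synthetic and the step names ("pca", "select_best", "features", "logistic") are assumptions made for illustration. The feature union on its own is used to transform the data, since the full pipeline ends in a classifier.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# Synthetic stand-in data (assumption: the original dataset is not known).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5, random_state=7)

# Two feature extractors whose outputs are concatenated column-wise.
features = FeatureUnion([
    ("pca", PCA(n_components=3)),        # 3 principal components
    ("select_best", SelectKBest(k=6)),   # 6 statistically selected features
])

# Feature union followed by a logistic regression model.
model = Pipeline([
    ("features", features),
    ("logistic", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
predictions = model.predict(X)

# The union alone is a transformer: 3 + 6 = 9 output columns.
X_features = features.fit_transform(X, y)
print(X_features.shape)  # (300, 9)
```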
Example #3
- Select Features
- Imputation by 0
- Imputation by mean
- Feature Union
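One way to read Example #3 is as two branches, each selecting some columns and imputing them differently, joined by a feature union. The sketch below follows that reading; the tiny array, the column split, and the helper names take_columns and select are hypothetical, and SimpleImputer is the current scikit-learn imputer rather than necessarily the class used in the original.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Toy data with missing values (hypothetical).
X = np.array([
    [1.0, np.nan, 10.0],
    [2.0, 5.0, np.nan],
    [np.nan, 7.0, 30.0],
])

def take_columns(data, columns):
    """Return only the requested columns of a 2-D array."""
    return data[:, columns]

def select(columns):
    """Build a transformer that keeps the given column indices."""
    return FunctionTransformer(take_columns, kw_args={"columns": columns})

# Branch 1: column 0, missing values replaced by 0.
zero_branch = Pipeline([
    ("select", select([0])),
    ("impute_zero", SimpleImputer(strategy="constant", fill_value=0)),
])

# Branch 2: columns 1 and 2, missing values replaced by the column mean.
mean_branch = Pipeline([
    ("select", select([1, 2])),
    ("impute_mean", SimpleImputer(strategy="mean")),
])

# Concatenate the outputs of both branches.
union = FeatureUnion([("zero", zero_branch), ("mean", mean_branch)])
X_clean = union.fit_transform(X)
print(X_clean)
```

In recent scikit-learn versions, ColumnTransformer covers the same per-column use case more directly; FeatureUnion is kept here to match the outline.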
Example #4
- Repeat Example #2
- Grid Search on “n_components” and “k”
- Use “best_estimator_” for prediction
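A sketch of Example #4 under the same assumptions as before (synthetic data, illustrative step names). Nested parameters are addressed with double underscores, one level per step name, and the candidate values in the grid are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline

# Synthetic stand-in data (assumption: the original dataset is not known).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5, random_state=7)

# Same structure as Example #2.
model = Pipeline([
    ("features", FeatureUnion([
        ("pca", PCA(n_components=3)),
        ("select_best", SelectKBest(k=6)),
    ])),
    ("logistic", LogisticRegression(max_iter=1000)),
])

# <pipeline step>__<union member>__<parameter>
param_grid = {
    "features__pca__n_components": [1, 2, 3],
    "features__select_best__k": [4, 5, 6],
}

grid = GridSearchCV(model, param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_)

# best_estimator_ is the pipeline refit on all data with the best parameters.
predictions = grid.best_estimator_.predict(X)
```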
Example #5
- Save pipeline/GridSearchCV
- Load pipeline/GridSearchCV
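Saving and loading can be sketched with joblib, which is one common way to persist fitted scikit-learn objects; whether the original used joblib or pickle is not stated. The data, the small pipeline, the grid values, and the file name grid_search.joblib are all placeholders.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: the original dataset is not known).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5, random_state=7)

pipeline = Pipeline([
    ("standardize", StandardScaler()),
    ("logistic", LogisticRegression(max_iter=1000)),
])
grid = GridSearchCV(pipeline, {"logistic__C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)

# Save the fitted search object, then load it back from disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "grid_search.joblib")  # placeholder file name
    joblib.dump(grid, path)
    restored = joblib.load(path)

# The restored object should predict exactly like the original.
same_predictions = np.array_equal(grid.predict(X), restored.predict(X))
print(same_predictions)
```

A plain fitted Pipeline can be dumped and loaded the same way. Files like this should only be loaded from trusted sources and with a compatible scikit-learn version.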