The new ColumnTransformer will change workflows from Pandas to Scikit-Learn

From Pandas to Scikit-Learn — An exciting new workflow

Ted Petrou
Sep 3, 2018 · 21 min read

This article is available as a Jupyter Notebook on Google’s Colaboratory (open in playground mode to run and edit) and in the Machine Learning GitHub repository for the Dunder Data Organization.

Learn Data Science with Python

I have several online and in-person courses available on dunderdata.com to teach you Python, data science, and machine learning.

Online Courses

  • Master Data Analysis with Python — a comprehensive course with access to over 500 pages of text, 300 exercises, 13 hours of video, multiple projects, and detailed solutions
  • Exercise Python — master the fundamentals of Python with access to over 300 pages of text, 150 exercises, multiple projects and detailed solutions
  • Intro to Pandas — FREE course to get started. 5.5 hours of video, nearly 50 exercises
  • All Access Pass! — Get lifetime access to all current and future online courses for one low price!

Social Media

I frequently post my Python data science thoughts on social media. Follow me!

Corporate Training

If you have a group at your company looking to learn directly from an expert who understands how to teach and motivate students, let me know by filling out the form on this page.

Scikit-Learn’s new integration with Pandas

Scikit-Learn will make one of its biggest upgrades in recent years with its mammoth version 0.20 release. For many data scientists, a typical workflow consists of using Pandas to do exploratory data analysis before moving to Scikit-Learn for machine learning. This new release will make the process simpler, more feature-rich, more robust, and more standardized.

Summary and goals of this article

  • This article is aimed at those who use Scikit-Learn as their machine learning library but depend on Pandas as their data exploration and preparation tool.
  • It assumes you have some familiarity with both Scikit-Learn and Pandas.
  • We explore the new ColumnTransformer estimator, which allows us to apply separate transformations to different subsets of our data in parallel before concatenating the results together.
  • A major pain point for users (and in my opinion the worst part of Scikit-Learn) was preparing a pandas DataFrame with string values in its columns. This process should become much more standardized.
  • The OneHotEncoder estimator was given a nice upgrade to encode columns with string values.
  • To help with one-hot encoding, we use the new SimpleImputer estimator to fill in missing values with constants.
  • We will build a custom estimator that does all the “basic” transformations on a DataFrame instead of relying on the built-in Scikit-Learn tools. It will also transform the data with a couple of features not present in Scikit-Learn.
  • Finally, we explore binning numeric columns with the new KBinsDiscretizer estimator.

A note before we get started

This tutorial is provided as a preview of things to come. The final version 0.20 has not been released. It is very likely that this tutorial will be updated at a future date to reflect any changes.

Continuing…

For those who use Pandas as their exploratory and preparation tool before moving to Scikit-Learn for machine learning, you are likely familiar with the non-standard process of handling columns containing strings. Scikit-Learn’s machine learning models require the input to be a two-dimensional data structure of numeric values. No string values are allowed. Scikit-Learn never provided a canonical way to handle columns of strings, a very common occurrence in data science.

This led to numerous tutorials all handling string columns in their own way. Some solutions included turning to the Pandas get_dummies function. Some used Scikit-Learn’s LabelBinarizer, which does one-hot encoding but was designed for labels (the target variable) and not for the input. Others created their own custom estimators. Entire packages, such as sklearn-pandas, were even built to support this trouble spot. This lack of standardization made for a painful experience for those wanting to build machine learning models with string columns.

Furthermore, there was poor support for making transformations to specific columns and not to the entire dataset. For instance, it’s very common to standardize continuous features but not categorical features. This will now become much easier.

Upgrading to version 0.20
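The final version is not yet out, so the commands below target the release candidate; the exact version tag is an assumption about whichever candidate is current. With conda:

    conda install scikit-learn=0.20rc1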

or pip:
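    pip install --pre --upgrade scikit-learn   # --pre allows pre-release versions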

Introducing ColumnTransformer and the upgraded OneHotEncoder

With the upgrade to version 0.20, many workflows from Pandas to Scikit-Learn should start looking similar. The ColumnTransformer estimator applies a transformation to a specific subset of columns of your Pandas DataFrame (or array).

The OneHotEncoder estimator is not new but has been upgraded to encode string columns. Before, it only encoded columns containing numeric categorical data.

Let’s see how these new additions work to handle string columns in a Pandas DataFrame.

Kaggle Housing Dataset

One of Kaggle’s beginning machine learning competitions is House Prices: Advanced Regression Techniques. The goal is to predict housing prices given about 80 features. There is a mix of continuous and categorical columns. You can download the data from the website or use their command line tool (which is very nice).

Inspect the data

Let’s read in our DataFrame and output the first few rows.
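Something like the following, where the file path is an assumption about where you saved the competition files:

    import pandas as pd
    import numpy as np

    train = pd.read_csv('data/housing/train.csv')
    train.head()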

Remove the target variable from the training set

The target variable is SalePrice which we remove and assign as an array to its own variable. We will use it later when we do machine learning.
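The pop method removes the column from the DataFrame and returns it, and .values extracts the underlying NumPy array:

    y = train.pop('SalePrice').values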

Encoding a single string column

To start off, let’s encode a single string column, HouseStyle, which contains the style of the dwelling. Let’s output the unique counts of each string value.
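    train['HouseStyle'].value_counts()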

We have 8 unique values in this column.

Scikit-Learn Gotcha — Must have 2D data

Most Scikit-Learn estimators require that data be strictly 2-dimensional. If we select the column above as train['HouseStyle'], technically a Pandas Series is created, which is a single dimension of data. We can force Pandas to create a one-column DataFrame by passing a single-item list to the brackets, like this:
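    hs_train = train[['HouseStyle']].copy()
    hs_train.ndim   # 2 -- a one-column DataFrame rather than a Series

The copy isn’t strictly necessary; it just keeps later edits from touching the original DataFrame.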

Import, Instantiate, Fit — The three-step process for each estimator

The Scikit-Learn API is consistent for all estimators and uses a three-step process to fit (train) the data.

  1. Import the estimator we want from the module it’s located in
  2. Instantiate the estimator, possibly changing its defaults
  3. Fit the estimator to the data. Possibly transform the data to its new space if need be.

Below, we import OneHotEncoder, instantiate it and ensure that we get a dense (and not sparse) array returned, and then encode our single column with the fit_transform method.
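A sketch of those three steps:

    from sklearn.preprocessing import OneHotEncoder

    ohe = OneHotEncoder(sparse=False)                    # dense array instead of sparse matrix
    hs_train_transformed = ohe.fit_transform(hs_train)
    hs_train_transformed.shape
    # (1460, 8) -- one binary column per unique house style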

As expected, it has encoded each unique value as its own binary column.

If you are enjoying this article, consider purchasing the All Access Pass! which includes all my current and future material for one low price.

We have a NumPy array. Where are the column names?

Notice that our output is a NumPy array and not a Pandas DataFrame. Scikit-Learn was not originally built to be directly integrated with Pandas. All Pandas objects are converted to NumPy arrays internally and NumPy arrays are always returned after a transformation.

We can still get our column names from the OneHotEncoder object through its get_feature_names method.
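Since we didn’t pass input feature names, the names are prefixed with x0:

    feature_names = ohe.get_feature_names()
    feature_names
    # array(['x0_1.5Fin', 'x0_1.5Unf', 'x0_1Story', 'x0_2.5Fin', 'x0_2.5Unf',
    #        'x0_2Story', 'x0_SFoyer', 'x0_SLvl'], dtype=object)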

Verifying our first row of data is correct

It’s good to verify that our estimator is working properly. Let’s look at the first row of encoded data.
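    row0 = hs_train_transformed[0]
    row0
    # array([0., 0., 0., 0., 0., 1., 0., 0.])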

This encodes the 6th value in the array as 1. Let’s use boolean indexing to reveal the feature name.
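    feature_names[row0 == 1]
    # array(['x0_2Story'], dtype=object)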

Now, let’s verify that the first value in our original DataFrame column is the same.
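    hs_train.values[0]
    # array(['2Story'], dtype=object)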

Use inverse_transform to automate this

Just like most transformer objects, there is an inverse_transform method that will get you back your original data. Here we must wrap row0 in a list to make it a 2D array.
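    ohe.inverse_transform([row0])
    # array([['2Story']], dtype=object)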

We can verify all values by inverting the entire transformed array.
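    hs_inv = ohe.inverse_transform(hs_train_transformed)
    np.array_equal(hs_inv, hs_train.values)
    # True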

Applying a transformation to the test set

Whatever transformation we do to our training set, we must apply to our test set. Let’s read in the test set and get the same column and apply our transformation.
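The test file path is the same assumption as before:

    test = pd.read_csv('data/housing/test.csv')
    hs_test = test[['HouseStyle']].copy()
    hs_test_transformed = ohe.transform(hs_test)   # transform only -- no refitting
    hs_test_transformed.shape
    # (1459, 8)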

We should again get 8 columns and we do.

This example works nicely, but there are multiple cases where we will run into problems. Let’s examine them now.

Trouble area #1 — Categories unique to the test set

What happens if we have a home with a house style that is unique to just the test set? Say something like 3Story. Let's change the first value of the house styles and see how Scikit-Learn handles it by default.
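    hs_test = test[['HouseStyle']].copy()
    hs_test.iloc[0, 0] = '3Story'                 # a value the encoder never saw during fit
    hs_test_transformed = ohe.transform(hs_test)
    # ValueError: Found unknown categories ['3Story'] in column 0 during transform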

Error: Unknown Category

By default, our encoder will produce an error. This is likely what we want, as we need to know if there are unique strings in the test set. If you do have this problem, there could be something much deeper that needs investigating. For now, we will ignore the problem and encode this row as all 0’s by setting the handle_unknown parameter to 'ignore' upon instantiation.
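    ohe = OneHotEncoder(sparse=False, handle_unknown='ignore')
    hs_train_transformed = ohe.fit_transform(hs_train)   # refit with the new setting
    hs_test_transformed = ohe.transform(hs_test)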

Let’s verify that the first row is all 0's.
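    hs_test_transformed[0]
    # array([0., 0., 0., 0., 0., 0., 0., 0.])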

Trouble area #2 — Missing Values in test set

If you have missing values in your test set (NaN or None), then these will be ignored as long as handle_unknown is set to 'ignore'. Let’s put some missing values in the first couple elements of our test set.
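    hs_test.iloc[0, 0] = np.nan
    hs_test.iloc[1, 0] = None
    hs_test_transformed = ohe.transform(hs_test)
    hs_test_transformed[:2].sum(axis=1)   # both rows come back as all 0's
    # array([0., 0.])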

Trouble area #3 — Missing Values in training set

Missing values in the training set are more of an issue. As of now, the OneHotEncoder estimator cannot fit with missing values.
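    hs_train = train[['HouseStyle']].copy()
    hs_train.iloc[0, 0] = np.nan
    ohe.fit_transform(hs_train)
    # ValueError -- the encoder cannot fit data that contains missing values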

It would be nice if there was an option to ignore them like what happens when transforming the test set above. As of now, this doesn’t exist and we must impute it.

Must impute missing values

For now, we must impute the missing values. The old Imputer from the preprocessing module was deprecated. A new module, impute, was formed in its place, with the new estimator SimpleImputer and a new strategy, 'constant'. By default, this strategy fills missing values with the string ‘missing_value’. We can choose what to fill with via the fill_value parameter.
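Here we fill with the string 'MISSING'; the exact fill string is an arbitrary choice:

    from sklearn.impute import SimpleImputer

    si = SimpleImputer(strategy='constant', fill_value='MISSING')
    hs_train_imputed = si.fit_transform(hs_train)
    hs_train_imputed[0]   # the missing first value is now the string 'MISSING'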

From here we can encode as we did previously.
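    hs_train_transformed = ohe.fit_transform(hs_train_imputed)
    hs_train_transformed.shape
    # (1460, 9)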

Notice that we now have an extra column and an extra feature name.

More on fit_transform

For all estimators, the fit_transform method first calls fit and then calls transform. The fit method finds the key properties that will be used during the transformation. For instance, with the SimpleImputer, if the strategy is ‘mean’, the fit method finds and stores the mean of each column. When transform is called, it uses these stored means to fill in the missing values and returns the transformed array.

The OneHotEncoder works analogously. During the fit method, it finds all the unique values for each column and again stores this. When transform is called, it uses these stored unique values to produce the binary array.

Apply both transformations to the test set

We can manually apply each of the two steps above in order like this:
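    hs_test = test[['HouseStyle']].copy()
    hs_test_imputed = si.transform(hs_test)
    hs_test_transformed = ohe.transform(hs_test_imputed)
    hs_test_transformed.shape
    # (1459, 9)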

Use a Pipeline instead

Scikit-Learn provides a Pipeline estimator that takes a list of transformations and applies them in succession. You can also run a machine learning model as the final estimator. Here we simply impute and encode.
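A sketch, using the same imputer and encoder as above:

    from sklearn.pipeline import Pipeline

    si_step = ('si', SimpleImputer(strategy='constant', fill_value='MISSING'))
    ohe_step = ('ohe', OneHotEncoder(sparse=False, handle_unknown='ignore'))
    pipe = Pipeline([si_step, ohe_step])

    hs_train_transformed = pipe.fit_transform(hs_train)
    hs_train_transformed.shape
    # (1460, 9) -- hs_train still contains the missing value we inserted above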

Each step is a two-item tuple consisting of a string that labels the step and the instantiated estimator. The output of the previous step is the input to the next step.

The test set is easily transformed through each step of the pipeline by simply passing it to the transform method.
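    hs_test_transformed = pipe.transform(hs_test)
    hs_test_transformed.shape
    # (1459, 9)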

Why just the transform method for the test set?

When transforming the test set, it's important to call only the transform method and not fit_transform. When we ran fit_transform on the training set, Scikit-Learn found all the necessary information it needed in order to transform any other dataset containing the same column names.

Transforming Multiple String Columns

Encoding multiple string columns is not a problem. Select the columns you want and then pass the new DataFrame through the same pipeline again.
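The particular columns chosen here are arbitrary:

    string_cols = ['RoofMatl', 'HouseStyle']
    string_train = train[string_cols]
    string_train_transformed = pipe.fit_transform(string_train)
    string_train_transformed.shape
    # (1460, 16) -- each of these two columns has 8 unique values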

Get individual pieces of the pipeline

It is possible to retrieve each individual transformer within the pipeline through its name from the named_steps dictionary attribute. In this instance, we get the one-hot encoder so that we can output the feature names.
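    ohe = pipe.named_steps['ohe']
    ohe.get_feature_names()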

Use the new ColumnTransformer to choose columns

The brand new ColumnTransformer (part of the new compose module) allows you to choose which columns get which transformations. Categorical columns will almost always need different transformations than continuous columns.

The ColumnTransformer is currently experimental, meaning that its functionality can change in the future.

The ColumnTransformer takes a list of three-item tuples. The first value in the tuple is a name that labels it, the second is an instantiated estimator, and the third is a list of columns you want to apply the transformation to. The tuple will look like this:
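    ('name', SomeTransformer(parameters), columns)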

The columns don’t actually have to be column names. Instead, you can use the integer indexes of the columns, a boolean array, or even a function (which accepts the entire DataFrame as the argument and must return a selection of columns).
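Schematically, any of these forms is accepted as the third item:

    ('name', SomeTransformer(parameters), [5, 13])              # integer positions
    ('name', SomeTransformer(parameters), boolean_array)        # boolean mask
    ('name', SomeTransformer(parameters), lambda df: columns)   # callable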

You can also use NumPy arrays with the ColumnTransformer, but this tutorial is focused on the integration of Pandas so we will stick with just using DataFrames.

Pass a Pipeline to the ColumnTransformer

We can even pass a pipeline of many transformations to the column transformer, which is what we do here because we have multiple transformations on our string columns.

Below, we reproduce the above imputation and encoding using the ColumnTransformer. Notice that the pipeline is the exact same as above, just with cat appended to each variable name. We will add a different pipeline for the numeric columns in an upcoming section.
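    from sklearn.compose import ColumnTransformer

    cat_si_step = ('si', SimpleImputer(strategy='constant', fill_value='MISSING'))
    cat_ohe_step = ('ohe', OneHotEncoder(sparse=False, handle_unknown='ignore'))
    cat_pipe = Pipeline([cat_si_step, cat_ohe_step])
    cat_cols = ['RoofMatl', 'HouseStyle']
    cat_steps = [('cat', cat_pipe, cat_cols)]
    ct = ColumnTransformer(transformers=cat_steps)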

Pass the entire DataFrame to the ColumnTransformer

The ColumnTransformer instance selects the columns we want to use, so we simply pass the entire DataFrame to the fit_transform method. The desired columns will be selected for us.
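    X_cat_transformed = ct.fit_transform(train)   # pass the whole DataFrame
    X_cat_transformed.shape
    # (1460, 16)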

We can now transform our test set in the same manner.
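    X_cat_transformed_test = ct.transform(test)
    X_cat_transformed_test.shape
    # (1459, 16)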

Retrieving the feature names

We have to do a little digging to get the feature names. All the transformers are stored in the named_transformers_ dictionary attribute. We then use the name, the first item from the three-item tuple, to select the specific transformer. Below, we select our transformer (there is only one here — a pipeline named ‘cat’).

Then from this pipeline we select the one-hot encoder object and finally get the feature names.
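    pl = ct.named_transformers_['cat']   # the pipeline we named 'cat'
    ohe = pl.named_steps['ohe']
    ohe.get_feature_names()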

Transforming the numeric columns

The numeric columns will need a different set of transformations. Instead of imputing missing values with a constant, the median or mean is often chosen. And instead of encoding the values, we usually standardize them by subtracting the mean of each column and dividing by the standard deviation. This helps many models like ridge regression produce a better fit.

Using all the numeric columns

Instead of selecting just one or two columns by hand as we did above with the string columns, we can select all of the numeric columns. We do this by first finding the data type of each column with the dtypes attribute and then testing whether the kind of each dtype is 'O'. The dtypes attribute returns a Series of NumPy dtype objects. Each of these has a kind attribute, a single-character string. We can use this to find the numeric or string columns. Pandas stores all of its string columns as object, which have a kind equal to ‘O’. See the NumPy docs for more on the kind attribute.

First, get the kinds, a one-character string representing each dtype:
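    kinds = np.array([dt.kind for dt in train.dtypes])
    kinds[:5]
    # array(['i', 'i', 'O', 'f', 'i'], dtype='<U1')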

We assume all non-object columns are numeric. We can get the categorical columns in the same manner:
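    all_columns = train.columns.values
    is_num = kinds != 'O'
    num_cols = all_columns[is_num]
    cat_cols = all_columns[~is_num]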

Once we have our numeric column names, we can use the ColumnTransformer again.
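Imputing with the median and then standardizing are typical choices, not requirements:

    from sklearn.preprocessing import StandardScaler

    num_si_step = ('si', SimpleImputer(strategy='median'))
    num_ss_step = ('ss', StandardScaler())
    num_pipe = Pipeline([num_si_step, num_ss_step])
    num_steps = [('num', num_pipe, num_cols)]

    ct = ColumnTransformer(transformers=num_steps)
    X_num_transformed = ct.fit_transform(train)
    X_num_transformed.shape
    # (1460, 37) -- the training set has 37 numeric columns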

Combining both categorical and numerical column transformations

We can apply separate transformations to each section of our DataFrame with ColumnTransformer. We will use every single column in this example.

We then create a separate pipeline for both categorical and numerical columns and then use the ColumnTransformer to independently transform them. These two transformations happen in parallel. The results of each are then concatenated together.
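    transformers = [('cat', cat_pipe, cat_cols),
                    ('num', num_pipe, num_cols)]
    ct = ColumnTransformer(transformers=transformers)
    X = ct.fit_transform(train)
    X.shape   # 37 standardized numeric columns plus one column per unique string value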

Machine Learning

The whole point of this exercise is to set up our data so that we can do machine learning. We can create one final pipeline and add a machine learning model as the final estimator. The first step in the pipeline will be the entire transformation we just did above. We assigned y way back at the top of the tutorial as the SalePrice. Here, we use just the fit method instead of fit_transform, since our final step is a machine learning model and does no transformations.
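Ridge regression (mentioned above) is a reasonable model to slot in here:

    from sklearn.linear_model import Ridge

    ml_pipe = Pipeline([('transform', ct), ('ridge', Ridge())])
    ml_pipe.fit(train, y)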

We can evaluate our model with the score method, which returns the R-squared value:
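    ml_pipe.score(train, y)
    # R-squared on the training data, which will be optimistically high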

Cross-Validation

Of course, scoring ourselves on the training set is not useful. Let’s do some K-fold cross-validation to get an idea of how well we would do with unseen data. We set a random state so that the splits will be the same throughout the rest of the tutorial.
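The value of the random state is arbitrary:

    from sklearn.model_selection import KFold, cross_val_score

    kf = KFold(n_splits=5, shuffle=True, random_state=123)
    cross_val_score(ml_pipe, train, y, cv=kf).mean()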

Selecting parameters when Grid Searching

Grid searching in Scikit-Learn requires us to pass a dictionary of parameter names mapped to possible values. When using a pipeline, we must use the name of the step followed by a double-underscore and then the parameter name. If there are multiple layers to your pipeline, as we have here, we must continue using double-underscores to move up a level until we reach the estimator whose parameters we would like to optimize.
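A sketch of such a search, with arbitrary grid values. In the path transform__num__si__strategy, the step named transform is the ColumnTransformer, num is the numeric pipeline inside it, si is its imputer step, and strategy is the parameter being searched:

    from sklearn.model_selection import GridSearchCV

    param_grid = {
        'transform__num__si__strategy': ['mean', 'median'],
        'ridge__alpha': [0.1, 1, 10, 100],
    }
    gs = GridSearchCV(ml_pipe, param_grid, cv=kf)
    gs.fit(train, y)
    gs.best_params_
    gs.best_score_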

Getting all the grid search results in a Pandas DataFrame

All the results of the grid search are stored in the cv_results_ attribute. This is a dictionary that can be converted to a Pandas DataFrame for a nice display, providing a structure that is much easier to scan manually.
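    pd.DataFrame(gs.cv_results_).head()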

Building a custom transformer that does all the basics

There are a few limitations to the above workflow. For instance, it would be nice if the OneHotEncoder gave you the option of ignoring missing values during the fit method. It could simply encode missing values as a row of all zeros. Currently, it forces us to fill the missing values with some string and then encodes this string as a separate column.

Low-frequency strings

Also, string values that appear only a few times in the training set may not be reliable predictors in the test set. We may want to encode those as if they were missing as well.

Writing your own estimator class

Scikit-Learn provides some help within its documentation on writing your own estimator class. The BaseEstimator class found within the base module provides the get_params and set_params methods for you. The set_params method is necessary when doing a grid search. You can write your own or inherit from the BaseEstimator. There is also a TransformerMixin but it just writes the fit_transform method for you. We do this in one line of code below, so we don’t inherit from it.

The following class, BasicTransformer, does all of the following (a sketch of its implementation appears after this list):

  • Fills in missing values with either the mean or median for numeric columns
  • Standardizes all numeric columns
  • Uses one hot encoding for string columns
  • Does not fill in missing values for categorical columns. Instead, it encodes them as rows of all 0's
  • Ignores unique values in string columns in the test set
  • Allows you to choose a threshold for the number of occurrences a value must have in a string column. Strings below this threshold will be encoded as all 0's
  • It only works with DataFrames, is just experimental, and is not tested, so it will break for some datasets
  • It is called ‘basic’ because these are probably the most basic transformations that typically get done to many datasets.
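Here is one possible implementation of such a class; the parameter names and defaults are illustrative choices:

    import numpy as np
    import pandas as pd
    from sklearn.base import BaseEstimator

    class BasicTransformer(BaseEstimator):

        def __init__(self, cat_threshold=None, num_strategy='median',
                     return_df=False):
            # cat_threshold: minimum number of occurrences a string value
            # needs in order to get its own encoded column
            self.cat_threshold = cat_threshold
            if num_strategy not in ('mean', 'median'):
                raise ValueError("num_strategy must be 'mean' or 'median'")
            self.num_strategy = num_strategy
            self.return_df = return_df

        def fit(self, X, y=None):
            # split the columns by kind: 'O' is object (string), the rest numeric
            kinds = np.array([dt.kind for dt in X.dtypes])
            self._num_cols = X.columns.values[kinds != 'O']
            self._cat_cols = X.columns.values[kinds == 'O']

            # numeric columns: store the fill value plus the mean and
            # standard deviation used for standardization
            X_num = X[self._num_cols]
            self._num_fill = (X_num.mean() if self.num_strategy == 'mean'
                              else X_num.median())
            X_num = X_num.fillna(self._num_fill)
            self._num_mean = X_num.mean()
            self._num_std = X_num.std()

            # string columns: store the unique values to encode; missing and
            # low-frequency values are dropped so they encode as all 0's
            self._cat_uniques = {}
            for col in self._cat_cols:
                vc = X[col].value_counts()
                if self.cat_threshold is not None:
                    vc = vc[vc >= self.cat_threshold]
                self._cat_uniques[col] = vc.index.values
            return self

        def transform(self, X):
            # standardize the numeric columns with the stored statistics
            X_num = X[self._num_cols].fillna(self._num_fill)
            X_num = (X_num - self._num_mean) / self._num_std
            data = [X_num.values]
            names = list(self._num_cols)

            # one-hot encode the string columns; unknown, missing and
            # low-frequency values compare unequal everywhere, giving all 0's
            for col in self._cat_cols:
                uniques = self._cat_uniques[col]
                data.append((X[col].values[:, None] == uniques).astype(int))
                names.extend('{}_{}'.format(col, u) for u in uniques)

            arr = np.column_stack(data)
            if self.return_df:
                return pd.DataFrame(arr, columns=names)
            return arr

        def fit_transform(self, X, y=None):
            return self.fit(X).transform(X)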

Using our BasicTransformer

Our BasicTransformer estimator can be used just like any other Scikit-Learn estimator. We can instantiate it and then transform our data.
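The threshold of 3 below is an arbitrary choice:

    bt = BasicTransformer(cat_threshold=3, return_df=True)
    train_transformed = bt.fit_transform(train)
    train_transformed.head(3)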

Using our transformer in a pipeline

Our transformer can be part of a pipeline.
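    basic_pipe = Pipeline([('bt', bt), ('ridge', Ridge())])
    basic_pipe.fit(train, y)
    basic_pipe.score(train, y)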

We can cross-validate with it as well and get a similar score to the one from our Scikit-Learn column transformer pipeline above:
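    cross_val_score(basic_pipe, train, y, cv=kf).mean()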

We can use it as part of a grid search as well. It turns out that not including low-count strings did not help this particular model, though it stands to reason it could in other models. The best score did improve a bit, perhaps due to using a slightly different encoding scheme.
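A sketch of that search, again with arbitrary grid values:

    param_grid = {
        'bt__cat_threshold': [0, 1, 2, 3, 5],
        'ridge__alpha': [0.1, 1, 10, 100],
    }
    gs = GridSearchCV(basic_pipe, param_grid, cv=kf)
    gs.fit(train, y)
    gs.best_params_
    gs.best_score_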

Binning and encoding numeric columns with the new KBinsDiscretizer

There are a few columns that contain years. It makes more sense to bin the values in these columns and treat them as categories. Scikit-Learn introduced the new estimator KBinsDiscretizer to do just this. It not only bins the values but encodes them as well. Previously, you could do this manually with the Pandas cut or qcut functions.

Let’s see how it works with just the YearBuilt column.
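Five bins is an arbitrary choice here:

    from sklearn.preprocessing import KBinsDiscretizer

    kbd = KBinsDiscretizer(n_bins=5, encode='onehot-dense')   # strategy='quantile' is the default
    year_built_transformed = kbd.fit_transform(train[['YearBuilt']])
    year_built_transformed.shape
    # (1460, 5)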

By default, each bin contains (approximately) an equal number of observations. Let’s sum up each column to verify this.
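    year_built_transformed.sum(axis=0)
    # five counts of roughly 1460 / 5 = 292 each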

This is the ‘quantile’ strategy. You can choose ‘uniform’ to make the bin edges equally spaced or ‘kmeans’ which uses K-means clustering to find the bin edges.

Processing all the year columns separately with ColumnTransformer

We now have another subset of columns that need separate processing and we can do this with the ColumnTransformer. The following code adds one more step to our previous transformation. We also drop the Id column which was just identifying each row.
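A sketch; the four year columns below are the ones in this dataset, and GarageYrBlt needs imputing because homes without a garage have no value there:

    year_cols = ['YearBuilt', 'YearRemodAdd', 'GarageYrBlt', 'YrSold']
    year_si_step = ('si', SimpleImputer(strategy='median'))
    year_kbd_step = ('kbd', KBinsDiscretizer(n_bins=5, encode='onehot-dense'))
    year_pipe = Pipeline([year_si_step, year_kbd_step])

    # keep the year columns out of the numeric pipeline and drop Id entirely
    num_cols2 = [c for c in num_cols if c not in year_cols + ['Id']]

    transformers = [('cat', cat_pipe, cat_cols),
                    ('num', num_pipe, num_cols2),
                    ('year', year_pipe, year_cols)]
    ct = ColumnTransformer(transformers=transformers)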

We cross-validate and see that all this work yielded no improvement:
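    ml_pipe = Pipeline([('transform', ct), ('ridge', Ridge())])
    cross_val_score(ml_pipe, train, y, cv=kf).mean()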

Using a different number of bins for each column might improve our results. Still, the KBinsDiscretizer makes it easy to bin numeric variables.

More goodies in Scikit-Learn 0.20

There are more new features that come with the upcoming release. Check the What’s New section of the docs for more. There are a ton of changes.

Conclusion

This article introduced a new workflow that will be available to Scikit-Learn users who rely on Pandas for the initial data exploration and preparation. A much smoother and more feature-rich process for taking a Pandas DataFrame and transforming it so that it is ready for machine learning is now possible through the new and improved estimators ColumnTransformer, SimpleImputer, OneHotEncoder, and KBinsDiscretizer.

I am very excited to see this new upgrade and am going to be integrating these new workflows immediately into my projects and teaching materials.

Get the All Access Pass!

Get lifetime access to all current and future online courses for one low price with the All Access Pass!
