The new ColumnTransformer will change workflows from Pandas to Scikit-Learn

From Pandas to Scikit-Learn — A new exciting workflow

Ted Petrou
Dunder Data

Scikit-Learn’s new integration with Pandas

Scikit-Learn will make one of its biggest upgrades in recent years with its mammoth version 0.20 release. For many data scientists, a typical workflow consists of using Pandas for exploratory data analysis before moving to Scikit-Learn for machine learning. This new release will make that process simpler, more feature-rich, more robust, and more standardized.

Summary and goals of this article

  • This article is aimed at those who use Scikit-Learn as their machine learning library but depend on Pandas as their data exploration and preparation tool.
  • It assumes you have some familiarity with both Scikit-Learn and Pandas.
  • We explore the new ColumnTransformer estimator, which allows us to apply separate transformations to different subsets of our data in parallel before concatenating the results together.
  • A major pain point for users (and in my opinion the worst part of Scikit-Learn) was preparing a pandas DataFrame with string…
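
To make the ColumnTransformer idea above concrete, here is a minimal sketch. The DataFrame, its column names, and the choice of OneHotEncoder and StandardScaler are illustrative assumptions, not taken from the article; the point is only to show one transformer applied to a string column and another to numeric columns, with the results concatenated side by side.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one string column and two numeric columns.
df = pd.DataFrame({
    "neighborhood": ["north", "south", "north", "east"],
    "area": [1200.0, 850.0, 1500.0, 990.0],
    "rooms": [3, 2, 4, 2],
})

# Each entry is (name, transformer, columns). The string column is
# one-hot encoded; the numeric columns are standardized.
ct = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["neighborhood"]),
        ("num", StandardScaler(), ["area", "rooms"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# The outputs are concatenated in the order the transformers are listed:
# 3 one-hot columns (east, north, south) followed by 2 scaled columns.
X = ct.fit_transform(df)
print(X.shape)
```

Because the columns are selected by name, the DataFrame can be passed in directly, without first converting the string column by hand.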

Ted Petrou is the author of Master Data Analysis with Python and the founder of Dunder Data.