fit() vs transform() vs fit_transform() in Python scikit-learn

What’s the difference between fit, transform and fit_transform methods in sklearn

Giorgos Myrianthous
Geek Culture

Photo by Kelly Sikkema on Unsplash

scikit-learn (commonly referred to as sklearn) is probably one of the most powerful and widely used machine learning libraries in Python. It comes with a comprehensive set of tools and ready-to-train models, from pre-processing utilities to model training and model evaluation utilities.

Transformers are among the most fundamental object types in sklearn. They implement three specific methods, namely fit(), transform() and fit_transform(). Essentially, these are conventions applied across scikit-learn and its API. In this article, we are going to explore how each of these works and when to use one over the other.

Note that in this article we are going to explore the aforementioned functions using specific examples, but the concepts explained here are applicable to most (if not all) transformers that implement these methods.
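
As a first taste of the three methods, here is a minimal sketch that uses StandardScaler as the transformer. The toy arrays and the variable names (X_train, X_test, scaler) are assumptions made purely for illustration and are not part of the original examples.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: a single feature, split into train and test sets.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()

# fit() only learns the parameters (here, the mean and standard deviation)
# from the training data; it returns the fitted scaler, not transformed data.
scaler.fit(X_train)

# transform() applies the parameters learned by fit() to any dataset.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# fit_transform() performs both steps in one call on the same data.
X_train_scaled_2 = StandardScaler().fit_transform(X_train)

print(scaler.mean_)      # mean learned from X_train
print(X_train_scaled)    # training data, centred and scaled
print(X_test_scaled)     # test data, scaled with the training statistics
```

Note that the test set is only ever passed to transform(), so it is scaled with the statistics learned from the training set rather than with its own.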

Before explaining the intuition behind fit(), transform() and fit_transform(), it is important to first understand what a transformer is in the scikit-learn API.

What are transformers in…
