fit() vs transform() vs fit_transform() in Python scikit-learn
What’s the difference between the fit, transform and fit_transform methods in sklearn?
scikit-learn (commonly referred to as sklearn) is probably one of the most powerful and widely used Machine Learning libraries in Python. It comes with a comprehensive set of tools and ready-to-train models, from pre-processing utilities to model training and model evaluation utilities.
Transformers are among the most fundamental object types in sklearn. They implement three specific methods, namely fit(), transform() and fit_transform(). Essentially, these are conventions applied in scikit-learn and its API. In this article, we are going to explore how each of these works and when to use one over the other.
Note that in this article we are going to explore the aforementioned functions using specific examples, but the concepts explained here are applicable to most (if not all) transformers that implement these methods.
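As a quick preview, here is a minimal sketch of the three methods in action. It assumes StandardScaler as the transformer, and the toy arrays X_train and X_test are made up purely for illustration; other transformers that follow the same convention should behave analogously.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, made up for illustration only
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()

# fit() learns the parameters from the data
# (for StandardScaler: the per-feature mean and standard deviation)
scaler.fit(X_train)

# transform() applies the already-learned parameters to any data
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# fit_transform() fits and transforms the same data in a single call
X_train_scaled_2 = StandardScaler().fit_transform(X_train)

print(scaler.mean_)      # learned mean of the training feature
print(X_train_scaled)    # training data scaled with the learned parameters
print(X_test_scaled)     # test data scaled with the *training* parameters
```

Note that the test data is only ever passed to transform(), so it is scaled with the parameters learned from the training data.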
Before explaining the intuition behind fit(), transform() and fit_transform(), it is important to first understand what a transformer is in the scikit-learn API.