Scikit-Learn Design with Easy Explanation

Burcu Koçulu
Published in Analytics Vidhya · 3 min read · Jan 17, 2021


I think we can all agree that Scikit-learn is the main machine learning library for Python. It is a beginner-friendly library thanks to its simple, consistent interface (API), which is one reason why so many people still use it and contribute to it.


Scikit-learn is very well designed around Python's object-oriented programming (OOP) features. You can check out its classes in the source code on GitHub.

No class exactly like this exists in the library, by the way, but a supervised learning algorithm may have a structure like this:

[Image: hand-drawn class sketch] Sorry for my bad handwriting :(
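In code, such a structure might look roughly like the toy class below. This is only a minimal sketch: MeanRegressor, its offset hyperparameter and its scoring rule are hypothetical names invented for illustration and are not part of Scikit-learn. It only shows the shape of the interface, with fit(), predict() and score() methods.

```python
import numpy as np


class MeanRegressor:
    """Hypothetical toy estimator (not in Scikit-learn): predicts the mean target."""

    def __init__(self, offset=0.0):
        # Hyperparameter: chosen by the user and stored as a public attribute.
        self.offset = offset

    def fit(self, X, y):
        # Learned parameter: estimated from the data (trailing underscore by convention).
        self.mean_ = float(np.mean(y))
        return self  # returning self allows: estimator = estimator.fit(X, y)

    def predict(self, X):
        # Same prediction for every sample: the learned mean plus the offset.
        return np.full(len(X), self.mean_ + self.offset)

    def score(self, X, y):
        # Negative mean squared error, so that a higher score is better.
        return -float(np.mean((np.asarray(y) - self.predict(X)) ** 2))


X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

model = MeanRegressor().fit(X, y)
print(model.predict(X))
print(model.score(X, y))
```

Real Scikit-learn estimators usually inherit from base classes such as BaseEstimator, but the three methods play the same roles.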

So what is an estimator? And what about the fit(), predict() and score() methods? Let's take a look.

Scikit-learn has three main interfaces (APIs):

Estimators: The core interface of Scikit-learn. In short, estimators are classes that can learn (estimate) some parameters from the data with the fit() method. All of an estimator's hyperparameters are accessible directly via public instance variables. For example, RandomForestClassifier, which implements the random forest algorithm, is an estimator.

estimator = estimator.fit(data, targets)  # supervised learning
or
estimator = estimator.fit(data)  # unsupervised learning
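Here is a small, hedged illustration of both forms of fit(). The choice of the iris dataset, RandomForestClassifier and KMeans is mine, just to have something concrete to run; any other estimator would follow the same pattern.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data, targets = load_iris(return_X_y=True)

# Supervised learning: fit() takes the data and the targets.
clf = RandomForestClassifier(n_estimators=10, random_state=0)
print(clf.n_estimators)  # hyperparameter, readable as a public attribute
clf = clf.fit(data, targets)
print(clf.classes_)  # learned from the targets during fit()

# Unsupervised learning: fit() takes only the data.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
km = km.fit(data)
print(km.cluster_centers_.shape)  # one center per cluster, one column per feature
```

Attributes that end with an underscore, such as classes_ and cluster_centers_, are the parameters learned during fit(), while n_estimators and n_clusters are hyperparameters set by the user.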

Transformers: Estimators that can also transform data with the transform() or fit_transform() methods are called transformers.

new_data = transformer.transform(data)
new_data = transformer.fit_transform(data)  # equivalent to calling fit() and then transform(); sometimes optimized to run faster
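As a quick check of how the two calls relate, here is a minimal sketch using StandardScaler (my choice of transformer, on a tiny made-up array). It compares the two-step route with fit_transform().

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Tiny made-up dataset: 3 samples, 2 features.
data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Two steps: learn the column means and standard deviations, then apply them.
scaler = StandardScaler()
scaler.fit(data)
two_step = scaler.transform(data)

# One step: fit and transform in a single call.
one_step = StandardScaler().fit_transform(data)

print(scaler.mean_)  # per-column means learned by fit()
print(np.allclose(two_step, one_step))  # both routes give the same result
```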

Predictors: Some estimators can also make predictions. For example, once a regression model has been fitted, we can predict quantities by calling its predict() method and measure the quality of those predictions with its score() method.

prediction = predictor.predict(data)
probability = predictor.predict_proba(data)  # available on many classifiers
score = predictor.score(data, targets)  # supervised predictors also need the true targets
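To see the three calls together, here is a small sketch with LogisticRegression on the iris dataset. Both choices are assumptions of mine for the sake of a runnable example, and scoring on the training data is done here only for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data, targets = load_iris(return_X_y=True)

# A predictor is first fitted like any other estimator.
predictor = LogisticRegression(max_iter=1000).fit(data, targets)

prediction = predictor.predict(data[:5])         # one class label per sample
probability = predictor.predict_proba(data[:5])  # one probability per class, per sample
score = predictor.score(data, targets)           # mean accuracy for classifiers

print(prediction)
print(probability.shape)
print(score)
```

For classifiers score() returns the mean accuracy, while for regressors it returns the R² coefficient of determination.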

The SimpleImputer class from the sklearn library is commonly used for handling missing values. It is an example of an estimator that is also a transformer.
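A minimal sketch of SimpleImputer acting in both roles, on a small made-up array with missing values (the data and the "mean" strategy are just my illustrative choices):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data with one missing value in each column.
data = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, np.nan]])

imputer = SimpleImputer(strategy="mean")

# Estimator role: fit() learns the mean of each column, ignoring missing values.
imputer.fit(data)
print(imputer.statistics_)

# Transformer role: transform() replaces missing values with the learned means.
new_data = imputer.transform(data)
print(new_data)
```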

Here's a supervised learning scheme I made. Notice that the fit() method takes two arguments here: the data and the targets.
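The same kind of scheme can be sketched in code. Everything specific here is an assumption of mine: the synthetic data (roughly y = 3x + 2 plus noise), the 80/20 split and the choice of LinearRegression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: targets are roughly 3 * x + 2 plus some noise.
rng = np.random.default_rng(0)
data = rng.uniform(0, 10, size=(100, 1))
targets = 3.0 * data[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)

# Hold out 20% of the samples as unseen test data.
X_train, X_test, y_train, y_test = train_test_split(
    data, targets, test_size=0.2, random_state=0
)

# fit() takes two arguments in supervised learning: the data and the targets.
model = LinearRegression().fit(X_train, y_train)

predictions = model.predict(X_test)  # predict on data the model has not seen
r2 = model.score(X_test, y_test)     # R^2 on the test set

print(model.coef_, model.intercept_)
print(r2)
```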

What is the difference between an Estimator and a Predictor?

Even though a predictor is also an estimator, there's a slight difference between them. Estimators learn from the training data and estimate some parameters with the fit() method (or fit_transform(), for transformers), while predictors use the predict() method to make predictions on unseen data (that is, test data).
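One rough way to see the difference is to check which methods an object exposes. The sketch below compares SimpleImputer (an estimator and transformer) with LinearRegression (an estimator and predictor). Checking with hasattr() is just my own quick illustration, not an official Scikit-learn test.

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression

methods = ("fit", "transform", "predict", "score")

# Map each class name to the interface methods it provides.
capabilities = {}
for obj in (SimpleImputer(), LinearRegression()):
    capabilities[type(obj).__name__] = {m: hasattr(obj, m) for m in methods}

for name, found in capabilities.items():
    print(name, found)
```

Both objects have fit(), so both are estimators, but only LinearRegression has predict() and score(), and only SimpleImputer has transform().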

Thanks for reading. Please let me know if you have any feedback.

References:

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
