In the previous techniques like Bagging and Boosting, we used Decision Trees as the base learners. You might wonder: is there a way to use different base models (such as linear SVM, Naive Bayes, KNN, etc., each with a good bias-variance trade-off) instead of just Decision Trees? Yes, that is possible with the Stacking technique. Let’s dive deep to understand it better.
Understanding Stacking:
Step 1:
The core concept of this technique is that we first use the whole training data to train ‘m’ base models. Here, all the models are different and independent of each other, just like the models in a Random Forest, and all of them run in parallel. (The more diverse the models are, the better the performance.)
Now, a given query point is passed through each of the ‘m’ models, and we get the predicted outputs.
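Step 1 can be sketched in a few lines of scikit-learn. The dataset and the three base models below are illustrative assumptions, not part of the original article; each model is trained independently on the full training data, and a single query point is then passed through all of them.

```python
# Step 1 sketch: train m diverse, independent base models on the whole
# training data, then pass one query point through each of them.
# Dataset and model choices here are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# m = 3 different base models, each trained independently
# (in principle, these fits could run in parallel)
base_models = [SVC(kernel="linear"), GaussianNB(), KNeighborsClassifier()]
for model in base_models:
    model.fit(X, y)

# A query point is passed through each of the m models
query = X[:1]
predictions = [model.predict(query)[0] for model in base_models]
print(predictions)  # one predicted label per base model
```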
Step 2:
Now, instead of aggregating these outputs (as we did in Random Forest), a new dataset is formed from these outputs and passed to a Meta classifier (which could be Logistic Regression, a Decision Tree, a linear SVM, etc.), which makes the final prediction.
Training a Meta classifier on the outputs of Base learners.
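Step 2 can be sketched as follows; the data and models are assumptions carried over for illustration. The base models' outputs become the columns of a new dataset, and a Logistic Regression meta classifier is trained on it. Note that a production implementation should use out-of-fold predictions for the meta-features to avoid leaking training labels; this sketch skips that for brevity.

```python
# Step 2 sketch: form a new dataset from the base models' outputs and
# train a meta classifier on it. Models and data are illustrative
# assumptions; real stacking should use out-of-fold predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
base_models = [SVC(kernel="linear"), GaussianNB(), KNeighborsClassifier()]
for model in base_models:
    model.fit(X, y)

# New dataset: one column of predictions per base model
meta_features = np.column_stack([model.predict(X) for model in base_models])

# The meta classifier learns how to combine the base models' outputs
meta_clf = LogisticRegression().fit(meta_features, y)

# A query point's final prediction goes through both levels
query_meta = np.column_stack([m.predict(X[:1]) for m in base_models])
final_prediction = meta_clf.predict(query_meta)
print(final_prediction)
```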
Usage:
Stacking is seldom used in real-world production systems, as we have to train and run multiple base learners, which takes more time. However, it often gives the highest accuracy among ensemble methods and is widely used in Kaggle competitions.
Code: sklearn.ensemble.StackingClassifier
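scikit-learn's built-in StackingClassifier handles both levels for us, including internal cross-validation when generating the meta-features. A minimal sketch, with the dataset and estimator choices as assumptions:

```python
# Sketch of the same idea using sklearn.ensemble.StackingClassifier,
# which trains the base estimators and the final (meta) estimator,
# using cross-validated predictions as meta-features.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

estimators = [
    ("svm", LinearSVC()),
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier()),
]
# final_estimator is the meta classifier trained on the base models' outputs
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X, y)
print(stack.score(X, y))  # training-set accuracy of the stacked model
```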
This covers one of the simpler concepts of Ensemble models. Please go through the links below for code references on how to implement Stacking with mlxtend and scikit-learn. Thank you :)
References:
http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.StackingClassifier.html