Meta-Self-Ensemble Learner Package (pip install meta-self-learner) — https://github.com/ajayarunachalam/meta-self-learner
Hello, friends. In this blog post, I present a meta-learner ensemble design. The meta-ensemble learning model aims to fit complex data better while lowering the uncertainty in estimation. Its two self-learner algorithms find the optimal weights that minimize the objective function.
USP of this package:-
“Meta-Self-Learn” provides several ensemble-learner functionalities for quick predictive-modeling prototyping. Generally, predictions become unreliable when the input sample is out of the training distribution, when the model is biased toward the data distribution, when the data is noisy, and so on. Current approaches by and large require changes to the network architecture, model fine-tuning, balanced data, increasing the model size, etc. The selection of algorithms also plays a vital role, while scalability and learning ability decrease with complex datasets. In this package, I have developed an ensemble framework for minimizing generalization error in the learning algorithm irrespective of the data distribution, number of classes, choice of algorithms, number of models, complexity of the datasets, etc. In summary, with this framework one can infer better and generalize well. Another key takeaway of the package is the intuitive pipeline, which complements building models in a more stable fashion while minimizing under-fitting/over-fitting, which is critical to the overall outcome.
Quickly Setup package:-
Download the automation script from here, then run the following commands on the terminal:

sudo bash setup.sh
pip install meta-self-learner

Or, from within a Jupyter notebook:

!pip install meta-self-learner
Meta Self Learner Workflow:-
The designed framework pipeline workflow is as given in the figure.
The first layer comprises several individual classifiers. We use the base classifiers Logistic Regression (LR), K-Nearest Neighbors (KNN), Random Forest (RF), Support Vector Machine (SVM), Extra Trees Classifier (ETC), and Gradient Boosting Machines (GBM). The two self-learners (i.e., Ensemble-1 and Ensemble-2) aim to find the optimal coefficients that minimize the objective function, i.e., the log-loss. Given the set of predictions obtained in the previous layer, the two meta-learners define two different linear problems and optimize the objective function to find the coefficients that lower the loss.
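The first layer can be sketched with scikit-learn, which the package builds on. This is an illustrative stand-in, not the package's internal code; the classifier settings (e.g., the reduced GBM tree count) are my own choices for a fast demo:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier)
from sklearn.svm import SVC
from sklearn.metrics import log_loss

X, y = load_digits(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Layer-1: six standalone base classifiers (LR, KNN, RF, SVM, ETC, GBM)
base_learners = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": SVC(probability=True, random_state=42),  # probability=True enables predict_proba
    "ETC": ExtraTreesClassifier(n_estimators=100, random_state=42),
    "GBM": GradientBoostingClassifier(n_estimators=50, random_state=42),  # small for speed
}

# Each base learner produces class-probability predictions ('P') for the next
# layer; log-loss is the objective the self-learners will later minimize.
for name, clf in base_learners.items():
    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_valid)
    print(f"{name}: validation log-loss = {log_loss(y_valid, proba):.4f}")
```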
The pre-processed data is input to Layer-1 of the model. ‘T’ and ‘P’ represent training data and predictions, respectively. In Layer-1, many standalone base learners are used. The input to Layer-2 is the set of predictions from Layer-1.
Two meta-self-learner ensemble schemes are used. Layer-3 combines the Layer-2 predictions as a simple weighted average (WA). Model evaluation and result interpretation are done in the final stage of the pipeline.
The details of the meta-self-learning architecture are as follows:-
Six classifiers are used (LR, SVM, RF, ETC, GBM, and KNN). Here, one can use any machine learning algorithms of their choice and build any number of models. All the classifiers are applied twice: 1) The classifiers are trained on (X_train, y_train) and used to predict the class probabilities of X_valid. 2) The classifiers are trained on (X = (X_train + X_valid), y = (y_train + y_valid)) and used to predict the class probabilities of X_test.
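The two-pass scheme above can be sketched for a single classifier (a minimal illustration using scikit-learn; one LR model stands in for all six):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000)

# Pass 1: train on (X_train, y_train), predict class probabilities for X_valid
clf.fit(X_train, y_train)
P_valid = clf.predict_proba(X_valid)

# Pass 2: retrain on the combined train + valid split, predict for X_test
clf.fit(np.vstack([X_train, X_valid]), np.concatenate([y_train, y_valid]))
P_test = clf.predict_proba(X_test)

print(P_valid.shape, P_test.shape)
```

Retraining on train + valid for the test-set pass lets the base learner use all labeled data available before the meta-layer takes over.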
The predictions from the previous layer on X_valid are concatenated, and used to create a new training set (XV, y_valid). The predictions on X_test are concatenated to create a new test set (XT, y_test). The two proposed ensemble methods, and their calibrated versions are trained on (XV, y_valid), and used to predict the class probabilities of (XT).
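The core of each self-learner, finding the linear-blend coefficients that minimize log-loss on the concatenated validation predictions, can be sketched with scipy's SLSQP solver. The synthetic probabilities here merely stand in for the Layer-1 predictions; the shapes and solver choice are my assumptions, not the package's exact internals:

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import log_loss

# Synthetic stand-ins: 3 base learners, 4 classes, 200 validation samples
rng = np.random.default_rng(0)
n_valid, n_classes, n_models = 200, 4, 3
y_valid = rng.integers(0, n_classes, n_valid)
# P[m] holds base learner m's class probabilities on the validation set
P = [rng.dirichlet(np.ones(n_classes), size=n_valid) for _ in range(n_models)]

def objective(w):
    # Linear blend of base-learner probabilities, scored with log-loss
    blend = sum(wi * Pi for wi, Pi in zip(w, P))
    return log_loss(y_valid, blend, labels=np.arange(n_classes))

# Weights constrained to the simplex: non-negative and summing to 1
cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
bounds = [(0.0, 1.0)] * n_models
w0 = np.full(n_models, 1.0 / n_models)

res = minimize(objective, w0, method="SLSQP", bounds=bounds, constraints=cons)
print("optimal weights:", res.x, "log-loss:", res.fun)
```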
The predictions from the previous layer-2 are then linearly combined using a weighted average.
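The Layer-3 weighted average is just an element-wise blend of the two ensembles' probability matrices. A tiny numeric illustration (the probabilities and the equal weights are made up for the example):

```python
import numpy as np

# Hypothetical class probabilities from the two Layer-2 self-learners
# (4 samples, 3 classes)
p_ens1 = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.2, 0.5, 0.3]])
p_ens2 = np.array([[0.6, 0.3, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.2, 0.4, 0.4],
                   [0.1, 0.6, 0.3]])

w1, w2 = 0.5, 0.5  # illustrative weights; the package may choose these differently
p_final = w1 * p_ens1 + w2 * p_ens2  # Layer-3: simple weighted average (WA)
print(p_final.argmax(axis=1))        # final predicted classes -> [0 1 2 1]
```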
In this way, a hybrid architecture is designed and deployed, where the predictions of the standalone classifiers are combined by the meta-self-learner methods, thereby reducing the risk of under-fitting/over-fitting.
Let’s get some hands-on with an example.
1. Import & load the installed package meta-self-learner
2. Load your dataset
3. Splitting data in training, validation & test set
4. Set the class configuration of the meta-self ensembler. Here, we have taken four classes from the total digits dataset classes as a quick example.
5. Building the meta-self-learner architecture layer-by-layer in the pipeline
a. Create First Layer
b. Create Second Layer
c. Create third/final layer
6. Performance evaluation & creating graph plots for the log-loss metric
7. Plot the ROC curve and confusion matrix, and display the classification report
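The steps above can be sketched end-to-end with plain scikit-learn. Note this is a simplified stand-in for the package's own pipeline (a single logistic regression replaces the two self-learners and the WA layer, and GBM is omitted for speed); see the demo notebook for the package's actual API:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.svm import SVC
from sklearn.metrics import log_loss, classification_report

# Steps 2-3: load digits, keep four classes (as in the walkthrough), then split
X, y = load_digits(return_X_y=True)
mask = y < 4
X, y = X[mask], y[mask]
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=1, stratify=y_tmp)

# Step 5a: Layer-1 base classifiers, each applied with the two-pass scheme
models = [LogisticRegression(max_iter=1000), KNeighborsClassifier(),
          RandomForestClassifier(random_state=1),
          SVC(probability=True, random_state=1),
          ExtraTreesClassifier(random_state=1)]

PV, PT = [], []
for m in models:
    m.fit(X_train, y_train)
    PV.append(m.predict_proba(X_valid))                       # predictions on X_valid
    m.fit(np.vstack([X_train, X_valid]),
          np.concatenate([y_train, y_valid]))
    PT.append(m.predict_proba(X_test))                        # predictions on X_test

XV, XT = np.hstack(PV), np.hstack(PT)                         # stacked probabilities

# Step 5b: Layer-2 meta-learner (stand-in for the two self-learner ensembles)
meta = LogisticRegression(max_iter=1000).fit(XV, y_valid)
p_final = meta.predict_proba(XT)   # Step 5c would average two such learners

# Steps 6-7: evaluate with log-loss and a classification report
print("test log-loss:", round(log_loss(y_test, p_final), 4))
print(classification_report(y_test, p_final.argmax(axis=1)))
```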
Complete Demo Notebook:-
Download the notebook file & run the following command on the terminal
You can reach me at firstname.lastname@example.org
Thanks for reading. Hope this article will be useful :)