AdaBoost Classifier
AdaBoost, short for Adaptive Boosting, is a boosting technique used as an ensemble method in machine learning. It is called adaptive because the weights are re-assigned to each instance on every iteration, with higher weights assigned to incorrectly classified instances.
Boosting is used to reduce bias, and often variance, in supervised learning. It works on the principle that learners are grown sequentially: except for the first, each subsequent learner is grown from the previously grown learners.
In simple words, weak learners are converted into strong ones.
The AdaBoost algorithm works on the same principle as boosting, but with a slight difference in how it works. Let’s discuss that difference in detail.
In AdaBoost, higher weights are assigned to the data points that were misclassified or incorrectly predicted by the previous model. This means each successive model receives a weighted input.
How Does AdaBoost Work?
Let’s take a look at an example. Here we have a dataset with features from F1 to Fn.
Step 1: Assign sample weights
The first step is to assign initial weights to the records. Every record gets the same weight, 1/N for N records; with the 7 records in our example, each weight starts at 1/7 ≈ 0.143.
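As a minimal sketch, the uniform initial weights for the 7-record example can be set up like this:

```python
import numpy as np

n_records = 7  # number of records in the example dataset
weights = np.full(n_records, 1.0 / n_records)  # every record starts with weight 1/N
print(weights)  # each weight is 1/7 ≈ 0.143, and the weights sum to 1
```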
Step 2: Create a base learner (Decision Trees)
The base learners here are decision trees with a depth of 1, i.e. a single split and two leaf nodes. This type of decision tree is also called a stump.
This step is also called creating the sequential base learners: we create a stump for each feature, then select the best one based on the Gini index or entropy.
As we learned with CART, when selecting among decision trees the one with the lowest Gini index (or entropy) is preferred. So, among the stumps built for each feature, the one with the lowest Gini index (or entropy) is selected as our first base model.
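To make the stump-selection step concrete, here is a small sketch of computing the Gini impurity of a one-split stump on a single feature; the feature values and labels are hypothetical, purely for illustration:

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of binary labels: 1 - p^2 - (1-p)^2."""
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)
    return 1.0 - p**2 - (1.0 - p)**2

def stump_gini(feature, labels, threshold):
    """Weighted Gini impurity of the two leaves produced by one split."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# Hypothetical feature column F1 and labels for the 7 records
f1 = np.array([1, 2, 3, 4, 5, 6, 7])
y  = np.array([0, 0, 0, 1, 1, 1, 1])
print(stump_gini(f1, y, threshold=3.5))  # a perfect split has impurity 0.0
```

In practice this impurity would be computed for a candidate split on every feature, and the stump with the lowest value becomes the first base model.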
Step 3: Calculate total error
Calculate the error of the first base model. The total error is the sum of the sample weights of the misclassified records:

Total Error = sum of the weights of the misclassified records

Let’s say the first model predicts 5 classes correctly and 2 incorrectly out of 7 records (records 4 and 5 are misclassified).
Since each record carries an initial weight of 1/7, the total error here is 2/7 ≈ 0.286.
Step 4: Calculate the performance of base learner or model
We calculate the performance (also called the amount of say) to update the sample weights. It is calculated as:

Performance = ½ × ln((1 − Total Error) / Total Error)

Based on the error above, the performance is ½ × ln((1 − 2/7) / (2/7)) ≈ 0.458.
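Steps 3 and 4 can be checked with a few lines of code; this is a sketch using the worked numbers above (records 4 and 5, i.e. indices 3 and 4, misclassified):

```python
import numpy as np

# Seven records, each with initial weight 1/7; records 4 and 5 are misclassified
weights = np.full(7, 1.0 / 7.0)
misclassified = np.array([False, False, False, True, True, False, False])

total_error = weights[misclassified].sum()  # 2/7 ≈ 0.286
performance = 0.5 * np.log((1 - total_error) / total_error)
print(total_error, performance)  # ≈ 0.286, ≈ 0.458
```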
Step 5: Update the weights
Update the weights for all records, both correctly classified and misclassified.
Classified Records
The new weight for each correctly classified record is:

New Weight = Old Weight × e^(−Performance) = (1/7) × e^(−0.458) ≈ 0.090

Misclassified Records
The new weight for each misclassified record is:

New Weight = Old Weight × e^(Performance) = (1/7) × e^(0.458) ≈ 0.226

After the update, the five correctly classified records each weigh about 0.090 and the two misclassified records (IDs 4 and 5) each weigh about 0.226.
Step 6: Normalize weights
As we can see, the sum of the updated weights is no longer equal to 1 as it was originally (it is 5 × 0.090 + 2 × 0.226 ≈ 0.904). So we need to normalize the weights so that they again sum to 1.
To do so we simply divide each updated weight by the sum of all the updated weights.
After normalization, the classified records each weigh 0.10, the misclassified records each weigh 0.25, and the weights sum to 1.
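Steps 5 and 6 together look like this in code, continuing from the worked numbers above:

```python
import numpy as np

weights = np.full(7, 1.0 / 7.0)
misclassified = np.array([False, False, False, True, True, False, False])
performance = 0.458  # amount of say from the previous step

# Step 5: increase the weights of misclassified records, decrease the rest
weights = np.where(misclassified,
                   weights * np.exp(performance),
                   weights * np.exp(-performance))
print(weights.sum())  # ≈ 0.904, no longer 1

# Step 6: normalize so the weights sum to 1 again
weights = weights / weights.sum()
print(weights)  # classified ≈ 0.10 each, misclassified ≈ 0.25 each
```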
Step 7: Create buckets
The buckets are used to create the new dataset.
To create the buckets we take the running (cumulative) sum of the normalized weights: each record’s bucket spans from the previous cumulative sum up to its own. With our weights, record 4’s bucket is 0.30 to 0.55 and record 5’s is 0.55 to 0.80.
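The bucket edges are just a cumulative sum of the normalized weights; a sketch with the weights from Step 6:

```python
import numpy as np

normalized = np.array([0.10, 0.10, 0.10, 0.25, 0.25, 0.10, 0.10])

upper = np.cumsum(normalized)                 # upper edge of each bucket
lower = np.concatenate(([0.0], upper[:-1]))   # lower edge of each bucket
for record_id, (lo, hi) in enumerate(zip(lower, upper), start=1):
    print(f"ID {record_id}: {lo:.2f} to {hi:.2f}")
# ID 4's bucket is 0.30 to 0.55 and ID 5's is 0.55 to 0.80, as in the example
```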
Step 8: Create new dataset
To create the new dataset, a random number between 0 and 1 is drawn on each iteration. The record whose bucket contains that number is selected for the new dataset.
Suppose in the 1st iteration the algorithm randomly draws the value 0.32. 0.32 falls in the bucket 0.30 to 0.55, which belongs to ID = 4, so the model selects this record for the new dataset.
And in the 2nd iteration, say, it draws the value 0.60. 0.60 falls in the bucket 0.55 to 0.80, which belongs to ID = 5, so the model selects this record for the new dataset.
The model runs one iteration per record, 7 in this case, to build the new dataset. Because the misclassified records own wider buckets, there is a high probability that the same record is selected several times.
New Dataset
Let’s assume the new dataset looks like the one below. We can see record 4 is repeated twice.
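The bucket-based sampling can be sketched as follows (the random seed is arbitrary, chosen only so the output is reproducible):

```python
import numpy as np

rng = np.random.default_rng(0)
normalized = np.array([0.10, 0.10, 0.10, 0.25, 0.25, 0.10, 0.10])
upper = np.cumsum(normalized)  # upper bucket edges: 0.10, 0.20, ..., 1.00

# Draw 7 random numbers in [0, 1) and pick the record whose bucket contains each
draws = rng.random(7)
selected_ids = np.searchsorted(upper, draws) + 1  # +1 for 1-based record IDs
print(selected_ids)  # the heavier records (IDs 4 and 5) tend to appear repeatedly
```

This is sampling with replacement in proportion to the weights, which is why misclassified records are over-represented in the new dataset.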
Step 9: Repeat steps 1 to 8 on the new dataset
We will again select the best feature to build our decision stump, this time on the new dataset. The process continues until the output starts converging.
Convergence in a decision tree is the point where no more leaves can be created.
Since AdaBoost’s trees are stumps with a depth of 1 and only two leaves, that criterion is never reached and the algorithm doesn’t know when to stop on its own. To solve this problem, AdaBoost uses early stopping, or simply stops after running a certain number of iterations.
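In practice the whole loop above is handled by a library. A minimal sketch with scikit-learn's `AdaBoostClassifier`, which uses depth-1 stumps as base learners by default and stops after a fixed number of boosting rounds (`n_estimators`), on a synthetic dataset standing in for our example:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (a hypothetical stand-in for the example)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 boosting rounds, i.e. the fixed iteration budget discussed above
model = AdaBoostClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # test-set accuracy
```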
Credit: Andrew Ng, StatQuest and Krish Naik
I hope this article gives you a good understanding of the AdaBoost classifier.
If you have any questions, or if you find anything misrepresented, please let me know.
Thanks!