Building an AdaBoost Model From Scratch with Python

Enozeren
6 min read · Dec 6, 2023


Hi! In this third article of my Decision Trees From Scratch series, we’ll implement a very powerful approach called AdaBoost.

The Decision Tree Medium Article series:

  1. Building a Decision Tree From Scratch with Python: Check it out since we will build the AdaBoost model on top of the decision tree implemented in it.
  2. Building a Random Forest From Scratch with Python
  3. Building an AdaBoost Model From Scratch with Python (This article)

Brief History

The AdaBoost algorithm was proposed by Y. Freund and R. Schapire in 1997. In that paper, the algorithm was designed for binary classification. Later, a paper by Zhu et al. proposed additions and changes to the original AdaBoost algorithm for multiclass classification tasks and for predicting probabilities. The proposed algorithms are called SAMME and SAMME.R, which are also used by the sklearn library in its sklearn.ensemble.AdaBoostClassifier model (see the resources for the article links).

What is AdaBoost?

Image by Author — Made in Canva

AdaBoost is an acronym for “Adaptive Boosting”. The main idea of the algorithm is to build weak learner models on top of each other so that the weaknesses of the earlier learners are compensated for by the later learners. The algorithm is “adaptive” because, after training a weak learner, the model gives more weight to the samples that were misclassified. It is also a “boosting” model because it combines more than one model to make a single prediction.

Building the AdaBoost Classifier from Scratch

In this part, we will walk through the Python implementation of AdaBoost by explaining the steps of the algorithm. You can see the full code in my GitHub account here.

As always, we will have a “train()” function to train the model with a dataset, as well as “predict()” and “predict_proba()” functions to make predictions. You can see in Image 1 that we do not need many functions because we build upon the DecisionTreeClassifier which we implemented from scratch in the first article of this series.

Image 1 — AdaBoostClassifier Class
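
Since Image 1 may not be visible in text form, here is a rough skeleton of the class with the method names used throughout this article (exact signatures are in the repository; bodies are filled in section by section below):

    class AdaBoostClassifier:
        """Skeleton of the class built in this article"""

        def __init__(self, n_base_learner=10): ...
        def train(self, X_train, y_train): ...
        def predict_proba(self, X): ...
        def predict(self, X): ...                     # not covered below, see the repository

        # internal helpers
        def _fit_base_learner(self, X_bootstrapped, y_bootstrapped): ...
        def _calculate_amount_of_say(self, base_learner, X, y): ...
        def _calculate_sample_weights(self, base_learner): ...
        def _update_dataset(self, sample_weights): ...
        def _predict_scores_w_base_learners(self, X): ...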

SAMME and Boosting

Before diving into our code, let’s first understand the algorithm for AdaBoost. You can see the SAMME algorithm in Image 2 (Check the resources for the paper).

Image 2 — SAMME Algorithm by Hastie, Trevor, et al.

The summary of the algorithm we will implement in this article (the key formulas are sketched right after the list):

  1. We train a base learner (in our case, a DecisionTree with max_depth=1).
  2. We calculate the amount of say for our base learner by taking its error into account.
  3. We calculate sample weights to give more weight to misclassified samples.
  4. (Not in the SAMME algorithm) We create bootstrap samples by taking the sample weights into account.
  5. We go back to step one with our bootstrap samples.
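
Since Image 2 may not be visible in text form, here is a rough sketch of the per-iteration quantities in SAMME, with K classes, weak learner T^(m) trained at iteration m, and sample weights w_i (see Hastie et al. in the resources for the exact statement):

$$
\begin{aligned}
\mathrm{err}^{(m)} &= \frac{\sum_{i=1}^{n} w_i \,\mathbf{1}\left(c_i \neq T^{(m)}(x_i)\right)}{\sum_{i=1}^{n} w_i}, \\
\alpha^{(m)} &= \log\frac{1-\mathrm{err}^{(m)}}{\mathrm{err}^{(m)}} + \log(K-1), \\
w_i &\leftarrow w_i \cdot \exp\left(\alpha^{(m)} \,\mathbf{1}\left(c_i \neq T^{(m)}(x_i)\right)\right) \quad \text{(then renormalize)}, \\
C(x) &= \arg\max_{k} \sum_{m=1}^{M} \alpha^{(m)} \,\mathbf{1}\left(T^{(m)}(x) = k\right).
\end{aligned}
$$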

Let me explain step 4 a little, since it is worth mentioning for our implementation. There are two ways to take the sample weights into account (the two approaches are contrasted in the short sketch after the list):

  • “boosting by reweighting”: You need base learners which are able to take the sample weights into account.
  • “boosting by resampling”: You can use any base learner. Just create bootstrap samples by taking the sample weights into account.
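
As a rough, self-contained illustration of the difference (sklearn’s DecisionTreeClassifier is used here only as an example of a learner that accepts sample weights; it is not part of our implementation, where we resample instead):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    X = np.random.rand(100, 3)
    y = np.random.randint(0, 2, size=100)
    sample_weights = np.ones(100) / 100  # pretend these came from the previous boosting round

    # boosting by reweighting: the base learner consumes the weights directly
    stump_reweighted = DecisionTreeClassifier(max_depth=1)
    stump_reweighted.fit(X, y, sample_weight=sample_weights)

    # boosting by resampling: any base learner works; we just train it on a
    # bootstrap sample drawn with probabilities equal to the sample weights
    idx = np.random.choice(100, size=100, replace=True, p=sample_weights)
    stump_resampled = DecisionTreeClassifier(max_depth=1)
    stump_resampled.fit(X[idx], y[idx])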

Now we are ready to dive into the code.

Hyperparameters

Note that our only hyperparameter is the number of base learners.

    def __init__(self, n_base_learner=10) -> None:
        """
        Initialize the object with the hyperparameters
        n_base_learner: # of base learners in the model (base learners are DecisionTree with max_depth=1)
        """
        self.n_base_learner = n_base_learner

Train Function

In the train function, you can see that we implement the SAMME algorithm. Our first bootstrapped sample is simply our original training data. Then we update the bootstrapped sample in each base learner training iteration.

    def train(self, X_train: np.array, y_train: np.array) -> None:
        """
        trains base learners with given feature and label dataset
        """
        self.X_train = X_train
        self.y_train = y_train
        X_bootstrapped = X_train
        y_bootstrapped = y_train
        self.label_count = len(np.unique(y_train))

        self.base_learner_list = []
        for i in range(self.n_base_learner):
            base_learner = self._fit_base_learner(X_bootstrapped, y_bootstrapped)
            self.base_learner_list.append(base_learner)
            sample_weights = self._calculate_sample_weights(base_learner)
            X_bootstrapped, y_bootstrapped = self._update_dataset(sample_weights)

Fitting a Base Learner

As a base learner, we use a DecisionTree with max_depth=1 (see the first article in this series for its from-scratch implementation). We fit the base learner on the bootstrapped sample and then calculate the amount of say for that base learner, as stated in the SAMME algorithm.

    def _fit_base_learner(self, X_bootstrapped: np.array, y_bootstrapped: np.array) -> DecisionTree:
        """Trains a Decision Tree model with depth 1 and returns the model"""
        base_learner = DecisionTree(max_depth=1)
        base_learner.train(X_bootstrapped, y_bootstrapped)
        base_learner.amount_of_say = self._calculate_amount_of_say(base_learner, self.X_train, self.y_train)

        return base_learner
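
The _calculate_amount_of_say() helper is not shown in this article; a minimal sketch that follows the SAMME formula could look like the snippet below (the actual implementation in the repository may differ, for example in how a zero error rate is handled):

    def _calculate_amount_of_say(self, base_learner: DecisionTree, X: np.array, y: np.array) -> float:
        """Calculates the amount of say (alpha) of a base learner with the SAMME formula"""
        K = self.label_count              # number of classes
        preds = base_learner.predict(X)
        err = np.mean(preds != y)         # error rate of the stump on the training data
        eps = 1e-10                       # keep the log and the division well-defined
        err = np.clip(err, eps, 1 - eps)
        return np.log((1 - err) / err) + np.log(K - 1)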

Calculating Sample Weights

After that, we calculate sample weights to give more weight to the samples our base learner misclassified.

    def _calculate_sample_weights(self, base_learner: DecisionTree) -> np.array:
        """Calculates sample weights (see SAMME)"""
        preds = base_learner.predict(self.X_train)
        matches = (preds == self.y_train)
        not_matches = (~matches).astype(int)
        sample_weights = 1/self.X_train.shape[0] * np.exp(base_learner.amount_of_say*not_matches)
        # Normalize weights
        sample_weights = sample_weights / np.sum(sample_weights)

        return sample_weights
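
For intuition, here is a tiny hypothetical example: 3 samples, one of them misclassified, and a binary problem with an error rate of 1/3, which gives an amount of say of log(2) + log(1) ≈ 0.69 under SAMME:

    import numpy as np

    not_matches = np.array([0, 0, 1])                            # only the last sample is misclassified
    amount_of_say = np.log((1 - 1/3) / (1/3)) + np.log(2 - 1)    # err = 1/3, K = 2  ->  ~0.69

    sample_weights = 1/3 * np.exp(amount_of_say * not_matches)
    sample_weights = sample_weights / np.sum(sample_weights)
    print(sample_weights)  # [0.25 0.25 0.5] -> the misclassified sample is twice as likely to be drawn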

Creating a Bootstrap Sample

Now we use the sample weights calculated in the previous step to create bootstrap samples. If you want to see how bootstrapping works, the previous Random Forest article walks through an example. Here is the link for you 🙂

    def _update_dataset(self, sample_weights: np.array) -> tuple:
        """Creates bootstrapped samples w.r.t. sample weights"""
        n_samples = self.X_train.shape[0]
        bootstrap_indices = np.random.choice(n_samples, size=n_samples, replace=True, p=sample_weights)
        X_bootstrapped = self.X_train[bootstrap_indices]
        y_bootstrapped = self.y_train[bootstrap_indices]

        return X_bootstrapped, y_bootstrapped

Making Predictions

After training, we have a list of base learners that are very powerful when used together. In the predict_proba function, we calculate our predictions for the given data. The algorithm we use here is simple. Here are the steps:

  1. Make a prediction with each base learner.
  2. Multiply each prediction by the amount of say of that base learner.
  3. Calculate the average of the weighted predictions over the base learners and normalize it into probabilities.

    def _predict_scores_w_base_learners(self, X: np.array) -> list:
        """
        Creates list of predictions for all base learners
        """
        pred_scores = np.zeros(shape=(self.n_base_learner, X.shape[0], self.label_count))
        for idx, base_learner in enumerate(self.base_learner_list):
            pred_probs = base_learner.predict_proba(X)
            pred_scores[idx] = pred_probs*base_learner.amount_of_say

        return pred_scores

    def predict_proba(self, X: np.array) -> np.array:
        """Returns the predicted probs for a given data set"""
        base_learners_pred_scores = self._predict_scores_w_base_learners(X)

        # Take the avg scores and turn them to probabilities
        avg_base_learners_pred_scores = np.mean(base_learners_pred_scores, axis=0)
        column_sums = np.sum(avg_base_learners_pred_scores, axis=1)
        pred_probs = avg_base_learners_pred_scores / column_sums[:, np.newaxis]

        return pred_probs
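
To wrap up, here is a minimal usage sketch (the iris dataset and the final argmax step are only illustrative; the repository also provides a predict() method, not shown here, and the DecisionTree from the first article must be importable):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    model = AdaBoostClassifier(n_base_learner=50)
    model.train(X_train, y_train)

    pred_probs = model.predict_proba(X_test)   # shape: (n_samples, n_classes)
    preds = np.argmax(pred_probs, axis=1)      # pick the class with the highest averaged score
    print("Accuracy:", np.mean(preds == y_test))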

That’s all. Now we have our own AdaBoostClassifier model, coded from scratch! 🕺 💃

Performance of the AdaBoost Classifier

Our from-scratch AdaBoost model is compared with the sklearn AdaBoostClassifier on 3 different datasets, using the same number of base learners (50). The results show that our model is more robust to overfitting than the sklearn model, thanks to our “boosting by resampling” implementation; sklearn uses “boosting by reweighting”, a difference we discussed above.

See Table 1 below for details of performance comparison.

Table 1 — Results

Conclusion

This was the last article of the Decision Tree series. In this series, I tried to build the foundations of tree-based models. With these foundations, understanding more advanced and complex algorithms related to decision trees becomes much easier. I hope the articles helped you. Thanks for reading!

Resources

  • Freund, Yoav, and Robert E. Schapire. “A decision-theoretic generalization of on-line learning and an application to boosting.” Journal of computer and system sciences 55.1 (1997): 119–139.
  • Hastie, Trevor, et al. “Multi-class adaboost.” Statistics and its Interface 2.3 (2009): 349–360.
  • Seiffert, Chris, et al. “Resampling or reweighting: A comparison of boosting implementations.” 2008 20th IEEE International Conference on Tools with Artificial Intelligence. Vol. 1. IEEE, 2008.

Thanks for Reading

Let me know if you have any comments 🙂
