Add Power to Your Model with the AdaBoost Algorithm
Adaptive Boosting, commonly known as AdaBoost, is an ensemble boosting technique that is usually built on top of decision trees.
Since decision trees are also used to build random forest models, it helps to know how AdaBoost differs from a random forest:
- In a random forest, each decision tree is grown on a bootstrapped sample of the dataset; some trees end up larger than others, because there is no fixed limit on tree size. In contrast, in the forest of trees made with AdaBoost, each tree has one root node and two leaf nodes; such a tree is called a stump.
- In a random forest, every tree gets an equal vote on the final classification, whereas in the forest of stumps made with AdaBoost, some stumps get more say in the final classification than others.
- In a random forest, each tree is built independently and does not influence the others. In contrast, in AdaBoost the order in which the stumps are built matters: the errors made by the 1st stump influence how the 2nd is built, the errors of the 2nd influence the 3rd, and so on.
Where is the AdaBoost algorithm used?
AdaBoost is used where there is a complex relationship between the various features (independent variables) of the dataset and its target variable (dependent variable).
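Before walking through the math by hand, here is a minimal sketch of how an AdaBoost model built on stumps is typically trained with scikit-learn. The file name and column names are placeholders standing in for a heart-disease dataset like the one used below; they are not files shipped with this post.

```python
# Minimal sketch: AdaBoost with decision stumps in scikit-learn.
# "heart.csv" and the column names are placeholders for a dataset with
# City, Gender and Income features and a Heart Disease target.
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart.csv")
X = pd.get_dummies(df[["City", "Gender", "Income"]])  # one-hot encode the categorical columns
y = df["Heart Disease"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Each weak learner is a stump: a decision tree with a single split (max_depth=1).
stump = DecisionTreeClassifier(max_depth=1)
model = AdaBoostClassifier(estimator=stump, n_estimators=50, random_state=42)
# On scikit-learn versions older than 1.2, pass base_estimator=stump instead of estimator=stump.
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```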
How does the AdaBoost algorithm work?
To understand this, we will use a small dataset that classifies whether a person from a particular place, with a particular income and gender, has heart disease or not.
The first thing we do is add a column "WEIGHTS" to the table, with one weight per data point; this weight indicates how important it is for that data point to be classified correctly. Since there are 8 rows in our dataset, we assign a weight of 1/8 to every data point in the "WEIGHTS" column. In general, we start by assigning every point a weight of 1/(total number of samples in the dataset), which means all data points are equally important at the beginning.
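As a small sketch (assuming 8 samples, as in this dataset), the initial sample weights can be set up like this:

```python
import numpy as np

n_samples = 8                                 # number of rows in the dataset
weights = np.full(n_samples, 1 / n_samples)   # every data point starts at 1/8
print(weights)                                # [0.125 0.125 ... 0.125]
```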
Now we construct candidate stumps, one for each independent column, with the help of the Gini index; each stump records how many data points it classifies correctly and incorrectly.
Gini index of Dallas = 1 - (2/(2+2))² - (2/(2+2))² = 0.5
Gini index of New York = 1 - (1/(1+3))² - (3/(1+3))² = 0.375
Gini index for the City column = 0.5 * ((2+2) / (2+2+1+3)) + 0.375 * ((1+3) / (2+2+1+3)) = 0.4375
Similarly, the Gini index for the Gender column = 0.48 * ((2+3) / (2+3+1+2)) + 0.44 * ((1+2) / (2+3+1+2)) ≈ 0.47
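As a sketch, the weighted Gini index of the City column above can be reproduced from the per-leaf class counts used in that calculation:

```python
def gini_from_counts(yes, no):
    """Gini impurity of a leaf containing `yes` and `no` samples."""
    total = yes + no
    return 1 - (yes / total) ** 2 - (no / total) ** 2

# Per-leaf class counts for the City stump, as used in the calculation above.
dallas = (2, 2)     # 2 with heart disease, 2 without
new_york = (1, 3)   # 1 with heart disease, 3 without

g_dallas = gini_from_counts(*dallas)       # 0.5
g_new_york = gini_from_counts(*new_york)   # 0.375

n_dallas, n_new_york = sum(dallas), sum(new_york)
n_total = n_dallas + n_new_york
weighted = (n_dallas / n_total) * g_dallas + (n_new_york / n_total) * g_new_york
print(weighted)  # 0.4375, the Gini index for the City column
```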
For a column containing only numerical values, there is a different approach to constructing the stump. First we place the independent column and the dependent column side by side, and then we sort the rows by the numerical column.
Then, for every pair of adjacent values in the sorted column, we calculate their average. Each of these averages is a candidate threshold; the one that yields the lowest Gini index becomes the split for that column's stump (a short code sketch of this step follows the calculations below).
Gini index for the 40945.5 value = 0.48 * ((4+3) / (0+1+4+3)) = 0.42
Gini index for the 43948.5 value = 0.5 * ((1+1) / (1+1+2+4)) + 0.44 * ((2+4) / (1+1+2+4)) = 0.455
Gini index for the 51509 value = 0.11 * ((2+1) / (2+1+1+4)) + 0.24125 * ((1+4) / (2+1+1+4)) = 0.19
Gini index for the 77370.5 value = 0.5 * ((2+2) / (2+2+1+3)) + 0.375 * ((1+3) / (2+2+1+3)) = 0.4375
Gini index for the 99379 value = 0.48 * ((2+3) / (2+3+1+2)) + 0.44 * ((1+2) / (2+3+1+2)) = 0.465
Gini index for the 101375.5 value = 0.44 * ((2+4) / (2+4+1+1)) + 0.5 * ((1+1) / (2+4+1+1)) = 0.455
Gini index for the 109676 value = 0.408 * ((2+5) / (2+5+1+0)) = 0.357
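As a sketch of this threshold-selection step, the code below computes the candidate thresholds as averages of adjacent sorted values and evaluates the weighted Gini index for each. The income values and labels here are made up for illustration; they are not the actual table from this post.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a group of binary labels."""
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)
    return 1 - p ** 2 - (1 - p) ** 2

def weighted_gini(values, labels, threshold):
    """Weighted Gini index of splitting the rows at `threshold`."""
    left, right = labels[values <= threshold], labels[values > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# Hypothetical income values and heart-disease labels (1 = yes, 0 = no).
income = np.array([38000, 43891, 44006, 59011, 95730, 99723, 103028, 119629])
labels = np.array([0, 1, 0, 0, 1, 1, 0, 0])

order = np.argsort(income)
income, labels = income[order], labels[order]

# Candidate thresholds: the average of every pair of adjacent sorted values.
thresholds = (income[:-1] + income[1:]) / 2
for t in thresholds:
    print(f"{t:10.1f} -> Gini {weighted_gini(income, labels, t):.3f}")
# The threshold with the lowest Gini index defines the stump for the column.
```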
After calculating the Gini index for every column, we find that the Gini index for the 51509 value, 0.19, is the lowest of all, which makes 51509 the splitting value for the Income column. After that, we calculate the amount of say each stump gets, based on the samples it misclassifies. The formula this post uses for the amount of say of a stump is:

Amount of Say = (1/2) * log((1 - Total Error) / Total Error)
where Total Error is the sum of the weights of the samples that the stump misclassifies. For example, the City stump misclassifies 5 samples, so its Total Error = 1/8 + 1/8 + 1/8 + 1/8 + 1/8 = 5/8.
Amount of say for the City column: (1/2) * log((1 - 5/8) / (5/8)) ≈ -0.11
Amount of say for the Gender column: (1/2) * log((1 - 5/8) / (5/8)) ≈ -0.11
Amount of say for the Income column: (1/2) * log((1 - 2/8) / (2/8)) ≈ 0.23
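As a sketch, the amount-of-say values above can be reproduced as follows. Note that this post computes the amount of say with the base-10 logarithm (which is what gives -0.11 and 0.23); the classic AdaBoost derivation uses the natural logarithm instead.

```python
import math

def amount_of_say(total_error):
    """Amount of say of a stump, using the base-10 log form followed in this post."""
    return 0.5 * math.log10((1 - total_error) / total_error)

print(amount_of_say(5 / 8))  # ~ -0.11 (City and Gender stumps)
print(amount_of_say(2 / 8))  # ~  0.23 (Income stump)
```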
Thus we see how the sample weights of the misclassified data points determine the amount of say each stump gets. Next, we modify the weight of every sample: we first increase the weights of the data points that the stump with the lowest Gini index misclassifies, and then decrease the weights of the data points it classifies correctly. The weights are modified using:

New Weight = Old Weight * e^(Amount of Say)  for incorrectly classified points
New Weight = Old Weight * e^(-Amount of Say)  for correctly classified points
Thereby, we add a new column "NEW WEIGHTS" to the dataset to hold the modified weights that will replace the "WEIGHTS" column, and assign a new weight to every data point in the table. The data points that are incorrectly classified get larger weights in this column, while the correctly classified ones get smaller weights. Finally, we normalize the new sample weights by dividing each one by the sum of the whole "NEW WEIGHTS" column; these normalized weights are the ones that feed into the next stump.
So, in our dataset, the Income stump with the 51509 value, which has the lowest Gini index (0.19) among all the features, will be used to classify whether the patient has heart disease or not.
As there are 2 (1 + 1) incorrect predictions, the weights of those rows are increased to (1/8) * e^(0.23) ≈ 0.157, and for the 6 (2 + 4) correct predictions the new weights are decreased to (1/8) * e^(-0.23) ≈ 0.099, so the new dataset becomes:
Now we need to normalize the new weights by dividing each of them by the sum of the whole column, which is (0.157 + 0.099 + 0.099 + 0.099 + 0.099 + 0.099 + 0.099 + 0.157) ≈ 0.91. Thereby, 0.157/0.91 ≈ 0.17 and 0.099/0.91 ≈ 0.11 (roughly 0.1), and the dataset looks like this:
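A short sketch of this update-and-normalize step, using the initial weights of 1/8 and the Income stump's amount of say of 0.23 from above. Which two rows are misclassified is assumed here to be the 1st and 8th, purely for illustration:

```python
import numpy as np

weights = np.full(8, 1 / 8)   # initial sample weights
say = 0.23                    # amount of say of the Income stump
# Assumed positions of the 2 misclassified rows (illustrative only).
misclassified = np.array([True, False, False, False, False, False, False, True])

new_weights = np.where(misclassified,
                       weights * np.exp(say),    # increase weights of wrong rows
                       weights * np.exp(-say))   # decrease weights of correct rows
new_weights /= new_weights.sum()                 # normalize so the column sums to 1
print(new_weights.round(2))  # ~0.17 for the misclassified rows, ~0.11 (roughly 0.1) for the rest
```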
After that, it is time to construct the 2nd stump (classifier). But before that, we create a new, empty dataset of the same size as the original dataset and with the same column names.
Next, we fill the blank dataset row by row by drawing 8 random numbers between 0 and 1. For each random number: if it falls between 0 and 0.17 (the new weight of the 1st row), the 1st row is added to the blank dataset; if it falls between 0.17 and 0.27 (0.17 + 0.1, the new weight of the 2nd row), the 2nd row is added; if it falls between 0.27 and 0.37 (0.27 + 0.1, the new weight of the 3rd row), the 3rd row is added; and so on for the remaining rows and their new weights.
Notice that the range assigned to each incorrectly classified row is wider, 0.17 (e.g. 0 to 0.17 and 0.77 to 0.94), than the range assigned to each correctly classified row, 0.1 (e.g. 0.17 to 0.27 and 0.37 to 0.47). So the incorrectly classified rows will tend to appear more often than the correctly classified rows in the new dataset, and the Income stump with the 51509 value, having the lowest Gini index, becomes the first weak learner for this dataset.
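A sketch of this weighted resampling step; drawing indices with `np.random.choice` and probabilities proportional to the new weights is equivalent to drawing random numbers against the cumulative ranges described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Normalized new weights from the previous step (misclassified rows carry ~0.17).
new_weights = np.array([0.17, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.17])
new_weights = new_weights / new_weights.sum()   # make them sum exactly to 1

# Draw 8 row indices with replacement; heavier rows are picked more often.
resampled_rows = rng.choice(8, size=8, replace=True, p=new_weights)
print(resampled_rows)   # indices of the rows that form the new dataset
```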
So, let's assume the new, updated dataset looks like this:
Then, from this new dataset, we construct the next weak classifier in the same way and keep repeating the process, re-weighting and resampling at each step, until the training rows are classified well enough or a chosen number of stumps has been built.
When a new test sample is passed through these weak decision stumps, each stump votes for a class ("has heart disease" or not) based on the sample's features, and the sample is assigned the class that collects the larger total amount of say from the weak classifiers. By combining these weak learners in this way, we turn them into a single strong learner.
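A sketch of that final weighted vote; the stump predictions and amounts of say below are made-up numbers, only to illustrate the aggregation:

```python
import numpy as np

# Predictions of each weak stump for one test sample: +1 = heart disease, -1 = none.
stump_predictions = np.array([+1, -1, +1, +1, -1])        # illustrative values
amounts_of_say = np.array([0.23, 0.11, 0.42, 0.15, 0.30])  # illustrative values

# The final class is the sign of the say-weighted sum of the stump votes.
score = np.sum(amounts_of_say * stump_predictions)
final_prediction = 1 if score > 0 else -1
print(score, final_prediction)
```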
This is how the AdaBoost algorithm works. I hope you have enjoyed reading this blog. If you have any comments, queries, or questions, please let me know in the comments section. Until then, enjoy learning!

