I was recently working on a Market Mix Model, wherein I had to predict sales from impressions. While working on one aspect of it, I was confronted with the problem of choosing between a Random Forest and an XGBoost model. That choice led to this article.
Before we get down to the arguments in favor of any of the algorithms, let us understand the underlying idea behind the two algorithms in brief.
The term gradient boosting consists of two sub-terms, gradient and boosting. Gradient boosting re-defines boosting as a numerical optimization problem where the objective is to minimize the loss function of the model by adding weak learners using gradient descent. Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. As gradient boosting is based on minimizing a loss function, different types of loss functions can be used resulting in a flexible technique that can be applied to regression, multi-class classification, etc.
Gradient boosting does not modify the sample distribution as weak learners train on the remaining residual errors of a strong learner (i.e., pseudo-residuals). By training on the residuals of the model, it gives more importance to misclassified observations. Intuitively, new weak learners are added to concentrate on the areas where the existing learners are performing poorly. The contribution of each weak learner to the final prediction is based on a gradient optimization process to minimize the overall error of the strong learner.
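The residual-fitting loop described above can be sketched from scratch. For squared loss the pseudo-residuals are just the plain residuals, and here the weak learners are one-split regression stumps. This is a minimal illustration, not a library API; the names `fit_stump` and `gradient_boost` are mine:

```python
def fit_stump(x, targets):
    # exhaustively pick the single split that minimizes squared error
    best = None
    for s in sorted(set(x)):
        left = [t for xi, t in zip(x, targets) if xi <= s]
        right = [t for xi, t in zip(x, targets) if xi > s]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    _, s, lm, rm = best
    return lambda xi: lm if xi <= s else rm

def gradient_boost(x, y, n_rounds=30, lr=0.5):
    base = sum(y) / len(y)                 # start from the mean prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # pseudo-residuals: the negative gradient of squared loss
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)    # new weak learner fits what is still wrong
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + sum(lr * s(xi) for s in stumps)
```

Each round the points with the largest remaining residuals dominate the stump's squared-error criterion, which is exactly the "concentrate on the poorly-predicted areas" behaviour described above.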
Random Forest is a bagging technique that trains a number of decision trees on various subsets of the given dataset and combines their outputs to improve predictive accuracy. Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote (for classification) or the average (for regression), produces the final output. A greater number of trees in the forest generally leads to higher accuracy and reduces the risk of overfitting.
A Random Forest has two random elements:
1. Random subset of features.
2. Bootstrap Samples of data.
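Both random elements show up clearly in a from-scratch sketch. To keep it short, each "tree" here is just a depth-one stump; all names are illustrative, not a library API:

```python
import random
from collections import Counter

def fit_stump(rows, labels, feat_idx):
    # best single-feature threshold split, scored by training accuracy
    best = None
    for f in feat_idx:
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            lmaj = Counter(left).most_common(1)[0][0]
            rmaj = Counter(right).most_common(1)[0][0]
            acc = (sum(l == lmaj for l in left)
                   + sum(l == rmaj for l in right)) / len(labels)
            if best is None or acc > best[0]:
                best = (acc, f, t, lmaj, rmaj)
    _, f, t, lmaj, rmaj = best
    return lambda row: lmaj if row[f] <= t else rmaj

def random_forest(rows, labels, n_trees=25, n_feat=1, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]  # 2. bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_feat)             # 1. random feature subset
        trees.append(fit_stump([rows[i] for i in idx],
                               [labels[i] for i in idx], feats))
    # majority vote across all trees
    return lambda row: Counter(t(row) for t in trees).most_common(1)[0][0]
```

Each tree sees a different bootstrap sample and a different feature subset, which is what de-correlates the trees and makes the vote useful.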
Comparing the Contenders
Boosting is an iterative learning process: the model makes an initial prediction, analyses its own mistakes, and gives more weight to the data points it predicted wrongly in the next iteration. After the second iteration, it again analyses its wrong predictions and up-weights them for the following round, and this cycle continues. Hence, when the final model makes a prediction, there is reasonable assurance that it did not arise by random chance but from repeated passes over the patterns in the data. A model that guards against predictions occurring by random chance is trustworthy most of the time.
A random forest is simply a collection of trees, each of which gives a prediction; we then collect the outputs from all the trees and take the mean, median, or mode of this collection as the forest's prediction, depending on the nature of the target (continuous or categorical). At a high level this seems fine, but there is a real chance that many of the trees made their predictions partly by chance, since each tree faces its own circumstances such as class imbalance, sample duplication, overfitting, or inappropriate node splitting.
Let us now score the two algorithms on the arguments below.
XGBoost (1) & Random Forest (0):
XGBoost prunes the tree up front using a score called the “similarity score”, before going any further with the modeling. It defines the “gain” of a split as the similarity scores of the two child nodes minus the similarity score of the parent. If the gain from a split falls below a threshold, it simply stops growing the tree any deeper, which counters the challenge of overfitting to a great extent. Meanwhile, a Random forest may well overfit the data if the majority of the trees in the forest are given similar samples. If the trees are grown to full depth, the model can fall apart once test data is introduced. Therefore, major consideration is given to distributing the elementary units of the sample with approximately equal participation across all trees.
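For squared loss with XGBoost's default regularization, the similarity score of a leaf is (sum of residuals)² / (number of residuals + λ), and a split is kept only when its gain exceeds the complexity penalty γ. A small illustration with made-up residual values:

```python
def similarity(residuals, lam=1.0):
    # (sum of residuals)^2 / (count + lambda), squared-loss case
    return sum(residuals) ** 2 / (len(residuals) + lam)

def split_gain(left, right, lam=1.0):
    # gain = children's similarity minus the parent's similarity
    return (similarity(left, lam) + similarity(right, lam)
            - similarity(left + right, lam))

# residuals of mixed sign cancel out in the parent,
# so separating them yields a large gain
gain = split_gain([-7.5, -7.0], [6.5, 8.0])   # ~140.2

# homogeneous residuals gain nothing; such a split gets pruned
weak = split_gain([1.0, 1.0], [1.0, 1.0])     # negative

gamma = 10.0                   # complexity penalty
keep = gain - gamma > 0        # True: keep this split
prune = weak - gamma > 0       # False: prune this one
```

The intuition: a high similarity score means the leaf's residuals agree with each other, so a split that sorts disagreeing residuals into separate children is worth its complexity cost.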
XGBoost (2) & Random Forest (0):
XGBoost is a good option for unbalanced datasets, but we cannot trust random forest in these types of cases. In applications like forgery or fraud detection, the classes will almost certainly be imbalanced: the number of authentic transactions will be huge compared with the fraudulent ones. When XGBoost fails to predict the anomaly at first, it gives that point more preference and weight in the upcoming iterations, thereby increasing its ability to predict the class with low participation; we cannot assure that random forest will treat the class imbalance with a comparable process.
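On top of the residual-fitting behaviour, XGBoost exposes an explicit knob for binary imbalance, `scale_pos_weight`, and a common starting point is the ratio of negative to positive examples. A quick sketch with made-up fraud-detection counts:

```python
# toy fraud-detection labels: 0 = authentic, 1 = fraudulent
labels = [0] * 950 + [1] * 50

neg, pos = labels.count(0), labels.count(1)
scale_pos_weight = neg / pos   # 950 / 50 = 19.0
# passed to XGBoost as e.g. XGBClassifier(scale_pos_weight=19.0),
# which scales up the gradient contribution of the positive class
```

Random forest implementations typically offer class weights too, but the weighting only changes per-tree splits; there is no iterative mechanism re-focusing on the minority class.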
XGBoost (3) & Random Forest (0):
One of the most important differences between XGBoost and Random forest is that XGBoost concentrates on function space when reducing the cost of a model, while Random Forest leans more heavily on hyperparameters to optimize the model. A small change in a hyperparameter affects almost all trees in the forest, which can alter the prediction, and fixing one set of hyperparameters for the whole forest is not a good approach when we expect test data with many variations in real time. XGBoost's hyperparameters, by contrast, are applied to one tree at a time, and the ensemble is expected to adjust itself efficiently as the iterations progress. XGBoost also needs only a small number of initial hyperparameters (the shrinkage or learning rate, the depth of the trees, and the number of trees) compared with Random forest.
XGBoost (4) & Random Forest (0):
When the model encounters a categorical variable whose classes have very different frequencies, there is a possibility that Random forest will give more preference to the class with greater participation.
XGBoost (5) & Random Forest (0):
XGBoost may be preferable in situations like Poisson regression, rank regression, etc. This is because its trees are derived by optimizing an explicit objective function, so the loss can be swapped to match the task.
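Concretely, XGBoost accepts a custom objective as a function returning per-example gradients and Hessians. For Poisson regression with a log link, the loss exp(pred) − label·pred (constants dropped) gives the pair below; this is a sketch of the idea, not the library's internal code:

```python
import math

def poisson_objective(preds, labels):
    # gradient and Hessian of the Poisson negative log-likelihood
    # loss(pred, y) = exp(pred) - y * pred   (log link, constants dropped)
    grad = [math.exp(p) - y for p, y in zip(preds, labels)]
    hess = [math.exp(p) for p in preds]
    return grad, hess
```

At pred = 0 (a predicted mean of 1), an example with label 1 contributes zero gradient, so the boosting rounds stop adjusting it; there is no comparable hook in a standard random forest, whose splits optimize impurity rather than a task-specific loss.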
XGBoost (5) & Random Forest (1):
Random forests are easier to tune than Boosting algorithms.
XGBoost (5) & Random Forest (2):
Random forests adapt to distributed computing more easily than boosting algorithms, since their trees are independent and can be trained in parallel, whereas boosting is inherently sequential.
XGBoost (5) & Random Forest (3):
Random forests will almost certainly not overfit if the data is neatly pre-processed and cleaned, unless similar samples are repeatedly given to the majority of trees.
The winner of this argument is XGBoost!
Disclaimer: These are my personal views. Note also that the choice of an algorithm hugely depends on the data at hand.
Thanks for reading! Stay Safe!