Random Forest
In the previous article of the Supervised Learning Algorithms series, I discussed Decision Trees, which fall under tree modeling. The Random Forest algorithm is also a type of tree modeling and is, in essence, an extension of Decision Trees.
Essentially, a Random Forest is an ensemble of Decision Trees trained using an ensemble learning technique, most commonly ‘bagging’.
I brushed over ensemble learning techniques in the Decision Trees article, but let’s go into some details now.
What is Ensemble Learning?
Ensemble Learning is the process of combining several base machine learning models to arrive at one optimal model. The purpose of this is to increase accuracy and reduce the errors in the predictions made.
A lot of questions and factors are involved when creating a Decision Tree, such as “which features should be used as condition nodes?”, “what should the order of the features be?” and “what should the threshold for each condition node be to arrive at an answer?”. Several different trees can be made for a single problem, each with a different accuracy score. Ensemble learning allows us to combine all these different trees and their results to get the best possible model for the problem.
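As a quick illustration of the idea, here is a minimal sketch that combines a few differently configured Decision Trees into one model with scikit-learn. The synthetic dataset, the tree settings, and the use of hard voting are all illustrative assumptions; voting is just one simple way to combine models, not the technique Random Forests use (that comes next).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in dataset (illustrative assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Three trees that answer the "which features, what depth, what criterion?"
# questions differently, so each is a different model of the same problem.
trees = [
    ("shallow", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("deep", DecisionTreeClassifier(max_depth=None, random_state=0)),
    ("entropy", DecisionTreeClassifier(criterion="entropy", max_depth=5, random_state=0)),
]

# The ensemble combines the three trees instead of trusting any single one.
ensemble = VotingClassifier(estimators=trees, voting="hard")

# Compare each individual tree against the combined model.
for name, model in trees + [("ensemble", ensemble)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```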
While there are several Ensemble Learning techniques, Random Forests use one called ‘bagging’. To read more about Ensemble Learning, click here.
How does bagging work?
As mentioned in the Decision Tree article, the bagging technique breaks the main dataset into smaller random subsets and fits a Decision Tree model on each one. The diagram below explains this process.
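To make that concrete, here is a minimal hand-rolled sketch of the idea, assuming scikit-learn and a synthetic dataset (both illustrative choices). Each random subset is drawn with replacement, i.e. a bootstrap sample; scikit-learn also ships a ready-made BaggingClassifier that packages the same steps.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in dataset (illustrative assumption).
X, y = make_classification(n_samples=600, n_features=8, random_state=1)
rng = np.random.default_rng(1)

trees = []
for _ in range(5):
    # Draw a random subset of rows, sampled with replacement (a bootstrap sample).
    idx = rng.integers(0, len(X), size=len(X))
    X_sub, y_sub = X[idx], y[idx]

    # Fit one Decision Tree per random subset.
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_sub, y_sub))

# Each tree saw a different slice of the data, so their predictions can differ;
# how those predictions are combined is covered later in the article.
print([tree.predict(X[:1])[0] for tree in trees])
```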
Ensemble Learning techniques are used to strike a balance between bias and variance (which are explained here), which is how a more optimal model is achieved.
So, how does the Random Forest algorithm work?
The Random Forest algorithm uses the ensemble learning technique ‘bagging’, but with a slight alteration. As seen above, the bagging technique creates random subsets of the main dataset, yet the decision trees built on these subsets still tend to split the data on similar features and thus arrive at similar outputs. So, while the data is random, the features that the trees split on are essentially the same.
Random Forests also start with subsets of the main dataset; however, each tree chooses from a random selection of features. This ensures that the trees split on different features, creating more randomness. When the aggregation of all the trees is taken, the trees are diverse, and so the final model is the most optimal one for the problem at hand.
In simpler terms, Random Forest follows this procedure:
- The main dataset is broken into subsets a, b, and c (for example).
- A single decision tree is fitted to each subset a, b, and c, producing trees Ta, Tb, and Tc, each of which splits on a different selection of features.
- The three decision trees Ta, Tb and Tc are aggregated and the final model Tf is produced. Tf is used to predict future outputs.
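In practice this procedure is already packaged in libraries such as scikit-learn. Below is a minimal sketch using RandomForestClassifier on a synthetic dataset; the dataset and hyperparameters are illustrative assumptions. One detail worth noting: scikit-learn draws a fresh random selection of features at every split of every tree, rather than once per tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in dataset (illustrative assumption).
X, y = make_classification(n_samples=600, n_features=12, random_state=7)

# n_estimators is the number of bootstrapped trees (the subsets a, b, c above);
# max_features="sqrt" sets the size of the random selection of features
# considered at each split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=7)
forest.fit(X, y)

# The aggregated model (Tf above) is what makes predictions for new data.
print(forest.predict(X[:5]))
```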
How is an output predicted based on new data?
When you have a single Decision Tree, a new data point follows the path of that tree from the root node through the condition nodes and ultimately to a leaf node, which is the output for that data point. In the case of a Random Forest there are multiple Decision Trees, so the new data point is sent down each of those trees until it arrives at a leaf node. This is when aggregation is performed, and the aggregation function differs for classification and regression problems.
For Classification problems, the class most frequently arrived at by the data point is chosen as the final prediction.
For Regression problems, the average of all the values arrived at by the data point is assigned as the final prediction.
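Here is a minimal sketch of both aggregation rules, hand-rolled on top of a few bagged trees. The synthetic datasets, the number of trees, and the use of the first row as a stand-in for a new data point are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(3)

# --- Classification: majority vote across the trees ---
Xc, yc = make_classification(n_samples=400, n_features=8, random_state=3)
clf_trees = []
for _ in range(7):
    idx = rng.integers(0, len(Xc), size=len(Xc))   # random subset (bootstrap sample)
    clf_trees.append(DecisionTreeClassifier(random_state=0).fit(Xc[idx], yc[idx]))

new_point = Xc[:1]                                  # stand-in for a new data point
votes = np.array([t.predict(new_point)[0] for t in clf_trees]).astype(int)
print("votes:", votes, "-> prediction:", np.bincount(votes).argmax())

# --- Regression: average across the trees ---
Xr, yr = make_regression(n_samples=400, n_features=8, n_informative=5, random_state=3)
reg_trees = []
for _ in range(7):
    idx = rng.integers(0, len(Xr), size=len(Xr))
    reg_trees.append(DecisionTreeRegressor(random_state=0).fit(Xr[idx], yr[idx]))

values = np.array([t.predict(Xr[:1])[0] for t in reg_trees])
print("tree outputs:", values.round(1), "-> prediction:", values.mean().round(1))
```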
A Step Further
We started off with a dataset on which we fitted a Decision Tree. We then applied an Ensemble Learning technique, bagging, to train many such trees, and added more diversity by fitting a Random Forest.
The Random Forest can be taken a step further by introducing even more variation: random thresholds for the random features in the random data subsets. This model is called Extremely Randomized Trees, or Extra-Trees, and it splits on random features using random threshold values.
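As a rough side-by-side, here is a minimal sketch comparing a Random Forest and an Extra-Trees model in scikit-learn; the synthetic dataset and settings are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in dataset (illustrative assumption).
X, y = make_classification(n_samples=600, n_features=12, random_state=11)

# Random Forest: random data subsets and random feature selections,
# but the best threshold is still searched for at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=11)

# Extra-Trees: random feature selections too, but the split thresholds
# themselves are also drawn at random.
extra = ExtraTreesClassifier(n_estimators=100, random_state=11)

for name, model in [("random forest", forest), ("extra-trees", extra)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```

Because Extra-Trees skip the search for the best threshold at each split, they are typically faster to train; whether they score better depends on the dataset.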
To read more about Extremely Randomized Trees, click here.