Random Forest cheat sheet
- As seen in the decision tree topic, a single decision tree tends to overfit a lot; that is why we use RF.
- As the name suggests, a forest is made of trees, and RF is no exception: it is built from many DTs.
- Often performs better than many other ML algorithms.
- RF is an Ensemble technique; specifically, it uses Bagging.
- Bagging stands for Bootstrap Aggregation (B from Bootstrap, agg from Aggregation): train each tree on a bootstrap sample of the rows, then aggregate the trees' predictions.
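The bootstrap + aggregate idea can be sketched in a few lines of plain Python. The "base learner" here is a deliberately toy stand-in (it just memorises the majority label of its own bootstrap sample); the data and labels are made up for illustration.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    # Bootstrap step: draw len(rows) rows WITH replacement.
    return [rng.choice(rows) for _ in rows]

def majority_vote(labels):
    # Aggregation step: majority vote over base-model outputs.
    return Counter(labels).most_common(1)[0][0]

# Toy data (hypothetical): (row_id, label) pairs, mostly class 0.
data = [("a", 0), ("b", 0), ("c", 1), ("d", 0), ("e", 0)]
rng = random.Random(42)

# Each "tree" predicts the majority label of its own bootstrap sample;
# a real decision tree would fit splits instead.
trees = [majority_vote([y for _, y in bootstrap_sample(data, rng)])
         for _ in range(25)]
print(majority_vote(trees))  # the ensemble's aggregated prediction: 0
```

A real RF replaces the toy learner with a full decision tree per bootstrap sample, but the sample-then-vote skeleton is the same.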
- Easy to Parallelize
- OOB (Out-Of-Bag) error and OOB score: estimate how our model behaves on unseen data, by scoring each tree on the rows left out of its bootstrap sample.
- Typically about 2/3 of the rows end up in each tree's bootstrap sample (row sampling); the rest are out-of-bag.
- Typically sqrt(features) or log2(features) are drawn in feature sampling.
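A minimal sketch of the OOB score, assuming scikit-learn is available; the dataset is synthetic, generated only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic classification data just for the demo.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# oob_score=True scores each tree on the rows it never saw in its
# bootstrap sample, giving a built-in estimate of test accuracy
# without a separate hold-out set.
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X, y)
print(rf.oob_score_)  # accuracy on out-of-bag samples
```

Because the OOB rows act like a held-out set, `oob_score_` is usually a reasonable proxy for test-set accuracy.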
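The "about 2/3 of rows" figure follows from sampling with replacement: the expected fraction of distinct rows in a bootstrap sample tends to 1 - 1/e ≈ 0.632. A quick check, plus the usual feature-sampling sizes (the 64-feature count is an arbitrary example):

```python
import math
import random

rng = random.Random(0)
n = 10_000
# One bootstrap sample: n draws with replacement from n rows.
sample = [rng.randrange(n) for _ in range(n)]
unique_frac = len(set(sample)) / n
# Tends to 1 - 1/e ~ 0.632: roughly 2/3 of rows appear in each
# tree's sample; the remaining ~1/3 are out-of-bag.
print(round(unique_frac, 3))

# Common feature-sampling sizes for, e.g., 64 features:
print(int(math.sqrt(64)), int(math.log2(64)))  # 8 and 6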
- Hard voting takes a majority vote over the predicted classes; soft voting averages the predicted class probabilities. Note: sklearn's RandomForestClassifier combines its trees by averaging probabilities (soft voting), while the separate VotingClassifier defaults to hard voting.
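The two schemes can disagree, which is easiest to see on a single sample. The per-model probabilities below are hypothetical numbers chosen to show the split:

```python
from collections import Counter

# Hypothetical per-model probability of class 1 for one sample.
probs = [0.9, 0.45, 0.45]

# Hard voting: each model casts one vote for its argmax class.
hard_votes = [1 if p >= 0.5 else 0 for p in probs]
hard = Counter(hard_votes).most_common(1)[0][0]

# Soft voting: average the probabilities, then threshold.
soft = 1 if sum(probs) / len(probs) >= 0.5 else 0

print(hard, soft)  # hard says 0 (two models vote 0), soft says 1 (avg = 0.6)
```

Soft voting lets a very confident model outweigh two lukewarm ones, which is why averaging probabilities often works better when the base models are well calibrated.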
- For an ensemble to be accepted, 2 conditions should be satisfied:
  Diversity: the base models should be diverse, i.e. make different (ideally uncorrelated) errors.
  Acceptability: each base model should be acceptable, i.e. perform better than random guessing.
- Works for both regression and classification.
- For regression it aggregates tree predictions with the median or the mean; sklearn's RandomForestRegressor uses the mean by default.
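A tiny illustration of the two aggregation choices for regression, using hypothetical per-tree predictions where one tree is badly off:

```python
from statistics import mean, median

# Hypothetical predictions from 5 trees for one sample; one tree is off.
tree_preds = [10.0, 11.0, 10.5, 12.0, 50.0]

print(mean(tree_preds))    # 18.7 -- the sklearn default, pulled up by the outlier tree
print(median(tree_preds))  # 11.0 -- more robust to a single rogue tree
```

With many trees the mean and median usually agree closely; the gap here is exaggerated by using only five trees.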
- A single DT can let one strong attribute dominate the splits and suppress the others; with feature sampling, every feature gets a fair chance to be used.
- Hyperparameter tuning comprises n_estimators (the number of trees) along with all the DT hyperparameters (max_depth, min_samples_split, max_features, etc.).
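A minimal tuning sketch, assuming scikit-learn; the dataset is synthetic and the grid values are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# The RF-specific knob (n_estimators) alongside tree-level
# hyperparameters RF inherits from decision trees.
grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
    "max_features": ["sqrt", "log2"],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

`search.best_estimator_` is then a refit forest with the winning combination.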
- Pros:
  Reduces the overfitting of individual DTs.
  Robust to outliers.
  Feature scaling is not required.
  Improves testing accuracy.
  Handles both regression and classification.
  Doesn't suppress attributes the way a single DT can.
  Easy to parallelize.
  Works well on high-dimensional data.
- Cons:
  More computation required.
  More time required.
  Black-box model: you can't explain its mathematical intuition in layman's language.
  Highly complex compared to a single tree.