Ensemble Voting Classifiers and Random Forests in Scikit-Learn

Pratap R Jujjavarapu · Published in Analytics Vidhya · Sep 27, 2020

Suppose you pose a complex question to a set of random people and collect their answers into a dataset. If you aggregate (for example, average) all the answers in the dataset, the aggregated answer is often close to, or better than, an expert's solution to that complex problem. This principle is called the wisdom of the crowd.

The same principle applies to machine learning: instead of relying on a single model, we combine a group of predictors (classifiers or regressors) into an ensemble, which often yields better accuracy and more robust predictions than the best individual predictor. This technique is called ensemble learning, and the algorithm used to aggregate the ensemble's predictions is called an ensemble method. An ensemble made up entirely of decision tree classifiers or regressors is called a random forest. In this article we discuss hard and soft voting, which are used to achieve higher accuracy, together with random forests as an ensemble.

How does the ensemble method choose the winning class?

There are two ways the ensemble can determine the winning class. The first is to aggregate the predictions of every classifier in the ensemble and predict the class that gets the most votes. This is called hard voting. More generally, a group of weak learners can act as a strong learner, provided the ensemble contains a sufficient number of weak learners and they are sufficiently diverse.

Image Source: Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow — Aurelien Geron

If the classifiers in the ensemble can estimate class probabilities (i.e. they have a predict_proba() method, as decision trees do), then the class with the highest class probability, averaged over all the classifiers, is the winning class. This method is called soft voting. Both hard voting and soft voting can be done with scikit-learn's VotingClassifier.

To illustrate the voting classifier, let us use the make_moons dataset generator from sklearn. We generate a total of 500 samples (n_samples=500) and add Gaussian noise with a standard deviation of 0.30 (noise=0.30).
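The original code is not shown here, so the following is a minimal sketch. It assumes the ensemble combines a LogisticRegression, a RandomForestClassifier, and an SVC (the combination used in Géron's book example); the exact estimators and hyperparameters in the original post may differ.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Generate the moons dataset: 500 samples with Gaussian noise (std 0.30)
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Hard voting: each classifier casts one vote and the majority class wins
voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    voting="hard",
)
voting_clf.fit(X_train, y_train)
print("hard voting accuracy:", voting_clf.score(X_test, y_test))
```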

If the voting parameter of VotingClassifier is set to "soft", the ensemble averages the predicted class probabilities instead of counting votes.
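Continuing the sketch above (again, the estimators are assumptions, and the exact accuracies will depend on the estimators and random seed used in the original post), SVC needs probability=True so that it exposes predict_proba():

```python
# Continues the sketch above: same imports and X_train / y_train / X_test / y_test.
# Soft voting: average the predicted class probabilities across classifiers.
soft_voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),  # enable predict_proba()
    ],
    voting="soft",
)
soft_voting_clf.fit(X_train, y_train)
print("soft voting accuracy:", soft_voting_clf.score(X_test, y_test))
```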

We get 91.2% accuracy for soft voting, which predicts from the aggregated class probabilities, and 89.6% accuracy for hard voting. This is because soft voting takes the classifiers' uncertainty into account in the final decision, giving more weight to highly confident votes. Depending on the classifiers in the ensemble, we use either hard or soft voting to predict with higher accuracy.

Random Forests as an ensemble

As discussed above, an ensemble of decision trees is called a random forest. Random forests are generally trained with bagging (training each decision tree on a different random subset of the training set, sampled with replacement), or sometimes with pasting (sampling without replacement); typically the max_samples parameter is set to the size of the training set. Scikit-learn provides the RandomForestClassifier class for classification tasks and RandomForestRegressor for regression tasks. A rough bagging equivalent is sketched below, followed by a look at some parameters that can be passed to RandomForestClassifier.
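The following sketch is not from the original post; it illustrates the bagging idea by wrapping plain decision trees in a BaggingClassifier, with hyperparameters chosen only for illustration:

```python
# Continues the moons sketch above (same X_train / y_train / X_test / y_test).
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging: 500 trees, each trained on a bootstrap sample drawn with replacement
# (max_samples defaults to the size of the training set).
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(max_leaf_nodes=16, random_state=42),
    n_estimators=500,
    bootstrap=True,   # set bootstrap=False for pasting (sampling without replacement)
    n_jobs=-1,
    random_state=42,
)
bag_clf.fit(X_train, y_train)
print("bagged trees accuracy:", bag_clf.score(X_test, y_test))
```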

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=None, max_features='auto', max_leaf_nodes=16, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=-1, oob_score=False, random_state=42, verbose=0, warm_start=False)

bootstrap is set to True to implement bagging and to False to implement pasting. criterion, min_samples_leaf, min_samples_split and max_depth were already explained in my previous article on the decision tree classifier. n_estimators determines the number of trees to be built. n_jobs decides how many CPU cores the algorithm runs on (if set to -1, it uses all available CPU cores). Averaging more trees yields a more robust ensemble by reducing overfitting. max_features determines how random each tree is, and a smaller max_features reduces overfitting. In general, it is a good rule of thumb to use the default values: max_features=sqrt(n_features) ('auto') for classification and max_features=log2(n_features) for regression. oob_score is set to True or False depending on whether the out-of-bag samples (the training instances not drawn into a given tree's bootstrap sample) should be used to estimate the generalization accuracy before turning to the test set for validation. The oob_decision_function_ attribute returns a 2D array of out-of-bag class probability estimates for each training instance.
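A short illustrative sketch (not from the original post; the hyperparameters are assumptions) showing oob_score_ and oob_decision_function_ on the same moons data:

```python
# Continues the moons sketch above (same X_train / y_train / X_test / y_test).
# Random forest with out-of-bag evaluation enabled.
rnd_clf = RandomForestClassifier(
    n_estimators=500,
    max_leaf_nodes=16,
    oob_score=True,   # estimate generalization accuracy from out-of-bag samples
    n_jobs=-1,
    random_state=42,
)
rnd_clf.fit(X_train, y_train)

print("OOB accuracy estimate:", rnd_clf.oob_score_)
# Out-of-bag class probabilities per training instance (shape: n_samples x n_classes)
print(rnd_clf.oob_decision_function_[:5])
print("test accuracy:", rnd_clf.score(X_test, y_test))
```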

References:

  1. Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow — Aurelien Geron
  2. Introduction to Machine Learning with Python — Andreas C. Müller & Sarah Guido
