Hyperparameter Tuning for Tree Models

Chinmay Gaikwad (ChiGa)
Oct 10, 2021 · 5 min read

Previously, I built a simple decision tree to predict whether a particular customer would churn and to identify the most important factors driving customers to leave the telecom service. For a quick reference, please check out Building Simple Decision Tree and Feature Importance in Tree Model.

Before jumping into the hyperparameter tuning discussion, it's important to know the difference between model parameters and hyperparameters.

What are Model Parameters and Hyperparameters?

Model parameters are values that are not set manually but are determined by the algorithm while learning from the training data. These are then stored internally for the model to produce predictions.

For a Linear Regression model (Sklearn's LinearRegression), once we fit the training data, the algorithm learns from it and returns the coefficients of the independent features and the intercept, which can be accessed through the linreg.coef_ and linreg.intercept_ attributes.
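
As a quick illustration (a minimal sketch on a made-up toy dataset; the variable names are mine, not from the original post):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from y = 2*x1 + 3*x2 + 1
X = np.array([[1, 2], [2, 1], [3, 3], [4, 5]])
y = np.array([9, 8, 16, 24])

linreg = LinearRegression()
linreg.fit(X, y)

# Model parameters learned by the algorithm, not set by us
print(linreg.coef_)       # coefficients of the independent features
print(linreg.intercept_)  # intercept
```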

Hyperparameters are simply the parameters that we pass to the learning algorithm to control the training of the model and the estimation of the model parameters. They are choices that the data modeler makes to 'tune' the operation of the learning algorithm. The choice of hyperparameters therefore carries a lot of weight in the final model produced by the learning algorithm.

So anything that is given to the algorithm before it begins its learning process is a hyperparameter, i.e., these are the parameters that the user provides and not something that the algorithm learns on its own during the training process.

Again, for a Linear Regression model, we can pass various hyperparameters such as fit_intercept (a boolean that controls whether the intercept is calculated), normalize (a boolean that, in older scikit-learn versions, normalized the features before fitting), and n_jobs (the number of jobs to use for the computation).
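
For example (a sketch; since the normalize argument has been removed from LinearRegression in recent scikit-learn releases, only fit_intercept and n_jobs are shown here):

```python
from sklearn.linear_model import LinearRegression

# Hyperparameters: choices we make before training starts
linreg = LinearRegression(
    fit_intercept=True,  # whether to calculate the intercept
    n_jobs=-1,           # number of jobs to use for the computation (-1 = all cores)
)
```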

Why is hyperparameter tuning so important?

After handling all the data issues and doing feature engineering and feature selection, we often find that accuracy stops improving beyond a certain threshold, and sometimes the algorithm even overfits the training data. By governing how the algorithm learns from the data, we can push the accuracy a bit higher.

Thus, hyperparameter tuning is one of the crucial steps in building a machine learning model.

“A good choice of hyperparameters can really make an algorithm shine”

Hyperparameters regulate the learning process of the machine learning model. Tuning them to the best set of values yields better-optimized model parameters and thus improves overall performance. It is important to note that hyperparameters stay constant during the learning process, while model parameters may change with the data or features.

However, machine learning algorithms have a lot of hyperparameters, and it becomes quite difficult to find the right set of parameters manually.


Different techniques to find optimal Hyperparameters

  1. Grid Search
  2. Randomized Search
  3. Bayesian Optimization

In this article, we will focus on GridSearchCV and RandomizedSearchCV for tuning the hyperparameters of a tree model.

Hyperparameters of Decision Tree

Scikit-learn's Decision Tree classifier has a lot of hyperparameters.

  1. criterion : The measure of the quality of a split; "gini" for Gini impurity and "entropy" for information gain.
  2. max_depth : The maximum depth of the tree; the deeper the tree, the more likely it is to overfit the data.
  3. max_leaf_nodes : Grows the tree with at most the specified number of leaf nodes, chosen by the best reduction in impurity.
  4. min_samples_split : The minimum number of samples required to split an internal node.
  5. min_samples_leaf : The minimum number of samples required to be at a leaf node of the tree.

For tree models, lowering the values of the max_ parameters and increasing the values of the min_ parameters usually results in a better and more stable model.

Before jumping in to find the best hyperparameters, let's have a quick look at our baseline decision tree's overall performance.
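
The notebook code isn't reproduced here, but a minimal sketch of the baseline looks something like this (X and y are assumed to be the churn features and target prepared in the earlier articles; the split and random_state are illustrative choices of mine):

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# X, y: churn features and target from the earlier articles (not shown here)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Baseline tree with default hyperparameters; it grows until the leaves are (almost) pure
baseline_dt = DecisionTreeClassifier(random_state=42)
baseline_dt.fit(X_train, y_train)

print("Train accuracy:", accuracy_score(y_train, baseline_dt.predict(X_train)))
print("Test accuracy :", accuracy_score(y_test, baseline_dt.predict(X_test)))
```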

We can see that our model suffers from severe overfitting: it has 95.9% accuracy on the train set but just 71% on the test set.

Let's do the hyperparameter tuning with GridSearchCV.

GridSearchCV

GridSearchCV leverages the cross-validation technique to find the set of hyperparameters that gives the highest cross-validation score. We just need to provide the hyperparameters and their candidate values, and it tries out all the possible combinations while performing cross-validation.

First, let's import GridSearchCV from Sklearn's model_selection module.
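
In code, that is simply:

```python
from sklearn.model_selection import GridSearchCV
```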

Now, let's define a decision tree model and the set of parameters to search over.
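
The exact grid from the original notebook isn't shown, so the values below are illustrative; they do include the winning combination reported further down:

```python
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(random_state=42)

# Candidate values for each hyperparameter (illustrative grid)
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, 7, 10],
    "min_samples_split": [2, 8, 20],
    "min_samples_leaf": [5, 10, 20, 50],
}

grid_search = GridSearchCV(
    estimator=dt,
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,        # 5-fold cross-validation
    n_jobs=-1,
)
```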

Next, we'll fit the training data to this GridSearchCV model.
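
(Using the train split from the baseline sketch above:)

```python
grid_search.fit(X_train, y_train)
```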

It took around 12 seconds to go through all the combinations of hyperparameters we asked it to try. Let's look at the best set and the best score.
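
(Both live on the fitted search object:)

```python
print(grid_search.best_params_)  # best combination found on the grid
print(grid_search.best_score_)   # mean cross-validated accuracy of that combination
```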

So the cross-validation score is around 79.2%, and it found {'criterion': 'entropy', 'max_depth': 7, 'min_samples_leaf': 20, 'min_samples_split': 8} to be the best set of parameters from the specified grid.

Let’s look at overall model performance.
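
(A sketch, evaluating the refitted best estimator on both splits:)

```python
best_dt = grid_search.best_estimator_  # refit on the full training set by default

print("Train accuracy:", accuracy_score(y_train, best_dt.predict(X_train)))
print("Test accuracy :", accuracy_score(y_test, best_dt.predict(X_test)))
```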

Hurray! Our model has 80.4% accuracy on the train set and 78.75% accuracy on the test set. It has overcome the overfitting and now seems to be a stable model.

Let's try out RandomizedSearchCV as well.

RandomizedSearchCV

GridSearchCV is cool, but it tries all the combinations of hyperparameters provided in param_grid on the training dataset to find the optimal set. RandomizedSearchCV, on the other hand, trains and scores the model on randomly selected combinations only. This may skimp a little on accuracy, but it saves training time significantly.

First, let's import RandomizedSearchCV from Sklearn's model_selection module.
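
Again, the import is just:

```python
from sklearn.model_selection import RandomizedSearchCV
```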

Now, let’s define a model and set of hyperparameters to try out.
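
As with the grid above, these candidate values are illustrative, not the ones from the original notebook:

```python
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(random_state=42)

# Values to sample combinations from (illustrative)
param_distributions = {
    "criterion": ["gini", "entropy"],
    "max_depth": list(range(3, 16)),
    "min_samples_split": list(range(2, 31)),
    "min_samples_leaf": list(range(1, 51)),
}

random_search = RandomizedSearchCV(
    estimator=dt,
    param_distributions=param_distributions,
    n_iter=50,            # number of random combinations to try
    scoring="accuracy",
    cv=5,
    random_state=42,
    n_jobs=-1,
)
```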

Let’s fit this model on the train set.
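
(Same split as before:)

```python
random_search.fit(X_train, y_train)
```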

Cool, it took around 12 seconds to find the optimal set of hyperparameters. Let's check the best parameters and the best score.
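
```python
print(random_search.best_params_)
print(random_search.best_score_)
```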

Also, let's have a quick look at the overall performance.
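
(Same evaluation sketch as before, this time with the best estimator from the randomized search:)

```python
best_dt = random_search.best_estimator_

print("Train accuracy:", accuracy_score(y_train, best_dt.predict(X_train)))
print("Test accuracy :", accuracy_score(y_test, best_dt.predict(X_test)))
```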

The overall performance is similar to what GridSearchCV gave us. The difference in training time was small in this case, but for bigger datasets RandomizedSearchCV is the better option to go with.

GridSearch or RandomizedSearch?

As we saw, RandomizedSearchCV can save on training time, while GridSearchCV tends to find parameters with higher performance (here, accuracy). Thus, randomized search is great for coarse tuning over a large number of hyperparameters, after which we can fine-tune a smaller number of hyperparameters with grid search. Ultimately, it depends on the size of the data and the time at hand.

Thus, hyperparameter tuning allows us to unleash the true potential of a machine learning model, helping to regularize it and avoid possible overfitting.

That’s a lot of Hyperparameters (33 to be exact in this article :p)

Thanks for reading! Please check out my work on my GitHub profile and do give it a star if you find it useful!
