An Intro to Hyper-parameter Optimization using Grid Search and Random Search

Elyse Lee
7 min read · Jun 5, 2019
Image Source: Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. Journal of Machine Learning Research 13, 281–305 (2012)

Objective

Hyper-parameter Optimization

Grid Search

Random Search

Example using GridSearchCV and RandomizedSearchCV

What is Hyper-Parameter Optimization?

In machine learning, different models are tested and hyperparameters are tuned to get better predictions. Choosing the best model and the best hyperparameters are challenges that must be solved to improve predictions. Hyperparameters are settings that control a machine learning algorithm’s behavior. They differ from parameters in that hyperparameters are set before training and supplied to the model, while parameters are values the model learns during training. Tuning hyperparameters means choosing the values that give the best performance. This process can be difficult and time consuming, but there are tools that make it easier, such as Random Search and Grid Search.
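
To make the distinction concrete, here is a minimal sketch using scikit-learn’s Ridge on toy data (this snippet is illustrative and not part of the article’s example): alpha is a hyperparameter chosen before fitting, while the coefficients are parameters learned from the data.

from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

# Toy data purely for illustration.
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

# alpha is a hyperparameter: it is chosen before training.
model = Ridge(alpha=1.0)
model.fit(X, y)

# coef_ and intercept_ are parameters: the model learns them during training.
print(model.coef_, model.intercept_)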

The two methods explored in this article for hyperparameter optimization are Random Search and Grid Search. Both make hyperparameter optimization easier by searching through different combinations of hyperparameter values and returning the combination that performs best.

Random Search

As its name suggests, Random Search tries random combinations of hyperparameters. Not every combination of values is tried; instead, a fixed number of combinations, given by n_iter, is sampled. A guide to scikit-learn’s Random Search on hyperparameters (RandomizedSearchCV) can be found at the following link: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html.
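
As a minimal sketch (the estimator and value ranges here are placeholders, not the models used later in this article), RandomizedSearchCV can sample hyperparameters from continuous distributions as well as from fixed lists:

from scipy.stats import randint, uniform
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Each iteration draws one random combination from these distributions.
param_distributions = {'n_estimators': randint(100, 1000),  # integers in [100, 1000)
                       'max_depth': randint(2, 10),
                       'max_features': uniform(0.1, 0.9)}   # floats in [0.1, 1.0)

random_search = RandomizedSearchCV(RandomForestRegressor(),
                                   param_distributions,
                                   n_iter=20,  # only 20 combinations are tried
                                   cv=5,
                                   random_state=1)
# random_search.fit(X_train, y_train) would then run the 20 sampled fits.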

Grid Search

Also known as an exhaustive search, Grid Search tries every combination of the specified hyperparameter values. A guide to scikit-learn’s Grid Search on hyperparameters (GridSearchCV) can be found at the following link: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV
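
As a minimal sketch (again with a placeholder estimator and values), GridSearchCV takes a grid of discrete values and fits every combination of them:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Every combination is tried: 3 x 3 = 9 candidates, each fit cv=5 times (45 fits).
param_grid = {'n_estimators': [100, 500, 1000],
              'max_depth': [2, 5, 10]}

grid_search = GridSearchCV(RandomForestRegressor(), param_grid, cv=5)
# grid_search.fit(X_train, y_train) would then run all 45 fits.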

Random Search vs Grid Search

Image 1: Grid Search layout vs Random Search layout (Bergstra and Bengio, 2012)

Random Search is generally advised over Grid Search when the search space is high-dimensional (more than roughly three hyperparameters), because Random Search can explore a wider range of values within the same number of trials. When there are many hyperparameters, Random Search is often preferred, since the number of combinations Grid Search must try grows exponentially with each added hyperparameter and drives up its running time. However, although Grid Search can be very computationally expensive, as an exhaustive search it is useful for checking every combination of the specified hyperparameter values.

As seen in Image 1, which compares the two approaches over the same hyperparameter space, the random layout explores the space more widely: for the same number of trials, Random Search tests more distinct values of each individual hyperparameter than the grid layout does.
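
To see how quickly the grid grows, the combinations can be counted directly with scikit-learn’s ParameterGrid (a small illustrative calculation with made-up values, not part of the article’s example):

from sklearn.model_selection import ParameterGrid

# Four candidate values for each of five hyperparameters.
grid = {'n_estimators': [100, 300, 500, 1000],
        'learning_rate': [0.01, 0.02, 0.05, 0.1],
        'max_depth': [1, 2, 3, 4],
        'min_samples_leaf': [1, 5, 10, 20],
        'min_samples_split': [2, 5, 10, 20]}

print(len(ParameterGrid(grid)))  # 4**5 = 1024 combinations for Grid Search
# A Random Search over the same space would still try only n_iter combinations.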

Example

To illustrate these methods, a dataset from Kaggle, “House Prices: Advanced Regression Techniques” (https://www.kaggle.com/c/house-prices-advanced-regression-techniques/overview), will be used. The goal of this dataset is to predict sale prices, but the details of the data are not important for explaining the topic of this article.

Tested Models

The models that will be tested on this dataset are Ridge Regression, Random Forest Regression, and Gradient Boosting Regression. To choose the best model in practice, other models should also be considered, but these three are enough for the example.

A brief definition of each model is given below, followed by a short instantiation sketch; for a more comprehensive understanding, each model should be researched thoroughly:

Ridge Regression fits a linear model with L2 regularization: an L2 penalty, proportional to the sum of the squared coefficients, is added to the loss. The penalty shrinks the coefficients toward zero but does not set any of them exactly to zero, so no features are eliminated.

Random Forest Regression is an ensemble of decision trees: many trees are trained independently on bootstrap samples of the data, and the final prediction is the average of the individual trees’ predictions. Using multiple models in this way is called model ensembling.

Gradient Boosting Regression is an additive model built from an ensemble of weak learners, typically shallow decision trees. The ensemble becomes powerful because each successive tree is fit to correct the errors of the trees before it.
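
As a quick sketch of how these three scikit-learn models are instantiated (the settings shown here are illustrative, not tuned values; the article tunes the Gradient Boosting model below):

from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

# Ridge: linear model with an L2 penalty controlled by alpha.
ridge = Ridge(alpha=1.0)

# Random Forest: independent trees on bootstrap samples, predictions averaged.
forest = RandomForestRegressor(n_estimators=500, random_state=1)

# Gradient Boosting: shallow trees added sequentially, each correcting the previous ones.
gboost = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05, random_state=1)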

Using Random Search

First, since Random Search takes less processing time than Grid Search, the model to be analyzed further was selected using the Random Search score from among three different models: Random Forest, Ridge, and Gradient Boosting. Another way to select the best model would have been to use the mean cross validation scores. Here, the training dataset was used to select the best model.

from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor

# Candidate values for each hyperparameter.
num_estimators = [500, 1000]
learn_rates = [0.02, 0.05]
max_depths = [1, 2]
min_samples_leaf = [5, 10]
min_samples_split = [5, 10]

param_grid = {'n_estimators': num_estimators,
              'learning_rate': learn_rates,
              'max_depth': max_depths,
              'min_samples_leaf': min_samples_leaf,
              'min_samples_split': min_samples_split}

# This grid has only 2**5 = 32 combinations, so at most 32 distinct combinations
# can be tried even though n_iter=100.
random_search = RandomizedSearchCV(GradientBoostingRegressor(loss='huber'),
                                   param_grid, random_state=1, n_iter=100,
                                   cv=5, verbose=0, n_jobs=-1)

random_search.fit(x_train, y_train)

A Random Search was run for each of the models; the code snippet above shows the Random Search performed for the Gradient Boosting Regression. The meaning of each parameter in param_grid is explained below under Using Grid Search.

The best parameter values found by the Gradient Boosting Random Search are given below:

random_search.best_params_

{'learning_rate': 0.05,
 'max_depth': 2,
 'min_samples_leaf': 5,
 'min_samples_split': 10,
 'n_estimators': 1000}

Next, the random search score was found:

gboost_score = random_search.score(x_train, y_train)
print(gboost_score)
0.9593680864100839
Image 2: Random Search cross validation scores for Ridge, Random Forest, and Gradient Boosting

A Random Search was run for each of the models, and the scores were then compared. As seen in the graph above (Image 2), Gradient Boosting had the highest cross validation score while Random Forest had the lowest (the cross validation score gives an estimate of model performance). Hence, Gradient Boosting was selected as the best of these three models, and its hyperparameters were then tuned using the more computationally expensive Grid Search.
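
The comparison itself amounts to collecting one cross validation score per fitted search. A minimal sketch, assuming ridge_search, forest_search, and gboost_search (hypothetical names, not from the original code) are the three fitted RandomizedSearchCV objects:

# best_score_ is the mean cross validation score of each search's best candidate.
searches = {'Ridge': ridge_search,
            'Random Forest': forest_search,
            'Gradient Boosting': gboost_search}

for name, search in searches.items():
    print(name, search.best_score_)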

Using Grid Search

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

# The same candidate values used for the Random Search above.
num_estimators = [500, 1000]
learn_rates = [0.02, 0.05]
max_depths = [1, 2]
min_samples_leaf = [5, 10]
min_samples_split = [5, 10]

# The keys must match the estimator's parameter names, so the number of trees
# is passed as 'n_estimators' (an unrecognized key would raise an error).
param_grid = {'n_estimators': num_estimators,
              'learning_rate': learn_rates,
              'max_depth': max_depths,
              'min_samples_leaf': min_samples_leaf,
              'min_samples_split': min_samples_split}

grid_search = GridSearchCV(GradientBoostingRegressor(loss='huber'),
                           param_grid, cv=3, return_train_score=True)
grid_search.fit(x_train, y_train)

The grid of parameter values that were specified is shown in the code above. More parameter values can be specified, but the fewer values that are given, the faster the search will be: with two values for each of the five hyperparameters, Grid Search already tries 2^5 = 32 combinations, each fit cv=3 times (96 cross validation fits in total). To keep the running time down, only two values were given for each parameter in this example.

Grid of Parameter Values:

n_estimators: The number of boosting stages, i.e. the number of trees that are fit. More trees can learn the data better, but a higher number increases training time.

learning_rate: The rate that scales the contribution of each newly added tree.

max_depth: The maximum depth of each individual tree. Deeper trees capture more information about the data.

min_samples_leaf: The minimum number of samples required at a leaf node.

min_samples_split: The minimum number of samples required to split an internal node.

The best parameter values found are shown below:

grid_search.best_params_

{'learning_rate': 0.05,
 'max_depth': 2,
 'min_samples_leaf': 5,
 'min_samples_split': 5,
 'n_estimators': 1000}

As mentioned, since only two values were given for each parameter, Grid Search could only choose one of those two values for each setting. Supplying more values might yield different best parameter values and improve the score.

Next, the Grid Search score for the Gradient Boosting model was computed on the training set.

grid_search.score(x_train, y_train)
0.9594164577940154
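
Because GridSearchCV refits the best combination on the full training set by default (refit=True), the tuned model is also available directly as best_estimator_ and can be used for predictions, for example on held-out data (x_test here is hypothetical; the article only scores on the training set):

best_model = grid_search.best_estimator_  # GradientBoostingRegressor refit with the best params
predictions = best_model.predict(x_test)  # x_test is a hypothetical held-out feature set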

For a model like Gradient Boosting, it is important to note that performance depends heavily on hyperparameter tuning. For example, if the n_estimators values were given as 5 and 10 instead of 500 and 1000, grid_search.best_params_ changes to:

{'learning_rate': 0.05,
 'max_depth': 2,
 'min_samples_leaf': 5,
 'min_samples_split': 5,
 'n_estimators': 10}

The Grid Search score for this Gradient Boosting model with n_estimators of 10 is:

grid_search.score(x_train, y_train)
0.40309241636365023

From 0.9594164577940154 down to 0.40309241636365023, that is a difference of roughly 0.556! This shows that the values specified for the hyperparameters can have a large impact on the performance of the model.

Image 3: Grid Search vs Random Search cross validation scores for the Gradient Boosting model

The Grid Search and Random Search cross validation scores are compared in the graph above (Image 3). As shown, the Grid Search score is higher than the Random Search score, though only by a small amount, so the best parameter values given by Grid Search will be used as the optimal hyperparameter values for the Gradient Boosting model.

In this example, Grid Search yielded a slightly higher score than Random Search. However, it is important to note that Grid Search is not necessarily better than Random Search. There are studies showing that Random Search can give equal or better performance for a comparable computational budget because it can cover a larger configuration space. One such study is “Random Search for Hyper-parameter Optimization” by Bergstra and Bengio: https://dl.acm.org/citation.cfm?id=2188395.

Conclusion

Tuning hyperparameters is one of the trickiest parts of building a machine learning model, but it is a necessary step toward better predictions. Fortunately, two widely used tuning methods, Grid Search and Random Search, make the process more efficient by automating the choice of parameter values. There are arguments supporting one method over the other, but the better choice can depend on the specific problem, with Random Search preferred for high-dimensional search spaces and tighter time budgets. Overall, tuning the hyperparameters and trying different values can make a significant difference in the quality of the predictions.

Key Terms

Hyperparameters are settings that control a machine learning algorithm’s behavior; their values are set prior to the learning process and tuned for better performance.

Parameters are configuration variables whose values are learned by the model during training.

Grid Search is an exhaustive search that looks through all combinations of hyperparameters.

Random Search tries random combinations of hyperparameters, sampling a fixed number of combinations given by n_iter.
