Hyperparameter Optimization using bayesian optimization

How to find the best hyperparameters for your machine learning model without losing your mind

Published in

spikelab

5 min readJan 19, 2019

Hyperparameters Optimization

Choosing the right parameters for your machine learning algorithm is a hard and crucial task, since it can make a big difference on the performance of a model. These parameters can be tuned in a manual or automatic way.

The manual way implies training and testing models, manually changing the parameters at each step. This could end up being a time consuming task and maybe you’ll never find the optimal parameters. On the other hand, we could use algorithms that start with a potential set of hyperparameters, and try to optimize them automatically.

Almost all machine learning libraries and frameworks include some automatic hyperparameters optimization algorithms, Here, we’ll talk about two of these: RandomSearch and GridSearch.

In a GridSearch, the key idea is to define a set of parameter values and train the model for all possible combinations and then save the best one. This method is pretty good if you have a simple model, but if your model takes some time to train (like in almost all Deep Learning models) or if your hyperparameter space is too big, this approach could not be the best one, because of the time required to do it.

The RandomSearch algorithm is pretty similar, but instead of using all possible combinations, it randomly assigns a value (within a defined range) for each hyperparameter, so the required time could decrease significantly. However, it might not find the optimal set.

Bayesian Optimization is an alternative way to efficiently get the best hyperparameters for your model, and we’ll talk about this next.

Bayesian Optimization

As Fernando Nogueira explains in his amazing python package bayesian-optimization:

Bayesian optimization works by constructing a posterior distribution of functions (gaussian process) that best describes the function you want to optimize. As the number of observations grows, the posterior distribution improves, and the algorithm becomes more certain of which regions in parameter space are worth exploring and which are not

We can see this in the image below

As you iterate over and over, the algorithm balances its needs of exploration and exploitation taking into account what it knows about the target function. At each step a Gaussian Process is fitted to the known samples (points previously explored), and the posterior distribution, combined with a exploration strategy (such as UCB (Upper Confidence Bound), or EI (Expected Improvement)), are used to determine the next point that should be explored.

Using Bayesian Optimization, we can explore the parameter space in a smarter way, and thus reduce the time required to do this process.

You can learn more about Bayesian Optimization here

Using Bayesian Optimization with H2O.ai

I’m going to use H2O.ai and the python package bayesian-optimization developed by Fernando Nogueira. The goal is to optimize the hyperparameters of a regression model using GBM as our machine learning algorithm.

Show me the data!

I chose the red wine quality dataset on Kaggle because it’s a simple dataset (and I live in Chile, we love wine!!) which you can use to train regression or classification models. I always use it when I want to play with a new machine learning algorithm.

The dataset contains a set of features that determine the quality of wine like: pH, citric acidity, sulphates, alcohol, etc. The data looks like this:

So our model will try to predict the quality of the wine.

Let’s go to the code

First, import h2o and bayesian-optimization, then start a H2O’s server:

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from bayes_opt import BayesianOptimization
h2o.init()
h2o.remove_all()

Let’s load our dataset into a H2O’s frame, we are going to split our dataset into train and test, 70% will be used to train. Internally H2O’s uses cross-validation. So we can save the test data just to validate our final model.

Our target will be the quality of the wine.

data = h2o.upload_file(“data/winequality-red.csv”)
train_cols = [x for x in data.col_names if x not in [‘quality’]]
target = "quality"
train, test = data.split_frame(ratios=[0.7])

The bayesian optimization package requires a function to optimize, and this function must return a number. In our case this number will be the metric (or cost function) that we want to minimize. I choose to minimize the root mean squared error (remember, we are going to train a regression model), so the function returns this value.

def train_model(max_depth, 
                ntrees,
                min_rows, 
                learn_rate, 
                sample_rate, 
                col_sample_rate):
    params = {
        'max_depth': int(max_depth),
        'ntrees': int(ntrees),
        'min_rows': int(min_rows),
        'learn_rate':learn_rate,
        'sample_rate':sample_rate,
        'col_sample_rate':col_sample_rate
    }
    model = H2OGradientBoostingEstimator(nfolds=5,**params)
    model.train(x=train_cols, y=target, training_frame=train)
    return -model.rmse()

The function returns -model.rmse() because, as we will see soon, the optimizer by default is designed to maximize functions.

Now, we have to define the parameter space:

bounds = {
    'max_depth':(5,10),
    'ntrees': (100,500),
    'min_rows':(10,30),
    'learn_rate':(0.001, 0.01),
    'sample_rate':(0.5,0.8),
    'col_sample_rate':(0.5,0.8)
}

With this done, it’s time to define our optimizer. This receives a python function and the hyperparameter space. Then we can set the number of initial points, and how many iterations we want. The number of iterations will be equal to how many models we are going to train.

optimizer = BayesianOptimization(
    f=train_model,
    pbounds=bounds,
    random_state=1,
)
optimizer.maximize(init_points=10, n_iter=50)

Runing this, we get an output at each step:

Finally we can get the best hyperparameters for our model

optimizer.max{'target': -0.35322220505969215,
 'params': {'col_sample_rate': 0.8,
  'learn_rate': 0.01,
  'max_depth': 10.0,
  'min_rows': 10.0,
  'ntrees': 300.0,
  'sample_rate': 0.8}}

Conclusions

We can use Bayesian Optimization for efficiently tuning hyperparameters of our model. As we saw in our example, this just involves defining a few helper functions. We considered a machine learning example, but Bayesian Optimization can be used to optimize a wide variety of black box problems.

We can integrate the package developed by Fernando Nogueira with almost all popular machine learning libraries like h2o, sklearn, tensorflow, XGboost, CatBoost, etc.

You can find the full example here in GitHub