Automating and Accelerating Hyperparameter Tuning for Deep Learning

Jade Zhou
Inside Machine Learning
Mar 12, 2019

Deep learning can be tedious work. Take Long Short-Term Memory (LSTM) networks as an example: they have lots of hyperparameters (learning rate, number of hidden units, batch size, and so on) waiting for us to find the best combination. Given the size of a deep learning model, hyperparameter tuning usually takes a long time.

One traditional way to automate the tuning process is grid search: try every possible combination of hyperparameters in the search space and keep the best one. This method is typically too computationally intensive, though. To overcome this issue, IBM developed a black-box optimization library called RbfOpt (https://github.com/coin-or/rbfopt), which is embedded in Experiment Builder.
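To see why grid search gets expensive so quickly, here is a minimal sketch (the hyperparameter names and grid values are made up for illustration). The number of training runs is the product of the grid sizes, so even a modest grid explodes:

```python
from itertools import product

# Hypothetical search space: 6 x 4 x 3 = 72 full training runs.
grid = {
    "learning_rate": [0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03],
    "num_hidden": [32, 64, 128, 256],
    "batch_size": [16, 32, 64],
}

combinations = list(product(*grid.values()))
print(len(combinations))  # 72 -- and each one is a complete training run
```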

In this blog, I’ll walk you through how to set up hyperparameter tuning with Experiment Builder on Watson Studio. Basically, we need to meet three requirements before we start our experiments:

  1. Provisioning a Cloud Object Storage (COS) Instance
  2. Provisioning a Watson Machine Learning (WML) Instance
  3. Setting up a Training Definition

Service Set-up

To fulfill the first two requirements, you have to create an IBM Cloud account (https://console.bluemix.net/) and instantiate your personal COS and WML services under ‘Catalog’:

Look for the ‘Lite’ version of these services, available at no cost.

Training Definition

The training definition is a zip file that includes one main Python script to train models and save results. You can also zip in other scripts or files if needed. When Experiment Builder runs, it executes this main script, which is why we need the training definition.

For the main script, you can follow the coding guidelines (https://dataplatform.cloud.ibm.com/docs/content/analyze-data/ml_dlaas_code_guidelines.html?audience=wdp) to write model training and evaluation functions and execute them in the main function. In addition, you’ll have to add and edit some parts of the code to do hyperparameter optimization (HPO). Let’s delve deeper into each part.

  1. Specify directories and read data. In the next section, you’ll see that we specify two buckets (a source bucket and a result bucket) for input data and output results, so once you upload data into the source bucket, it’s easy for your program to read it:
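Here’s a minimal sketch of this step. The DATA_DIR and RESULT_DIR environment variables follow the WML coding guidelines linked above; the file name is a placeholder:

```python
import os

# WML mounts the source and result buckets and exposes their paths
# via environment variables (per the coding guidelines linked above).
data_dir = os.environ.get("DATA_DIR", ".")
output_model_folder = os.environ.get("RESULT_DIR", ".")

# 'train.csv' is a placeholder -- use whatever files you uploaded
# to the source bucket.
train_path = os.path.join(data_dir, "train.csv")
```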

2. Read hyperparameters. Hyperparameters are saved in a JSON file called ‘config.json’ in the same directory as this script. In this example, I only explored the learning rate and the number of hidden units for the LSTM:
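A sketch of reading them, assuming the hyperparameter names you’ll define later in Experiment Builder are ‘learning_rate’ and ‘num_hidden’:

```python
import json

# Experiment Builder writes the values it samples for this run into
# config.json; a single run's file might look like:
#   {"learning_rate": 0.001, "num_hidden": 64}
with open("config.json") as f:
    config = json.load(f)

learning_rate = float(config["learning_rate"])
num_hidden = int(config["num_hidden"])
```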

You’ll also need to include three additional components. First, download ‘emetrics.py’ (https://raw.githubusercontent.com/pmservice/wml-sample-models/master/deep-learning/metrics/emetrics.py) and include it in your zip file. Remember to import it at the beginning of your main script.

Second and third, outside the main function, add a function called ‘getCurrentSubID’ and a class called ‘HPOMetrics’ that will pass metrics to your HPO program.
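Here’s a sketch of those two components, modeled on IBM’s wml-sample-models code (see Further Reading); check the sample repository for the exact definitions:

```python
import os
import keras
from emetrics import EMetrics  # from the downloaded emetrics.py

def getCurrentSubID():
    # Each HPO training run is identified by a sub-ID passed in
    # through an environment variable.
    return os.environ.get("SUBID")

class HPOMetrics(keras.callbacks.Callback):
    def __init__(self):
        self.emetrics = EMetrics.open(getCurrentSubID())

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Report validation metrics (e.g. 'val_loss') so the HPO
        # program can read the objective after each epoch.
        test_results = {k: v for k, v in logs.items() if k.startswith('val_')}
        self.emetrics.record(EMetrics.TEST_GROUP, epoch, test_results)

    def close(self):
        self.emetrics.close()
```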

3. Train and evaluate. Let’s return to the main function. Once the data and hyperparameters are set up, you can call your functions for training and evaluation:
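A sketch of that flow, using dummy data so it stands alone; ‘train_model’ is a placeholder name for whatever function you wrote following the coding guidelines (it’s sketched in the next snippet):

```python
import numpy as np

# Dummy sequence data just to make the sketch self-contained:
# 1,000 training sequences, 20 timesteps, 8 features.
x_train, y_train = np.random.rand(1000, 20, 8), np.random.rand(1000, 1)
x_val, y_val = np.random.rand(200, 20, 8), np.random.rand(200, 1)

# train_model is defined in the next snippet.
model = train_model(x_train, y_train, x_val, y_val, learning_rate, num_hidden)
val_loss = model.evaluate(x_val, y_val)
```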

Inside the ‘train_model’ function, to pass metrics to our HPO program after each epoch, we create an instance of HPOMetrics and include it in the model ‘callbacks’. After training finishes, don’t forget to close it:
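A minimal ‘train_model’ sketch for a Keras LSTM regressor; the architecture, epochs, and batch size are arbitrary choices for illustration:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import Adam

def train_model(x_train, y_train, x_val, y_val, learning_rate, num_hidden):
    # Simple one-layer LSTM regressor; the architecture is illustrative only.
    model = Sequential([
        LSTM(num_hidden, input_shape=x_train.shape[1:]),
        Dense(1),
    ])
    model.compile(loss='mean_squared_error', optimizer=Adam(lr=learning_rate))

    hpo = HPOMetrics()  # streams 'val_loss' to the HPO program each epoch
    model.fit(x_train, y_train,
              validation_data=(x_val, y_val),
              epochs=10, batch_size=32,  # arbitrary values for the sketch
              callbacks=[hpo])
    hpo.close()  # flush the metrics once training finishes
    return model
```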

If you want to save predictions or other results, you can write them directly into the result directory, which is defined as ‘output_model_folder’ in the first part. Then go to your cloud account dashboard and open the bucket you used for storing results to download those predictions or any other metrics you want.
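For example (the file name is a placeholder):

```python
import os
import numpy as np

preds = model.predict(x_val)
# Anything written under output_model_folder lands in the results bucket.
np.savetxt(os.path.join(output_model_folder, "predictions.csv"),
           preds, delimiter=",")
```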

Start Experimenting!

We’ve got our services and Python scripts ready, so now we can start experimenting with hyperparameters.

To access the UI page of Experiment Builder on Watson Studio (https://dataplatform.cloud.ibm.com/), create a project backed by the COS instance you just created, and associate the WML instance with this project under ‘Settings’:

Next, return to the project ‘Assets’ page and you’ll see ‘Experiments’.

Click ‘New experiment +’ and select the services you just associated with this project. Technically, COS is used for getting data and storing logs and results, while WML is where you can save and deploy your model (which this blog doesn’t cover). For Experiment Builder, we usually create a new bucket in COS. The source COS bucket stores all the data your Python scripts need; the results COS bucket stores all logs and results. I usually use the same bucket for both purposes, but you can also have two separate buckets.

Click the source bucket (marked by a red rectangle) and you’ll go to your Bluemix dashboard, then upload all your data:

Next, we go back to our Experiment page to add the training definition. Apart from uploading the zip file containing your Python scripts, there are several things you need to configure:

  1. Framework. Choose the one (TensorFlow/PyTorch/Caffe) you code with.
  2. Compute plan. Choose how many GPUs you want to use.
  3. Hyperparameter optimization method. If you don’t want to tune hyperparameters, just select ‘none’. Otherwise, choose RBFOpt or random. Here I chose RBFOpt.
  4. Number of optimizer steps. If you decide to tune hyperparameters, you have to specify a few parameters. This one equals the number of training runs.
  5. Objective. Since I’m using Keras to build an LSTM model, the objective has to exist in the model history, under exactly the same name; otherwise Experiment Builder will post error messages. During my training process, I only calculated ‘train_loss’ and ‘val_loss’. To avoid overfitting, my objective is ‘val_loss’.
  6. Maximize or minimize. Choose ‘minimize’ when the objective is a loss and ‘maximize’ when it’s an accuracy.
  7. Add hyperparameters. These must be consistent with the hyperparameters we read in our script. This step automatically generates the JSON file called ‘config.json’, which is why our script reads hyperparameters from that file.

For each hyperparameter, you have to specify the range and data type.

And you’re finished! Don’t forget to give your experiment a unique name, then hit ‘Create and run’. Deep learning training takes a while, but you can monitor your model while it’s running.

After Watson Studio completes the experiments, look for results under ‘Compare Runs’ and choose your best group of hyperparameters:

Congratulations! You’ve done your first group of experiments using Experiment Builder on Watson Studio. Enjoy your results!

Further Reading:

“From keras experiment to scoring with watson-machine-learning-client”: https://dataplatform.cloud.ibm.com/analytics/notebooks/b9809f95-f8d5-42e7-b9f3-343d9dc673e9/view?access_token=f9491fe943c5376bc300fc1a571b4d1367f25660b14daeed69ddadad0104d148

“wml-sample-models”: https://github.com/pmservice/wml-sample-models/tree/master/keras/mnist

“Hyperparameter Optimization”: https://dataplatform.cloud.ibm.com/docs/content/analyze-data/ml_dlaas_hpo.html
