Hyperparameter Optimization with IBM’s Deep Learning Service: A Tutorial

Lana El Sanyoura
7 min read · Aug 31, 2018


A step-by-step tutorial on performing Hyperparameter Optimization using IBM’s Deep Learning service in Watson Studio on an MLP trained on the MNIST dataset.

Outline:

  1. What is Hyperparameter Optimization (HPO)?
  2. What is IBM’s Deep Learning service in Watson Studio?
  3. Tutorial Pre-requisites
  4. Step-by-Step Walk-through on a 3 Layer MLP trained on MNIST
  5. Appendix

1. What is Hyperparameter Optimization (HPO)?

In machine learning, Hyperparameter Optimization is the process of searching for and choosing an optimal set of architecture-defining parameters for your machine learning model.

For example, if we wanted to minimize our model’s test error, we could perform HPO on variables such as the model’s learning rate or regularizer coefficients, to find the set of these variables that gives us optimal model performance.
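As a toy illustration of the idea (not the Watson Studio service itself, and with a made-up train_and_evaluate placeholder), random search samples candidate settings and keeps the one with the lowest validation error:

import random

# Toy random-search HPO: sample hyperparameter settings, evaluate each,
# and keep the setting with the lowest validation error.
def train_and_evaluate(learning_rate, reg_coeff):
    """Placeholder: train a model and return its validation error."""
    return (learning_rate - 0.01) ** 2 + (reg_coeff - 0.001) ** 2

best_setting, best_error = None, float("inf")
for _ in range(20):
    setting = {
        "learning_rate": 10 ** random.uniform(-4, -1),  # sample on a log scale
        "reg_coeff": 10 ** random.uniform(-5, -2),
    }
    error = train_and_evaluate(**setting)
    if error < best_error:
        best_setting, best_error = setting, error

print("best hyperparameters:", best_setting, "validation error:", best_error)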

2. What is IBM’s Deep Learning service in Watson Studio?

IBM’s Deep Learning service in Watson Studio is a platform for Deep Learning training. It allows the use of popular frameworks like TensorFlow, Caffe, and PyTorch to train Neural Network models with on-demand GPU compute instances.

If it’s your first time using this service, you can get set up by following a step-by-step tutorial.

Additional Resources:

FAQ

Behind IBM’s DL Service in Watson Studio

3. Prerequisites

i. Getting set up on DLaaS

ii. Knowing these terms and definitions:

  1. IBM COS Bucket
  2. Model Definition (.zip file)
  3. Training-definition
  4. Training-run
  5. Experiment
  6. Hyperparameter Optimization Experiment
  7. Experiment-run

4. Step-by-step Tutorial for an MLP trained on MNIST

In this tutorial we will be working with a multi-layer perceptron that is trained on MNIST.

The code for the model we are training can be found in the appendix.

We will be using random search to find the optimal weight_decay coefficient, lam, that minimizes our test error, error_val.

  1. Download the MNIST dataset
  2. Set Up Environment Variables *
$ export ML_INSTANCE=<instance_id>
$ export ML_USERNAME=<username>
$ export ML_PASSWORD=<password>
$ export ML_ENV=<envurl>

*If you did not store the values of the variables from the one-time setup, you can retrieve them with:

$ instance_id=`bx service key-show <CLI_WMLi> <key_CLI_WMLi> | grep "instance_id"| awk -F": " '{print $2}'| cut -d'"' -f2`
$ username=`bx service key-show <CLI_WMLi> <key_CLI_WMLi> | grep "username"| awk -F": " '{print $2}'| cut -d'"' -f2`
$ password=`bx service key-show <CLI_WMLi> <key_CLI_WMLi> | grep "password"| awk -F": " '{print $2}'| cut -d'"' -f2`
$ envurl=`bx service key-show <CLI_WMLi> <key_CLI_WMLi> | grep "url"| awk -F": " '{print $2}'| cut -d'"' -f2`

$ echo ""; echo "ML Instance Credentials:"; echo "instance_id: $instance_id"; echo "username: $username "; echo "password: $password"; echo "" echo "envurl: $envurl"; echo ""

And set them with:

$ export ML_INSTANCE=$instance_id
$ export ML_USERNAME=$username
$ export ML_PASSWORD=$password
$ export ML_ENV=$envurl

3. Create an alias to simplify the invocation of the aws command:

Mac OS users:

$ alias bxaws='aws --profile <my_aws_profile> --endpoint-url=http://s3-api.us-geo.objectstorage.softlayer.net'

Windows OS users:

doskey bxaws=aws --profile <my_aws_profile> --endpoint-url=http://s3-api.us-geo.objectstorage.softlayer.net $*

4. Upload the dataset to the COS Bucket of choice

We will be training our model on MNIST and need to upload this dataset to our bucket.

Through the GUI, or from the command line:

$ bxaws s3 cp mnist_all.mat s3://samplebucketone/mnist_all.mat

5. Create a training-definition manifest file

We need to specify our program’s requirements through our training-definitions manifest file:

We’re using Python 3.5 and TensorFlow 1.5, and executing the training by calling: python3 mnist_classifier.py

A sample template, training-definitions.yml, can be created with:

$ bx ml generate-manifest training-definitions

Or you can copy and update the file below:

6. Create an experiment manifest file with HPO requirements

Based on our code in mnist_classifier.py, we want to use a random search algorithm (line 16 of experiments.yml) to minimize (line 21) our test error, error_val (line 18). We want to perform HPO on lam (the weight_decay coefficient) with the values 0.0001 and 0.001 (line 25).

A sample template, experiments.yml, can be generated with:

$ bx ml generate-manifest experiments

Or you can copy and update the following file:

Now that we have created training-definitions.yml and experiments.yml, we need to update our code so it runs on IBM’s Deep Learning service in Watson Studio.

7. Update your code to use HPO and IBM’s Deep Learning service in Watson Studio

i. Update the path to the dataset in your codebase so that it is read from DATA_DIR (i.e. your COS bucket).
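For instance, a minimal sketch, assuming the .mat file is read with scipy.io.loadmat (your own loading code may differ):

import os
import scipy.io

# DATA_DIR is set by the service and points at the COS bucket
# holding the training data we uploaded (mnist_all.mat).
data_dir = os.environ.get("DATA_DIR", ".")
mnist = scipy.io.loadmat(os.path.join(data_dir, "mnist_all.mat"))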

ii. Read the chosen parameters from config.json, and set the hyperparameters accordingly.

Since we are trying to tune the weight_decay coefficient lam, we need to load its values from the file config.json (which is generated when we run our experiment).
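A minimal sketch of this step; it assumes config.json is a flat JSON object such as {"lam": 0.0001} written to the working directory (check the HPO docs for the exact location and format):

import json
import os

# Default weight_decay coefficient, used when running outside an HPO experiment.
lam = 0.001

# During an experiment-run, the HPO service generates config.json with the
# chosen hyperparameter values for this training-run.
if os.path.exists("config.json"):
    with open("config.json") as f:
        lam = float(json.load(f)["lam"])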

iii. It is required that you write your results to RESULT_DIR/val_dict_list.json:

The HPO algorithm depends on the results of previous hyperparameter settings to infer what values the hyperparameters should be set to next.

After you write a training-run’s results to RESULT_DIR/val_dict_list.json, the HPO algorithm reads this file and chooses the next best value for lam, so we need to write our results to this file.

An example of what RESULT_DIR/val_dict_list.json looks like when lam=0.0001:

[{"error_val": 79.79, "steps": 1}, {"error_val": 5.720000000000002, "steps": 2}, {"error_val": 3.190000000000004, "steps": 3}, …

… , {"error_val": 1.5000000000000013, "steps": 158}, {"error_val": 1.4700000000000046, "steps": 159}, {"error_val": 1.4700000000000046, "steps": 160}]

We specified our objective’s string_value as error_val in experiments.yml:

So we need to record the test error at each step (i.e. epoch) as an entry in val_dict_list.json, under the "error_val" key.

Finally, we need a small piece of code that writes these metrics to RESULT_DIR/val_dict_list.json.
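A minimal sketch, assuming RESULT_DIR is exposed as an environment variable and error_val is the test error computed after each epoch (record_step is just an illustrative helper name):

import json
import os

# RESULT_DIR is set by the service and points at the results bucket for this training-run.
result_dir = os.environ.get("RESULT_DIR", ".")
metrics_path = os.path.join(result_dir, "val_dict_list.json")

val_dict_list = []

def record_step(step, error_val):
    """Append this epoch's test error and rewrite val_dict_list.json."""
    val_dict_list.append({"error_val": error_val, "steps": step})
    with open(metrics_path, "w") as f:
        json.dump(val_dict_list, f)

# Inside the training loop, after evaluating on the test set at each epoch:
#     record_step(epoch, error_val)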

8. Zip your required files for training

Our command calls mnist_classifier.py, so we need to add this file to our model definition zip file.
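For example (the zip file name here is just illustrative):

$ zip mnist_classifier.zip mnist_classifier.py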

Now our code is ready to be sent out for training! Next up, we need to create some training-runs and experiment-runs!

9. Store the training-definition to create the training-definition ID

You will need the training-definition ID for the experiments.yml manifest file. This ID will allow the experiment to obtain the instructions for launching individual training-runs.

$ bx ml store training-definitions <path-to-model-zip> <path-to-training-manifest>

10. Update the experiments.yml file, appending the generated training-definition ID to "training_definition_url" (line 10)

Now that we have our training-definition ID, we need to include it in our experiments.yml manifest file (line 10).

file: experiments.yml

In this example, the uniquely generated training-definition ID = 664dd41d-654d-4430-beb8-9bfa993bddd4

So, we must update line 10 in the experiments.yml file:

11. Store the experiment to create the experiment ID:

Now that we’ve created the training-definition ID, let’s create the experiment ID.

$ bx ml store experiments <path-to-experiments-manifest-yaml>

12. Run the experiment to create the experiment-run ID:

Now that we have the instructions on how to run our experiments and train our models, we need to call the run command, and start optimizing.

$ bx ml experiments run <experiment-ID>

And there you have it!

You are now launching several training-runs for your model and performing Hyperparameter Optimization using DLaaS!

You can list all the training-runs under this experiment-run and access the results, submitted code, and log files!

$ bx ml list training-runs <experiment-ID> <experiment-run-ID>

Now, you can explore the RESULT_DIR folder in the bucket related to this experiment-run.

In our case, the training-run ID is training-Zlo4wHhiR.

You can view your individual run results by downloading RESULT_DIR/<SUB_ID_INDEX>/val_dict_list.json, or view the final results after the optimization by looking at RESULT_DIR/results.json.

Through the GUI, or from the command line:

$ bxaws s3 cp s3://<bucketname>/<training-run-ID>/<SUB_ID_INDEX>/val_dict_list.json .
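Once the files are downloaded, you can compare runs locally. A small sketch, assuming you copied each run's file into a hypothetical results/<SUB_ID_INDEX>/ folder:

import glob
import json

# Compare the final error_val of each downloaded sub-run to see which
# lam value did best.
best_run, best_error = None, float("inf")
for path in glob.glob("results/*/val_dict_list.json"):
    with open(path) as f:
        history = json.load(f)
    final_error = history[-1]["error_val"]  # error at the last recorded step
    if final_error < best_error:
        best_run, best_error = path, final_error

print("best run:", best_run, "final error_val:", best_error)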

And that’s it! You can check out the HPO docs for IBM’s Deep Learning service in Watson Studio for more information!

5. Appendix

Code for the model on MNIST


Lana El Sanyoura

Computer Science & Cognitive Science at the University of Toronto, 4th year. Researching language acquisition through NLP; interned @ Intel, MIT-IBM Watson AI Lab.