Hyperparameter Tuning with AI Platform

Imran us Salam
Published in Red Buffer
10 min read · Aug 13, 2021

Hyperparameter tuning uses Bayesian optimization and/or grid search to find the best set of hyperparameters. It is offered as a service under GCP's AI Platform.

In this article, I will explain what hyperparameters are, what difference they make when training a machine learning model, and how we can use AI Platform to find the best set of hyperparameters.

Warning: This article is math-friendly and will only cover the very basics.

Before diving straight into what hyperparameters are, I'll give a brief overview of how a basic machine learning example works, for absolute beginners.

Machine Learning

Most machine learning algorithms, whether supervised or unsupervised, are based on optimization theory.

But the question is: optimize what, exactly?

To explain that, let's take an example.

Optimization

Imagine that you have some points on a graph with coordinates x and y.

Now imagine we have a new point whose value on the x-axis is 0.7. How do we know its value on the y-axis?

We can see that the plot is (almost) linear. So we can draw a line that passes through most of the points, and the y-value of that line at 0.7 on the x-axis will be the answer.

But how do we get a line like that? We can certainly draw it ourselves, but for a machine to learn a line like that, we have to write a function.

Now, we know that the equation of a line is y = mx + c. We have the x and y values for these 5 points. Let's see how we can figure out "m" and "c".

Parameters

In the above equation, "m" and "c" are our parameters: "m" is the slope of the line and "c" is the offset (the y-intercept).

We have to find the best values of "m" and "c", the ones that give the best results.

To pin down what "find" and "best" mean here, let's revisit optimization.

Optimization (Again)

Before finding the best values of "m" and "c", we have to define what "best" means, and how to find it.

This is where the loss comes in. We pick random values for "m" and "c", plug them into our equation, and compute a loss term. The loss here is the difference between the actual points and the line produced by the current "m" and "c".

We can use this loss term to see how good or bad our parameters are.

So in this particular case, a loss value near 0 is good, and one far from it is terrible.

So how do we go on to find the best set of “m” and “c”?

We use a technique called "gradient descent": after calculating the loss term, we use it to update our parameters in such a way that they now produce a better result.

Here W denotes the slope m and b denotes the offset c. The four steps, written out:

1. Prediction: we plug in the values of m and c: $\hat{y} = mx + c$
2. Loss (squared distance): $L = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
3. Partial derivatives of the loss with respect to each parameter: $\frac{\partial L}{\partial m}$ and $\frac{\partial L}{\partial c}$
4. Update each parameter by adding the negative of its derivative, scaled by alpha: $m \leftarrow m - \alpha\frac{\partial L}{\partial m}$, $c \leftarrow c - \alpha\frac{\partial L}{\partial c}$

[Figures: the fitted line at Iteration 1, Iteration 2, and Iteration N]

When we do this a number of times, iteratively, our line starts to pass through most of the points.

Let’s see how.

In the three pictures, we can see that the line starts off with a random initialization of "m" and "c", which gives a high loss value. Then, using the optimization algorithm, we were able to get a good line in N iterations.
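
To make this concrete, here is a minimal sketch of that gradient-descent loop in plain Python with NumPy; the five points and the learning rate are invented for illustration:

import numpy as np

# Five (x, y) points that lie roughly on a line (made-up data)
x = np.array([0.1, 0.3, 0.5, 0.8, 0.9])
y = np.array([0.15, 0.32, 0.49, 0.78, 0.94])

m, c = 0.0, 0.0  # initial parameter values
alpha = 0.1      # learning rate, the "alpha" in the update rule

for step in range(1000):                 # Iteration 1 ... Iteration N
    y_hat = m * x + c                    # step 1: plug in m and c
    loss = np.mean((y - y_hat) ** 2)     # step 2: squared distance
    dm = np.mean(-2 * x * (y - y_hat))   # step 3: partial derivative w.r.t. m
    dc = np.mean(-2 * (y - y_hat))       # step 3: partial derivative w.r.t. c
    m -= alpha * dm                      # step 4: update the parameters
    c -= alpha * dc

print(m, c, loss)  # the line should now pass close to the points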

Now, we've been talking about machine learning, optimization, and parameters. But we haven't yet seen what a hyperparameter is and where it fits in.

Hyperparameters

Look at the equations again.

There is an "alpha" term in there, called the learning rate. This is a hyperparameter. Why is it called a hyperparameter?

It's called a hyperparameter because, unlike a parameter, its value isn't optimized during training; it is chosen by the user of the algorithm. This makes a big difference: a machine learning algorithm cannot work well without a proper set of hyperparameters, and the problem is that we have to find them ourselves.

A typical set of hyperparameters includes the learning rate, the momentum value, the batch size, the number of hidden layers, the number of hidden units, etc.

We aren't going to go into the details of these hyperparameters; we are just going to see how we can use them to our advantage.

Hyperparameter Tuning

Since hyperparameters are not optimized, they cannot be trained, so we have to find them ourselves. A simple approach is a grid search over these hyperparameters.

For example, the batch size could take any value from 1 to infinity.

But we know that a batch size of more than 32 almost never works (not a fact, just my experimental observation).

So what do we do? Run the training 32 times and pick the best set of hyperparameters?

Yes and no. Training the algorithm 32 times does give you good results, but you never have infinite resources, and each run takes time. And what if you have two hyperparameters and want to find the best pair? If the batch size is paired with the number of layers in the network, this can lead to 32 x N(layers) experiments.
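
To see how quickly this blows up, here is a tiny sketch that counts the grid; the candidate values are invented for illustration:

from itertools import product

batch_sizes = range(1, 33)   # 32 candidate batch sizes
layer_counts = [1, 2, 3, 4]  # a hypothetical N(layers) = 4

# Every (batch size, layer count) pair is one full training run
grid = list(product(batch_sizes, layer_counts))
print(len(grid))  # 32 x 4 = 128 experiments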

Hyperparameter Tuning with AI Platform

Hyperparameter tuning works by running multiple trials in a single training job. Each trial is a complete execution of your training application with values for your chosen hyperparameters, set within the limits you specify. The AI Platform Training training service keeps track of the results of each trial and makes adjustments for subsequent trials. When the job is finished, you can get a summary of all the trials along with the most effective configuration of values according to the criteria you specify.

Hyperparameter tuning requires explicit communication between the AI Platform Training training service and your training application. Your training application defines all the information that your model needs. You must define the hyperparameters (variables) that you want to adjust, and a target value for each hyperparameter.

To learn how AI Platform Training uses Bayesian optimization for hyperparameter tuning, read the blog post named Hyperparameter Tuning in Cloud Machine Learning Engine using Bayesian Optimization.

In addition to Bayesian optimization, AI Platform Training optimizes across hyperparameter tuning jobs. If you are doing hyperparameter tuning against similar models, changing only the objective function or adding a new input column, AI Platform Training is able to improve over time and make the hyperparameter tuning more efficient.

https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview

Target Metric

A target metric is the metric we want to optimize. It could be a loss value, in which case we try to minimize it.

Or, in the case of a classification problem, it could be an accuracy measure, which we want to maximize.
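
In the configuration file (shown in full later), this choice maps to the goal and hyperparameterMetricTag fields. For an accuracy metric, the relevant snippet would look like this sketch:

goal: MAXIMIZE
hyperparameterMetricTag: accuracy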

Hyperparameters to Tune

We select the hyperparameters we want to tune and add them to our configuration file.

Types of Hyperparameters

There are 4 types of hyperparameters we can use.

1. DOUBLE. A double-precision floating point variable that takes a value range in minValue and maxValue format.

2. INTEGER. An integer variable that takes a value range in minValue and maxValue format.

3. CATEGORICAL. Provided as an unordered list of string values.

4. DISCRETE. An ordered list containing either integers or doubles (a sketch of these last two types follows this list).
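
For the two list types, here is a hedged sketch of what the params entries could look like in the config file; the parameter names optimizer and batch_size are hypothetical, not part of this article's pipeline:

params:
- parameterName: optimizer   # hypothetical CATEGORICAL parameter
  type: CATEGORICAL
  categoricalValues: ["adam", "sgd", "rmsprop"]
- parameterName: batch_size  # hypothetical DISCRETE parameter
  type: DISCRETE
  discreteValues: [8, 16, 32]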

Types of Scaling

Hyperparameter tuning provides the option of scaling. This means you can decide how DOUBLE or INTEGER values are spread across their range from trial to trial (see the sketch after the list).

  • UNIT_LINEAR_SCALE
  • UNIT_LOG_SCALE
  • UNIT_REVERSE_LOG_SCALE
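
A common use case is a learning rate that spans several orders of magnitude, which is usually searched on a log scale. A sketch, with an illustrative value range:

- parameterName: lr
  type: DOUBLE
  minValue: 0.0001
  maxValue: 0.1
  scaleType: UNIT_LOG_SCALE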

Search Algorithm

There are three search algorithms we can select from (the corresponding config field is shown after the list):

1. Bayesian optimization

2. Grid Search

3. Random Search
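
The choice is made through the algorithm field under hyperparameters in the config file; if the field is omitted, AI Platform Training defaults to Bayesian optimization. A sketch:

trainingInput:
  hyperparameters:
    algorithm: RANDOM_SEARCH  # or GRID_SEARCH; omit for the default Bayesian optimization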

Implementation

We've read enough. Let's implement a small machine learning pipeline on which we can tune our hyperparameters.

TensorFlow 2.x with AI Platform Hyperparameter Tuning

The problem we are going to solve is a conventional linear regression problem on the Boston Housing dataset.

In your main train.py, add the necessary imports:

import argparse
import hypertune
import tensorflow as tf

We are going to use the argparse library to read in the hyperparameter values chosen by the AI Platform search algorithm.

The hypertune library is a GCP utility that reports a metric value from your code back to AI Platform as the trial's output.

dataset = tf.keras.datasets.boston_housing
(x_train, y_train), (x_val, y_val) = dataset.load_data()

We load the dataset, which is already available in keras.datasets, and then split it into training and validation sets.

class LinearRegression(tf.keras.Model):  # Subclass from tf.keras.Model
    def __init__(self):
        # Define all your variables and other configurations here
        super(LinearRegression, self).__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, x):
        return self.dense(x)

Here we make a linear regression model using the TensorFlow 2 Keras module.

parser = argparse.ArgumentParser(description='Input parameters need to be specified for hypertuning')
parser.add_argument('--epochs', default=10, type=int, help='Number of epochs')
parser.add_argument('--lr', default=0.003, type=float, help='Learning rate parameter')
args = parser.parse_args()
epochs = args.epochs
lr = args.lr

# Build and train the model with the hyperparameters passed in by AI Platform
model = LinearRegression()
adam = tf.keras.optimizers.Adam(learning_rate=lr)
model.compile(loss='mse', optimizer=adam)
model.fit(x_train, y_train, epochs=epochs, verbose=0)

# Validation loss, scaled by the number of validation samples
loss = model.evaluate(x_val, y_val) / x_val.shape[0]
print(loss)

# Report the metric back to AI Platform under the tag 'loss'
hpt = hypertune.HyperTune()
hpt.report_hyperparameter_tuning_metric(hyperparameter_metric_tag='loss',
                                        metric_value=loss,
                                        global_step=epochs)

This is the main training pipeline.

The hyperparameters we want to tune must be declared as arguments in the argument parser.

We initialize the model and add an optimizer with its required input parameters.

We define the loss function we want to use, and then we train the model.

Once the model is trained, we can evaluate it and get the loss value over the validation set.

Then we initialize the HyperTune object and report our metric value under the loss tag.

And that’s the main file.

https://github.com/imransalam/gcp-ai-platform-hyperparameter-tuning-tf2/blob/master/train.py

Let's look at our configuration file now.

trainingInput:
  hyperparameters:
    goal: MINIMIZE
    maxTrials: 50
    maxParallelTrials: 5
    hyperparameterMetricTag: loss
    enableTrialEarlyStopping: FALSE
    params:
    - parameterName: epochs
      type: INTEGER
      minValue: 100
      maxValue: 10000
      scaleType: UNIT_LINEAR_SCALE
    - parameterName: lr
      type: DOUBLE
      minValue: 0.001
      maxValue: 0.009
      scaleType: UNIT_LINEAR_SCALE

We define that our goal is to minimize the metric, whose tag we set to loss.

The maximum number of trials is 50. You can change it to your liking, but it shouldn't be more than the size of the search space.

maxParallelTrials is the number of trials run in parallel. Note that with Bayesian optimization it's a good idea to keep this number low, since each trial learns from the results of the completed ones.

Under params, we define which hyperparameters we are going to tune and their properties.

The parameter epochs has the datatype INTEGER and can take values from 100 to 10000. (Note: the number of epochs usually shouldn't be treated as a hyperparameter.)

The parameter lr has the datatype DOUBLE and can take values from 0.001 to 0.009.

https://github.com/imransalam/gcp-ai-platform-hyperparameter-tuning-tf2/blob/master/hptuning_config.yaml

We're done for the most part. Let's now add a requirements.txt and a Dockerfile.

tensorflow==2.1.0
cloudml-hypertune==0.1.0.dev6

https://github.com/imransalam/gcp-ai-platform-hyperparameter-tuning-tf2/blob/master/requirements.txt

FROM python:3.7.6-slim
WORKDIR /
RUN apt-get -y update \
    && pip install --upgrade pip
COPY . /
RUN pip install -r /requirements.txt
ENTRYPOINT ["python", "train.py"]

https://github.com/imransalam/gcp-ai-platform-hyperparameter-tuning-tf2/blob/master/Dockerfile

Now that we're done with the code, let's build the Docker image, push it to the Container Registry, and submit the job.

But first, run gcloud init on your system to point it at your working GCP project, and enable the AI Platform API from the GCP console so that we can submit the job.

Once we’re done, let’s build

export PROJECT_ID=$(gcloud config list project --format "value(core.project)")
export IMAGE_REPO_NAME=gcp_ai_platform_hyperparameter_tuning_tf2
export IMAGE_TAG=gcp_ai_platform_hyperparameter_tuning_tf2_image
export IMAGE_URI=us.gcr.io/$PROJECT_ID/$IMAGE_REPO_NAME:$IMAGE_TAG

This initializes the image repo name, tag, and URI for the image we are going to push to the GCP Container Registry.

git clone https://github.com/imransalam/gcp-ai-platform-hyperparameter-tuning-tf2.git
cd gcp-ai-platform-hyperparameter-tuning-tf2

Then build the docker image

docker build -f Dockerfile -t $IMAGE_URI ./

Once it’s built, push it to the container registry.

docker push $IMAGE_URI

Now we’re ready to submit our Hyperparameter Tuning Job. Just initialize some variables, like the region you want to use and the job name and submit it.

export REGION=us-central1
export JOB_NAME=gcp_ai_platform_hyperparameter_tuning_tf2_$(date +%Y%m%d_%H%M%S)
gcloud ai-platform jobs submit training $JOB_NAME \
    --scale-tier BASIC \
    --region $REGION \
    --master-image-uri $IMAGE_URI \
    --config hptuning_config.yaml

You can view your jobs under AI Platform → Jobs in the GCP console.

And that’s it, folks :)

Thank you for reading.

Here is the GitHub link: https://github.com/imransalam/gcp-ai-platform-hyperparameter-tuning-tf2
