Parallel Hyperparameter Tuning with Optuna and Kubeflow Pipelines

Masaki Kozuki
Optuna
Nov 6, 2020

This entry is a translation of the Japanese-language blog post originally authored by Mr. Masao Tsukiyama of Mobility Technologies Co., Ltd. The Optuna community would like to thank Mr. Tsukiyama for permitting us to post this translation.
Disclaimer: All the slides and videos linked here are in Japanese.

Introduction

Hi there. This is Masao Tsukiyama from ML Engineering Group 1 of the AI Technology Development Department at Mobility Technologies (MoT).

The other day, I tuned hyperparameters in parallel with Optuna and Kubeflow Pipelines (KFP), summarized the work in slides for an internal seminar, and published them, which got several responses.

The slides were noticed by an Optuna maintainer, who asked me to write a blog post about them. Some of the content overlaps with the slides above, but let me walk you through the following items with figures and snippets:

  • our Optuna and KFP use case
  • an introduction to and tutorial of Optuna and KFP
  • Optuna's usability and its contribution to our project
  • tips for integrating it into production

Use Case Definition

When it comes to hyperparameter tuning, you may think of the hyperparameters of machine learning and deep learning models, such as the number of layers in a deep neural network.

However, we wanted to optimize the parameters of the reinforcement learning model that proposes optimal routes for one of our products, called Passenger Search Navigation.

What is “Passenger Search Navigation”?

We at MoT provide a machine learning based service named Passenger Search Navigation (its Japanese name is “お客様探索ナビ”) for taxi drivers who use the taxi dispatch service “GO”. This service helps drivers find riders when they are not waiting at taxi stands, aiming to enable new hires at taxi companies and drivers unfamiliar with an area to earn more than the average.

How Does Machine Learning Play a Role?

This section explains how Passenger Search Navigation recommends routes to drivers.

To suggest a route to a driver, we take the following steps:

  1. Create machine learning models using stats, such as the number of rides, and predict the number of ride requests in the next 30 minutes on each road (demand).
  2. Create other machine learning models that take the same stats and predict the number of taxis that will be on the road in 30 minutes on each road (supply).
  3. Run inference with both models every 15 minutes to continually update the predicted demand and supply.
  4. Recommend a route with the predicted demand and supply with reinforcement learning.

Reference

Should you be interested in the detailed algorithm, there are some materials in Japanese.

Also, I would highly recommend the below materials for those who are interested in the whole architecture and MLOps of our service.

Hyperparameters of Value Iterator

In the above reinforcement learning, there is a component called the Value Iterator. This Value Iterator has a number of hyperparameters, and they affect the route recommended from the predicted demand and supply.

Furthermore, the recommended route affects the profit in the simulation and the profit of each driver. This is why we want to tune the hyperparameters of the Value Iterator.

To be exact, the situation was like this:

  • The Value Iterator component has many hyperparameters; however, we had not tuned them at all from the proof of concept up to now.
  • Due to the nature of Passenger Search Navigation, we need different models for different areas, and the optimal hyperparameters are likely to differ between them.
  • The optimal hyperparameters are also likely to differ across timeframes.

Given the above, we decided the following tuning requirements:

  • The entire tuning process must be automated
  • Tuning can be executed easily, either on a regular schedule or on demand
  • It must not be too time-consuming
  • It must not be too compute-hungry; we have a limit on server cost

While there are various frameworks for hyperparameter tuning, we chose Optuna because it is popular and looked easy to use thanks to its intuitive interface.

Simulation and Evaluation of Machine Learning Models

As it’s related to why we adopted Kubeflow Pipeline (KFP) for parallel hyperparameter tuning, let me illustrate the simulation and evaluation of machine learning models in Passenger Search Navigation.

In Passenger Search Navigation, two machine learning models predict demand and supply, and the reinforcement learning model then suggests a route based on them. To evaluate the suggested route, we have a simulator that estimates how much profit it would make.

We evaluate the route by checking how many rides occurred, and when and where they happened, against records of actual demand and supply.

The criteria for updating the machine learning models include minimizing the squared error and maximizing the simulated profit.

So, our tuning task repeatedly runs the two steps below:

  1. Update the hyperparameters of the Value Iterator component
  2. Measure the profit by running the simulator with the updated hyperparameters

The simulation covers one week (seven days), and we had already composed a KFP pipeline that collects the seven days of data in parallel and runs the simulator.

We chose KFP because it makes automating the tuning simple and lets us reuse this existing pipeline.

Introduction and Tutorial of Optuna and KFP

Let me briefly summarize Optuna and KFP for those who are not familiar with them.

What is Optuna?

Optuna is an open source hyperparameter optimization framework that automates hyperparameter search. It was released in December 2018, and its first stable version came out in January 2020. Like many other machine learning frameworks, it is implemented in Python.

The features unique to Optuna listed on the official page are:

  • Parallelize hyperparameter searches over multiple threads or processes without modifying code
  • Automated search for optimal hyperparameters using Python conditionals, loops, and syntax
  • Efficiently search large spaces and prune unpromising trials for faster results

In general, it’s designed to make it easy to implement distributed and parallelized tuning.

Tutorial of Optuna

The example below follows the official tutorial, which you can download as a Python script or a Jupyter Notebook.

In Optuna, the whole tuning process is called a Study, and each evaluation of one set of hyperparameters is called a Trial.
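
A minimal version of that quadratic example looks like the following sketch (it samples one hyperparameter x and minimizes a quadratic):

import optuna

def objective(trial):
    # Sample a float in [-10, 10] and evaluate a simple quadratic.
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)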

You define the evaluation process of each trial, from sampling hyperparameters to returning the evaluated value. In the above snippet:

  • create_study instantiates a new Study, specifying direction="minimize" or direction="maximize" depending on whether the objective function is to be minimized or maximized.
  • study.optimize executes the tuning. The number of trials is 100.
  • The objective function samples the hyperparameter x and evaluates the quadratic function.

A Study object provides the following useful methods and attributes.

study.best_params
>> {'x': 1.9926578647650126}
study.best_trial
>> FrozenTrial(number=26, state=<TrialState.COMPLETE: 1>, params={'x': 1.9926578647650126}, value=5.390694980884334e-05, datetime_start=xx, datetime_complete=xx, trial_id=26)
study.trials
>> [FrozenTrial(number=0, …), …]

The example we’ve looked at uses only one floating point values in linear space with suggest_float method, Trial provides the following methods for hyperparameter suggestion:

# Categorical
optimizer = trial.suggest_categorical("optimizer", ["MomentumSGD", "Adam"])
# Integer
num_layers = trial.suggest_int("num_layers", 1, 3)
# Floating point values in linear space
dropout_rate = trial.suggest_float("dropout_rate", 0.0, 1.0)
# Floating point values in logarithmic space
learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
# Floating point values in discrete linear space
drop_path_rate = trial.suggest_float("drop_path_rate", 0.0, 1.0, step=0.1)

Other than Study and Trial, Optuna has the concept of a Storage. As the name implies, a Storage tracks the history of a Study and its Trials, and there are several types available depending on the use case.

InMemoryStorage:

  • The default storage class
  • Keeps the Study in the memory of the process running the tuning
  • Not meant for keeping Trial history for long
  • Faster than RDBStorage if you parallelize tuning within a single instance

RDBStorage:

  • Keeps the Study in an external RDB
  • MySQL, PostgreSQL, and SQLite are available
  • Best for distributed optimization, as every Study and its Trials are recorded
  • Allows stopping and resuming a Study

In Optuna, you can specify the number of parallel jobs with the n_jobs argument of Study.optimize. Since we have only one Optuna job and execute Trials in parallel with KFP and n_jobs, either storage option is feasible. However, it turned out to be helpful to be able to reference the history of a Study and its Trials, and since we sometimes want to increase the number of trials, we set up a MySQL server on GCP Cloud SQL and use RDBStorage.

What is Kubeflow Pipelines (KFP)?

Kubeflow, which includes KFP, is a framework developed by Google that provides the tools to implement the whole lifecycle of a machine learning project on Kubernetes. As a side note, we do not use any Kubeflow components other than KFP.

KFP is a workflow engine oriented toward machine learning and is getting more and more popular these days, but we rarely see use cases for the other Kubeflow components.

There are other famous workflow engines such as Apache Airflow and Digdag, but KFP has the following strong points:

  • A feature named “Experiments” lets you prepare a pipeline for each experiment, change its input parameters, and execute it from the Web UI.
  • The inputs and outputs of every pipeline task are visualized, so we can check artifacts such as Jupyter Notebooks and confusion matrices on the Web UI.
  • Experiments and their results are easy to compare, which makes comparing parameters easier.

KFP is built on top of Argo, an open source workflow engine for Kubernetes, but KFP is friendlier because we can define pipelines in Python. In Passenger Search Navigation, we use KFP for R&D tasks, e.g., simulations and experiments, and Apache Airflow for operational tasks, e.g., the model deployment pipeline and the inference pipelines.

Sometimes it’s better to use different tools for different phases.

Parallel Hyperparameter Tuning Flow with Optuna and KFP

So, here comes the main topic: the implementation of hyperparameter tuning for Passenger Search Navigation.

Let's consider the implementation policy based on the Optuna snippet above.

As this was a preliminary run, we tentatively set n_trials (the number of trials) to 100.

Since Optuna's default optimization algorithm, TPE, is sequential, a too-large n_jobs might harm its performance; therefore, we set n_jobs to 5, which turned out not to be harmful.
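
Putting those choices together, the entry point is roughly the following sketch (objective is defined later; TPESampler is shown explicitly here even though it is the default):

import optuna

# A rough sketch of the settings above: the TPE sampler (Optuna's default),
# 100 trials in total, and 5 trials evaluated in parallel threads.
study = optuna.create_study(
    direction="maximize",  # we maximize the simulated profit
    sampler=optuna.samplers.TPESampler(),
)
study.optimize(objective, n_trials=100, n_jobs=5)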

So what we need to do in objective is to run the simulation with the suggested hyperparameters for the Value Iterator component and then collect the simulated profit.

We had already implemented the simulation process as a KFP pipeline job and used that job for experiments and for evaluation in the deployment pipeline.

The below figure illustrates how this flow is organized.

Of course, you can run Optuna locally, but it is tedious to wait a couple of hours for the tuning to finish, so I implemented a KFP job to execute Optuna.

First, the deployed Optuna job calls create_study to start a new tuning run, tied to the MySQL server on Cloud SQL.

Since n_jobs is 5, study.optimize evaluates 5 trials in parallel. Each running trial submits a simulator pipeline job to KFP with the suggested hyperparameters for the Value Iterator component and then collects the simulated profit.

Each thread waits for its own simulation to end. Although the figure above makes it look like the threads are synchronized, each thread actually runs its Trial independently.

After a Trial finishes, its results are stored in the Storage and the next trial starts. In the new trials, hyperparameters are suggested using the past records, and these steps are repeated until the simulated profit converges.

Thanks to RDBStorage, if n_trials turns out not to be enough, we can resume the tuning with optuna.load_study.
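
Resuming such a Study could look like this sketch; the study name and connection string are placeholders:

import optuna

# Resume an existing Study from RDBStorage and run additional trials.
study = optuna.load_study(
    study_name="value-iterator-tuning",                         # placeholder name
    storage="mysql+pymysql://user:password@localhost/dbname",   # placeholder URL
)
study.optimize(objective, n_trials=50)  # e.g., 50 additional trials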

Codebase

So far we've looked at how our fully automated tuning flow is organized using figures. Now let me show you the actual codebase.

Deploy Optuna Job

Let's start with the deployment. The deployment of the Optuna job pipeline is implemented in wf.create_optuna_pipeline, which will be explained shortly. This method compiles the created pipeline and deploys it to the KFP cluster with wf.run_pipeline().
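
The wf module is internal to our codebase, but a minimal sketch of what a run_pipeline helper might look like with the kfp v1 SDK is shown below; the function signature and the KFP endpoint configuration are assumptions:

import kfp

# A minimal sketch of a wrapper like wf.run_pipeline (kfp v1 SDK).
# kfp.Client() assumes an in-cluster or otherwise configured KFP endpoint.
def run_pipeline(pipeline_func, arguments, experiment_name="optuna-tuning"):
    client = kfp.Client()
    return client.create_run_from_pipeline_func(
        pipeline_func,
        arguments=arguments,
        experiment_name=experiment_name,
    )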

KFP Pipeline Function of Optuna Job

This is the implementation of wf.create_optuna_pipeline(). In KFP, we can implement pipeline functions with the @kfp.dsl.pipeline decorator. By passing a Slack notification Operator to dsl.ExitHandler, we get notified when the job terminates, whether it succeeds or fails.
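
A sketch of such a pipeline function with the kfp v1 DSL might look like the following; the Slack notification image is a placeholder, and create_optuna_op is the operator factory sketched in the next section:

from kfp import dsl

# A sketch of a pipeline like wf.create_optuna_pipeline (kfp v1 DSL).
@dsl.pipeline(name="optuna-tuning", description="Parallel hyperparameter tuning with Optuna")
def optuna_pipeline(n_trials: int = 100, n_jobs: int = 5):
    notify = dsl.ContainerOp(
        name="notify-slack",
        image="gcr.io/<project>/slack-notify:latest",  # placeholder notification image
    )
    with dsl.ExitHandler(notify):  # notify on exit, whether the job succeeds or fails
        create_optuna_op(n_trials=n_trials, n_jobs=n_jobs)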

KFP Operator of Optuna Job

The Operator that runs the Optuna task is implemented as follows:

create_optuna_op creates a Container Operator (ContainerOp) that runs the Optuna job. There is one task per container, so we deploy the task to the KFP cluster specifying an arbitrary Docker image. Note that we can also attach a sidecar container to a Container Operator.

Since our Optuna job uses RDBStorage (a MySQL server on Cloud SQL, to be exact), GCP's official container gcr.io/cloudsql-docker/gce-proxy is used as the sidecar. The tuning process itself is encapsulated in the image passed to this Operator.
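
A sketch of create_optuna_op under these assumptions; the image names, command, and Cloud SQL instance string are placeholders:

from kfp import dsl

# A sketch of create_optuna_op: a ContainerOp running the tuning image,
# with a Cloud SQL Proxy sidecar so MySQL is reachable via localhost.
def create_optuna_op(n_trials, n_jobs):
    op = dsl.ContainerOp(
        name="optuna-job",
        image="gcr.io/<project>/optuna-job:latest",   # placeholder tuning image
        command=["python", "tune.py"],                # placeholder entry point
        arguments=["--n-trials", n_trials, "--n-jobs", n_jobs],
    )
    op.add_sidecar(
        dsl.Sidecar(
            name="cloudsql-proxy",
            image="gcr.io/cloudsql-docker/gce-proxy:1.16",
            command=["/cloud_sql_proxy", "-instances=<project>:<region>:<instance>=tcp:3306"],
        )
    )
    return op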

Tuning Execution with Optuna Job

The container launched by the Operator first calls this method, passing the storage argument.

We pass settings["study_storage"], which is a string of the form mysql+pymysql://{user}:{password}@localhost/{cloudsql_datasetname} (remember, we use MySQL). The connection to the MySQL server goes through the Cloud SQL Proxy sidecar.

The load_if_exists argument, when True, lets you resume an existing Study with the same name in the Storage.
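
A sketch of that study-creation step; the study name is a placeholder:

import optuna

# Create (or resume) the Study against RDBStorage.
def get_study(storage: str) -> optuna.Study:
    return optuna.create_study(
        study_name="value-iterator-tuning",  # placeholder name
        storage=storage,       # the mysql+pymysql URL described above
        direction="maximize",  # maximize the simulated profit
        load_if_exists=True,   # resume if a Study with the same name exists
    )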

The Objective Function

A recap of our objective function:

  1. Sample hyperparameters for Value Iterator component
  2. Run simulation with the hyperparameters sampled above
  3. Get and return the result (= profit) from the simulation

Since the Value Iterator component has multiple hyperparameters with different distributions, we write the distribution and search space of each hyperparameter in a config file, and then suggest each of them by using getattr to pick the corresponding suggest method and its domain, as sketched below.
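
A sketch of that config-driven suggestion; the hyperparameter names, methods, and ranges below are made-up examples, not our actual configuration:

# Config-driven hyperparameter suggestion via getattr.
SEARCH_SPACE = {
    "gamma": {"method": "suggest_float", "kwargs": {"low": 0.8, "high": 0.999}},
    "num_iterations": {"method": "suggest_int", "kwargs": {"low": 10, "high": 100}},
}

def suggest_params(trial):
    return {
        name: getattr(trial, spec["method"])(name, **spec["kwargs"])
        for name, spec in SEARCH_SPACE.items()
    }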

Next, we check whether the number of correctly completed trials in the Study is smaller than the specified number of trials. Waiting for the simulation sometimes fails, so this is a workaround for the case where len(study.trials) does not tell us the exact number of completed trials.

This kind of handling can be implemented with a callback function passed to Study.optimize. In the callback, which takes the Study and a Trial as its inputs, we check the number of completed trials and compare it with the maximum number of trials to decide whether to stop the Study (Study.stop) or not.
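
A sketch of such a callback; MAX_TRIALS stands in for the configured limit:

import optuna

MAX_TRIALS = 100  # placeholder for the configured maximum number of trials

# Stop the Study once enough trials have completed successfully.
def stop_when_enough_completed(study, trial):
    completed = [
        t for t in study.trials
        if t.state == optuna.trial.TrialState.COMPLETE
    ]
    if len(completed) >= MAX_TRIALS:
        study.stop()

# study.optimize(objective, n_trials=2 * MAX_TRIALS, callbacks=[stop_when_enough_completed])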

run is a function that generates a simulation pipeline (Operators) and deploys it to the KFP cluster as already explained.

Finally, unlike the deployment of the Optuna job, we need to wait for the simulation to terminate after run_pipeline before collecting, summarizing, and returning the simulated profit.
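
Putting it all together, the shape of the objective function is roughly as follows; run, suggest_params, and collect_profit stand in for the internal helpers described above:

import kfp

# A rough sketch of the objective: suggest hyperparameters, submit the
# simulation pipeline, wait for it to finish, then return the simulated profit.
def objective(trial):
    params = suggest_params(trial)   # sample Value Iterator hyperparameters
    result = run(params)             # deploy the simulation pipeline, returns a run handle
    kfp.Client().wait_for_run_completion(result.run_id, timeout=6 * 3600)
    return collect_profit(result)    # aggregate the simulated profit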

Evaluation of Hyperparameter Tuning

Once the simulated profit converges, we move on to comparing hyperparameters.

To compare the performance of the previously used hyperparameters and the newly tuned ones:

  • Tune the hyperparameters for the two fixed machine learning models with the data of 2019/10/01–2019/10/07
  • To confirm that the tuned hyperparameters are not overfitting, evaluate the two sets of hyperparameters by simulation on the data of 2019/10/08–2020/02/29
  • For the demand and supply in the simulation, we use both the predicted values and the statistical values, since either can be the case in our production environment.

From this comparison, we confirmed that the profit increased by 1.4% and 2.2% when using the predicted values and the statistical values, respectively.

We then released the new set of hyperparameters to production after the quality assurance team confirmed that the proposed routes are valid.

So, in short, fully automated parallel tuning with Optuna gives a profit increase better than or equal to the increase achieved with the policy hand-tuned by our algorithm team. Note that the latter requires manual work and is not automated.

Usability of Optuna

As mentioned previously, the tuning with Optuna brought a gain of more than 2% for our drivers. The hyperparameters we obtained are already live and suggesting routes to our drivers.

We have not been able to compare the profit of our drivers before and after the release, so unfortunately we cannot say that our automated tuning gives our drivers joy. A real-world comparison is difficult because profit and routes depend on season, timeframe, and luck.

So here are some thoughts of using Optuna in our product:

  1. The interface is intuitive, and parallelizing and distributing tuning is easy.
  2. It is easy to keep the implementation simple because the objective function can be handled as a black box that takes hyperparameters as inputs and returns some value.
    Thanks to this design, we were able to use Optuna without modifying our simulation codebase and its execution; we just encapsulated the Optuna-specific logic into new functions.
  3. The SDK is useful.
    Most information we want can be accessed from Study and Trial, which makes it easier to do detailed processing in the objective function and to analyze the tuning afterwards.
  4. Integrating Optuna into existing projects is not hard because of 1, 2, and 3.
  5. It is customizable, though we have not tried that yet.

As the simulation is implemented as a KFP pipeline, our codebase deploys KFP jobs in a nested manner, making it a bit complicated.

We initially looked for a better design, but chose this approach because the relationship between tuning and simulation is easy to understand.

In short, the design is easy to understand and we could move to implementing our logic quickly.

Tips for Optuna in Production

We benefited considerably from integrating parallel hyperparameter tuning into our product because:

  • we create multiple models, and each model has different optimal hyperparameters.
  • different timeframes call for different hyperparameters for better performance.

As noted before, since Optuna lets us handle the logic we want to tune as a black box, keeping the existing codebase small is not difficult.

Since our problem is not so common and we had enough resources, we spent the time to automate the tuning process. However, even if your problem is simpler, for example you just want to run tuning on a few instances for experiments, I would say you will still enjoy the benefits of Optuna.

So, whatever hyperparameters you are tuning, the general workflow will be as follows:

  • You have some inputs (hyperparameters) and some outputs that you want to optimize.
  • If the number of hyperparameters is astronomical, starting with a small portion of them would be good.
  • Confirm that it is acceptable to tune your hyperparameters.
    Sometimes they include ones that should not be tuned or for which tuning is meaningless.
  • Implement the objective function.
    Keeping the coupling between this function and the experiment logic loose is great,
    e.g., encapsulating the experiment into a KFP Experiments pipeline.
    The only requirements are that the function takes some inputs and outputs something.
  • Choose the number of trials and run the tuning.
    Since you can resume a study with RDBStorage, you can increase the number of trials step by step, e.g., 100 -> 150 -> 200.
  • Run experiments to check the performance of the optimal hyperparameters Optuna finds.
    In our case, we ran simulations for a period that was not the target of tuning to see the gain.
    For machine learning model training, tune on validation accuracy or loss and check the test accuracy last.
  • Finally, decide whether to use those hyperparameters.

Conclusion

We integrated Optuna for the first time, while we had already been using KFP for simulations.

To repeat, Optuna stands out for its intuitive interface, good documentation, and a design that keeps implementations simple.

Since this problem benefits from re-tuning after some period, we automated the comparison experiment by scheduling a regular tuning pipeline.

I hope you got some takeaways from this post. Thank you.
