Practical Bayesian optimization using Goptuna

Masashi SHIBATA
Jul 26

Bayesian optimization is widely used to find the maximum or minimum of a black-box function. Although it has mainly been studied for hyper-parameter tuning of machine learning models, it can be applied to any problem for which you can design an objective function.

For example, Facebook announced plans to apply Bayesian optimization to a wide variety of projects using Ax and BoTorch (for more details, watch the F8 2019 talk, Product Optimization with Adaptive Experimentation):

  • Tuning Video Streaming Algorithms
  • Tuning Machine Learning Models
  • AR/VR Hardware Design
  • Tuning the Just-In-Time compiler (for HHVM)

In this article, I investigate the existing Bayesian optimization libraries written in Go, then introduce Goptuna, which I published yesterday. It can be used to optimize the number of goroutines in our servers written in Go and the memory buffer sizes of our caching systems.

Bayesian optimization library for black-box functions written in Go.
https://github.com/c-bata/goptuna

Introduction to Bayesian Optimization

First, let me explain what Bayesian optimization does. Suppose the performance of a system changes with a parameter that is a continuous value in the range 0 to 10, and we would like to select the parameter that gives the best performance. The two basic ideas are the following:

  • Evaluate parameters at fixed intervals, such as 0, 1, 2, …, 10, and keep the one that performs best (called Grid Search)
  • Evaluate parameters drawn at random from the range 0 to 10 and keep the one that performs best (called Random Search)

These methods are very simple, but they take a lot of evaluations to reach the best parameters.
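To make these two baselines concrete, here is a minimal sketch of both strategies in Go, minimizing a hypothetical one-dimensional objective over [0, 10] (the function and evaluation budget are illustrative, not from this article):

package main

import (
	"fmt"
	"math"
	"math/rand"
)

// f is a hypothetical black-box objective we want to minimize.
func f(x float64) float64 {
	return math.Pow(x-3.7, 2)
}

func main() {
	// Grid Search: evaluate at fixed intervals 0, 1, 2, ..., 10.
	bestX, bestY := 0.0, math.Inf(1)
	for x := 0.0; x <= 10; x++ {
		if y := f(x); y < bestY {
			bestX, bestY = x, y
		}
	}
	fmt.Printf("grid search:   f(%.2f) = %.4f\n", bestX, bestY)

	// Random Search: evaluate points drawn uniformly from [0, 10].
	bestX, bestY = 0.0, math.Inf(1)
	for i := 0; i < 11; i++ {
		x := rand.Float64() * 10
		if y := f(x); y < bestY {
			bestX, bestY = x, y
		}
	}
	fmt.Printf("random search: f(%.2f) = %.4f\n", bestX, bestY)
}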

A human, on the other hand, can often guess which parameter to evaluate next based on past results. If a parameter has already obtained a good evaluation value, parameters near it are likely to obtain good values as well. And if a region of parameters has not been evaluated at all, exploring it may still turn up a good evaluation value. This kind of human-like exploration is what Bayesian optimization achieves.

There are three state-of-the-art Bayesian optimization methods:

  • GP (Gaussian Process) based
  • TPE (Tree-structured Parzen Estimator)
  • SMAC (Sequential Model-based Algorithm Configuration)

Investigating existing Bayesian optimization libraries for Go

I searched for existing libraries for Bayesian optimization in Go and found a Gaussian-process-based library called go-bayesopt. It called an API that no longer exists in the latest gonum, but it worked with a small fix (which has already been merged upstream).

I prepared the following minimization problem:

f(x1, x2) = (x1 − 1)² + (x2 − 2)²

The optimal parameters of the above function are (x1, x2) = (1, 2). I prepared a program using go-bayesopt to optimize this function.
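A minimal sketch of such a program, assuming go-bayesopt's README-style API (bayesopt.New and Optimizer.Optimize; treat the exact signatures as an assumption, as they may differ by version):

package main

import (
	"log"
	"math"

	"github.com/d4l3k/go-bayesopt"
)

func main() {
	x1 := bayesopt.UniformParam{Name: "X1", Min: -10, Max: 10}
	x2 := bayesopt.UniformParam{Name: "X2", Min: -10, Max: 10}

	o := bayesopt.New([]bayesopt.Param{x1, x2})

	// Minimize f(x1, x2) = (x1 - 1)^2 + (x2 - 2)^2.
	x, y, err := o.Optimize(func(params map[bayesopt.Param]float64) float64 {
		return math.Pow(params[x1]-1, 2) + math.Pow(params[x2]-2, 2)
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("x: %#v", x)
	log.Printf("y: %f", y)
}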

After 10 parameters are selected by Random Search, 90 more selections are performed using a GP as the surrogate model and UCB (Upper Confidence Bound) as the acquisition function.
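For reference, UCB scores a candidate x by the GP's predictive mean plus a multiple of its predictive standard deviation (the trade-off coefficient κ below is the usual textbook notation, not a parameter I have confirmed go-bayesopt exposes):

UCB(x) = μ(x) + κ · σ(x)

For a minimization problem the sign flips (a lower confidence bound μ(x) − κ · σ(x) is minimized), so points with a promising mean or high uncertainty get selected. The execution results are as follows.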

$ go run _examples/main.go
...
2019/07/25 15:23:23 x: map[bayesopt.Param]float64{bayesopt.UniformParam{Name:"X1", Max:10, Min:-10}:1.0221672603083967, bayesopt.UniformParam{Name:"X2", Max:10, Min:-10}:1.8540991095989217}
2019/07/25 15:23:23 y: 0.021778

The objective function is called 100 times, and the best evaluation value obtained is f(1.022, 1.854) = 0.021778. Looking at the following graphs, you can see that go-bayesopt focuses on points that are likely to yield good evaluation values.

Values of x1 searched by go-bayesopt.
Values of x2 searched by go-bayesopt.

It works fine. But it was probably implemented for study purposes, so it does not offer much functionality. If you want to use another acquisition function such as EI (Expected Improvement), you will need to implement it yourself. It also takes 5 minutes for 100 evaluations: inference with a GP regression model requires computing an inverse matrix, so the computation time tends to be long, growing as O(t³) where t is the iteration count.

Python libraries use LAPACK through NumPy and SciPy, but in Go we want to avoid cgo to keep portability. Being able to compute efficiently in pure Go is valuable because it lets black-box optimization be applied to a wide variety of projects.

TPE Bayesian optimization using Goptuna

TPE (Tree-structured Parzen Estimator) seems to be fast enough even when implemented in pure Go. Since there was no existing library, I implemented it and published Goptuna yesterday.

As the name implies, it is a Go port of Optuna. The objective function also adopts Optuna's Define-by-Run interface. Although there are some changes toward a more Go-like design, such as error handling and how options are specified, it can be used intuitively as follows.
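Here is a sketch of minimizing the same function with Goptuna, modeled on the project's README (method names such as SuggestUniform, CreateStudy, StudyOptionSampler, and GetBestParams are taken from that README and may change between versions):

package main

import (
	"log"
	"math"

	"github.com/c-bata/goptuna"
	"github.com/c-bata/goptuna/tpe"
)

// objective is written in the Define-by-Run style: the search space
// is declared inside the function via trial.Suggest* calls.
func objective(trial goptuna.Trial) (float64, error) {
	x1, err := trial.SuggestUniform("x1", -10, 10)
	if err != nil {
		return 0, err
	}
	x2, err := trial.SuggestUniform("x2", -10, 10)
	if err != nil {
		return 0, err
	}
	return math.Pow(x1-1, 2) + math.Pow(x2-2, 2), nil
}

func main() {
	study, err := goptuna.CreateStudy(
		"goptuna-example",
		goptuna.StudyOptionSampler(tpe.NewSampler()),
	)
	if err != nil {
		log.Fatal(err)
	}
	if err := study.Optimize(objective, 100); err != nil {
		log.Fatal(err)
	}
	v, _ := study.GetBestValue()
	params, _ := study.GetBestParams()
	log.Printf("Best value=%f (params=%v)", v, params)
}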

See the Optuna paper at KDD 2019 for more details about the Define-by-Run interface: Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. Optuna: A Next-generation Hyperparameter Optimization Framework. In The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '19), August 4–8, 2019.

The results are below:

Random Search with 100 evaluations

$ go run _examples/simple_random_search/main.go
...
Result:
- best value 2.5904889254208054
- best param map[x1:0.4405659427118902 x2:0.7883530201915825]

TPE with 100 evaluations (including 10 evaluations of random search)

$ go run _examples/simple_tpe/main.go
...
Result:
- best value 0.6195459685895217
- best param map[x1:0.7770961438621743 x2:1.2451093857329996]

In this case, the result of TPE is better than Random Search but worse than go-bayesopt. The search space of the objective function is low-dimensional (2 dimensions) and does not include categorical variables, and Eggensperger et al. confirmed that Gaussian-process-based Bayesian optimization performs better than TPE and SMAC on such low-dimensional functions (see Towards an Empirical Foundation for Assessing Bayesian Optimization of Hyperparameters).

Next, let's compare execution times. Since the objective function is simple and consumes little time, the total execution time is dominated by the time spent inside the black-box optimization library.
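The comparison can be reproduced by wall-clocking the optimization loop; a minimal sketch, reusing study and objective from the Goptuna example above and assuming an import of the standard time package (the same wrapper works for the go-bayesopt program):

// Since the objective is cheap, this duration is dominated by the
// sampler's parameter-selection time.
start := time.Now()
if err := study.Optimize(objective, 100); err != nil {
	log.Fatal(err)
}
log.Printf("100 evaluations took %s", time.Since(start))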

Goptuna (TPE) works extremely fast!

While GP requires more and more time for the inverse computation as the number of past selections grows, TPE stays very fast. If selecting parameters takes too long when tuning server middleware written in Go, the optimizer becomes hard to use in practice, but Goptuna's TPE should be fast enough in most cases.

Conclusion

In this article, I examined Go's existing Bayesian optimization library, go-bayesopt, and compared its performance with Goptuna. Goptuna's TPE performs Bayesian optimization very fast, and it may find better parameters than the GP-based go-bayesopt when the search space has many dimensions or contains qualitative (categorical) variables. I hope you will use Goptuna and send me feedback!

Written by Masashi SHIBATA
Creator of go-prompt and kube-prompt. github: c-bata
