Introduction to Kurobako: A Benchmark Tool for Hyperparameter Optimization Algorithms
Kurobako is a command-line benchmark tool for hyperparameter optimization algorithms. It is framework agnostic but mainly used in the development process of Optuna, a hyperparameter optimization framework, to verify the optimization performance.
In this story, I will briefly introduce this benchmark tool and describe why this is needed by Optuna.
Optuna and blackbox optimization
The sampler decides which hyperparameter to optimize on the next trial aiming to maximize or minimize the objective function, which the user defines (Figure 1). It doesn’t require any special characteristic of the objective function (such as differentiability) and runs only based on past hyperparameter evaluations. This kind of optimization algorithm is called blackbox optimization.
On the other hand, the pruner will decide whether to discontinue the evaluation of the hyperparameter. In deep learning, suppose we want to optimize the hyperparameter to train the model, it often takes a long time for each evaluation of the hyperparameter. By using the pruner, we can shorten the time to optimize by aborting the evaluation(training) of models which are clearly not performing in the early phase and not completely evaluating them. In the context of blackbox optimization, the sampler and pruner are not clearly distinguished, and both are often addressed in the framework of multi-fidelity blackbox optimization (Figure 2).
Kurobako is a benchmark tool that supports multi-fidelity blackbox optimization.
Through development of Optuna, there are many occasions where the algorithm for sampler or pruner is tested or improved. The problem arises — “How to evaluate the optimization performance of new algorithms?”
Blackbox optimization is a research area that has a long history, and many benchmark functions that can be used to evaluate optimization algorithms have been developed. One such open source benchmark framework is COCO (COmparing Continuous Optimisers). However, most of these functions can only deal with search space that consists of continuous variables and single fidelity, which is only a subset of the problems that Optuna addresses (Figure 3).
A benchmark tool that deals with problems that are more realistic is nas_benchmarks. This provides benchmarks that are based on two tasks, Neural Architecture Search (NASBench) and Hyperparameter Search (HPOBench). The search space consists of discrete variables and categorical variables (Figure 4). Also, it can deal with multi-fidelity evaluation.
The above benchmarks are useful but do not match our needs.
In particular, to evaluate the main optimization functions in Optuna, below must be supported:
- Search space including continuous, discrete, and categorical variables
- Multi-fidelity evaluation
- Asynchronous parallel execution of multiple trials
- Conditional search space including if and for loops
To the best of my knowledge, no benchmark tool can deal with all of the above. Motivation to develop Kurobako is to be able to cover all of this so that all optimization functions of Optuna can be evaluated easily.
Also, evaluating a certain benchmark can result in the optimization algorithm (and the hyperparameter) overfitting the benchmark. To prevent this, we wanted to be able to add a new benchmark problem easily and deal with various benchmark classes in the same framework.
Kurobako is a command-line tool written in Rust, and uses JSON for input/output.
The following commands show the basic usage:
You can find the content of the last report in the gist.
It is possible to generate figures from the benchmark result (result.json) by executing the command
$ cat result.json | kurobako plot curve (Figure 5).
Y axis in graph is the best results (average of 30 results). X axis shows resource consumption of optimization, 1 budget corresponds to one complete evaluation of objective function. In HPOBench, a complete evaluation (training of model) takes 100 epochs, so if the budget is 40, training for 4000 epochs have been run. In HPOBench, all hyperparameter combinations in the search space are evaluated beforehand, and the results are cached. So on benchmark runs, model training is unnecessary, and only cache lookup is necessary.
The benchmark above is only an example, but when the available budget is limited, enabling pruning will give better results. However, as the available budget increases, the effect of pruning diminishes. Especially in the last stage of Parkinsons Telemonitoring Data Set, TPE with pruning by Successive Halving algorithm is worse than without pruning.
How to add optimization algorithm
Kurobako is written in Rust, but It is possible to implement an original optimization algorithm (solver) as a command line program. (As is for problem)
For Python, kurobako package is provided in PyPI, so solvers can be implemented as below:
If the optimization algorithm is compliant to Optuna’s sampler interface, it is easier:
You can specify the above solvers in your benchmark as follows:
In this post, I introduced a benchmark tool for blackbox optimization. Please test this if you are interested in contributing to Optuna’s algorithm but are not sure about the optimization performance. Final goal is to integrate the benchmark to Optuna’s CI.
Though not discussed in this post, Kurobako supports features not supported in current Optuna, such as multi-objective optimization, optimization with constraint, suspend/resume of trials. If you use it in a special way, you can use it for meta optimization use. Examples are “hyperparameter optimization of the solver” (setting solver as the problem) or, even “searching for default parameters that perform well over multiple problems”.
This is an English translation version of my Japanese blog post. The translation was mainly done by Hiroshi Ono. Many thanks for his voluntary contribution.
 Akiba, Takuya, et al. “Optuna: A next-generation hyperparameter optimization framework.” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2019.
 Jamil, Momin, and Xin-She Yang. “A literature survey of benchmark functions for global optimization problems.” arXiv preprint arXiv:1308.4008 (2013).
 Hansen, Nikolaus, et al. “COCO: A platform for comparing continuous optimizers in a black-box setting.” arXiv preprint arXiv:1603.08785 (2016).
 Ying, Chris, et al. “Nas-bench-101: Towards reproducible neural architecture search.” arXiv preprint arXiv:1902.09635 (2019).
 Klein, Aaron, and Frank Hutter. “Tabular Benchmarks for Joint Architecture and Hyperparameter Optimization.” arXiv preprint arXiv:1905.04970 (2019).
 Dewancker, Ian, et al. “A strategy for ranking optimization methods using multiple criteria.” Workshop on Automatic Machine Learning. 2016.
 Bergstra, James S., et al. “Algorithms for hyper-parameter optimization.” Advances in neural information processing systems. 2011.