Announcing Optuna 3.0 (Part 1)
We released the first major version of Optuna in January 2020 and the second major version in July 2020, and since then, both user and development communities have grown significantly. As for the user community, Optuna has been downloaded 1.43 million times per month and has over 6,800 stars on GitHub. The development community has grown to 177 contributors and 1225 PRs since v2.0.
We have developed v3.0 with the following goals in mind:
- Open up the development direction to the community
  - Discuss development issues via public roadmap and GitHub issues
- Improve stability
  - Significantly reduce ambiguity in specification
  - Determine whether experimental features should be approved for normal use
- Deliver useful features and algorithms to all users
  - Improve the performance with the default arguments
  - Increase exposure of existing features/algorithms available to light users (target features/algorithms that users may be unaware of, e.g. multivariate TPE, constant-liar, etc.)
In this blog, we will share how Optuna has evolved throughout the process.
- We made public the development roadmap and developed v3.0 in collaboration with many contributors
- API simplification, stabilization, and refactoring of features that users often encounter were done in order to improve the stability of the framework
- For the first time, extensive benchmarking of algorithms was conducted to verify the usefulness of features introduced prior to v3.0
- Many new features not initially part of the roadmap were introduced with the help of various contributors
Collaboration with Contributors
We realized that for v3.0 to be successful, it would take more than just committers (maintainers of Optuna); it would require the combined efforts of many contributors. To make it easier for non-committers to understand how v3.0 development would take place, we have maintained a GitHub issue and compiled a roadmap that is available on the GitHub wiki. We also organized development sprints to advance the development of v3.0 and to facilitate communication between committers and contributors. The development sprints were held 6 times in total, with many contributors participating and submitting a large number of PRs. We started in Japan to find out whether the format was workable, and we now plan to continue holding development sprints. A worldwide version is also under consideration. Stay tuned.
Optuna has a number of core APIs, such as the suggest API and the optuna.Study class. The visualization module is also frequently used to analyze results. Many of these have been simplified, stabilized, and refactored in v3.0. Here is a brief description of how these features have changed.
Simplified Suggest API
The suggest API is used to sample parameters from the search space. In v3.0, the suggest API has been aggregated into 3 methods: suggest_float for floating point parameters, suggest_int for integer parameters, and suggest_categorical for categorical parameters. Variations can now be specified by the arguments of those methods. By using this simpler suggest API, you will be able to write more readable and maintainable code.
For example, let’s see how suggesting a float parameter changed from the v2 series to v3.0. To sample a floating point parameter between low and high, the v2 series used suggest_uniform(name, low, high); in v3.0 this is suggest_float(name, low, high). To focus your search near low in the range, suggest_loguniform(name, low, high) becomes suggest_float(name, low, high, log=True). To sample from a discrete set such as low, low + k, low + 2 * k, …, suggest_discrete_uniform(name, low, high, k) becomes suggest_float(name, low, high, step=k).
In v3.0, all three cases for sampling floating point parameters go through the single method suggest_float.
The simplification of the suggest API was made possible by the help of the following contributors. Thank you! @himkt, @nyanhi, @nzw0301, and @xadrianzetx.
Introduction of a Test Policy
As Optuna’s code base grew, so did its test code. However, the way tests were written and the level of test cases varied from developer to developer, leading to a lack of consistency. We addressed this situation and have developed and published a test policy in v3.0 that defines how tests for Optuna should be written. Based on the published test policy, we have improved many unit tests.
The development of the test policy and the modification of unit tests based on it were made possible by many contributors. @HideakiImamura, @c-bata, @g-votte, @not522, @toshihikoyanase, @contramundum53, @nzw0301, @keisuke-umezawa, and @knshnb.
Optuna can be used not only for optimization but also for analysis of optimization results. Currently, Optuna provides two sets of visualization functions: one that uses Plotly as the backend (the functions in optuna.visualization) and one that uses Matplotlib (the functions in optuna.visualization.matplotlib). Historically, the former was implemented first, and the latter was introduced experimentally some time later. Until now, the two have been developed separately, resulting in problems such as features in one not being available in the other, inconsistent styles, and undesirable differences in internal implementations.
The task was not limited to resolving functional differences, but encompassed a variety of technical issues such as unifying internal implementations and improving testing strategies, which were resolved one by one by a number of contributors. We think that having people with diverse backgrounds work together on this refactoring was an important part of this release. Below are the contributors involved. Thank you!
@HideakiImamura, @IEP, @MasahitoKumada, @TakuyaInoue-github, @akawashiro, @belldandyxtq, @c-bata, @contramundum53, @divyanshugit, @dubey-anshuman, @fukatani, @harupy, @himkt, @kasparthommen, @keisukefukuda, @knshnb, @makinzm, @nzw0301, @semiexp, @shu65, @sidshrivastav, @takoika, and @xadrianzetx.
Up to the v2 series, we have introduced many features to Optuna. Several of these features were experimental due to unstable behavior, potential bugs, lack of use case analysis, etc. Through the development of v3.0, we have decided to provide many of these experimental features as stable features by going through their behavior, fixing bugs, and analyzing use cases. The following is a list of features that have been stabilized in v3.0.
Stabilization of these features was made possible by many contributors. Thank you! @contramundum53, @knshnb, @HideakiImamura, and @himkt.
Optuna implements a number of optimization algorithms for solving black-box optimization problems. The current default is a Bayesian optimization algorithm called Tree-structured Parzen Estimator (TPE). First, in v3.0, we published the following table summarizing the empirically known behavior of the algorithms and the characteristics of their implementation in Optuna. The table aims to help choose the right optimization algorithm for a particular task.
However, this is not enough to quantitatively evaluate the performance of an algorithm. We have developed a benchmark environment to evaluate the performance of Optuna’s algorithms in v3.0 and conducted performance evaluation experiments with a view to changing the default algorithm or default arguments of algorithms. Below are brief descriptions of the benchmark environment and the performance evaluation experiments.
In developing the benchmark environment, we considered it important that anyone should be able to easily reproduce the experiment. The benchmarking environment consists of scripts to solve specific problems by specifying samplers and pruners, and a mechanism to run them on GitHub Actions. There are over 170 problems currently available in Optuna, including machine learning hyperparameter optimizations and well-known test functions. Users and contributors can run the scripts in their local environment, or they can easily run the workflow on their own fork of Optuna with GitHub Actions. To learn more about the benchmark environment, please check here.
Many contributors were involved in the implementation of the benchmark scripts and the workflow on GitHub Actions. Thanks to @HideakiImamura, @drumehiron, @xadrianzetx, @kei-mo, @contramundum53, and @shu65.
Using the developed benchmarking environment, we conducted a performance evaluation experiment. Its purpose was to fairly compare the algorithms implemented in Optuna across various problem settings, in order to consider changes to default arguments, including the default sampler and pruner.
Since it would be impractical to run the experiment on all sampler/pruner combinations, we limited our benchmarking to the most important ones. We ran 100 studies of 1000 trials each on over 30 different sampler/pruner combinations and over 170 different problems, for a total of over 500 million trials. Since this experiment required a large amount of computational resources, we ran it on a compute cluster owned by Preferred Networks, Inc., using roughly 7000 CPUs in parallel over a period of about 3 days. Below are brief summaries of the benchmark results.
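The total follows from multiplying the four quoted figures. The short check below assumes that every sampler/pruner combination was run on every problem, and it uses the quoted "over 30" and "over 170" as lower bounds, so the result is itself a lower bound.

```python
studies_per_setting = 100      # repeated studies per (problem, sampler/pruner) pair
trials_per_study = 1_000
sampler_pruner_combos = 30     # "over 30", so treated as a lower bound
problems = 170                 # "over 170", likewise a lower bound

# Assumes a full cross product of combinations and problems.
total_trials = (
    studies_per_setting * trials_per_study * sampler_pruner_combos * problems
)
print(f"{total_trials:,}")  # 510,000,000
```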
- Sampler benchmarks
  - TPESampler is the current default sampling algorithm in Optuna. The current default arguments of TPESampler perform well in many settings. However, we also found that different samplers are good at different types of problems and different numbers of trials.
  - The multivariate option of TPESampler enables multivariate sampling. The default value of the multivariate option is currently False. We found that setting multivariate to True improved performance in many cases, but sometimes resulted in worse performance on high-dimensional problems. Since changing a default argument has a significant impact on users, we decided to be conservative and not change the default value to True at this stage.
  - We also conducted a speed benchmark using the existing environment based on asv. We observed that changing the multivariate argument would speed up sampling. Note that the speed difference is a result of implementation and may change in the future.
- Sampler benchmarks in distributed optimization
  - We conducted a benchmark with distributed optimization. The constant_liar option of TPESampler enables the constant liar strategy. We found that setting constant_liar to True does not necessarily improve performance. For some types of problems, BoTorchSampler, a Gaussian-process-based algorithm, was found to perform better. The default value of constant_liar is False, and we decided not to change it to True.
- Pruner benchmarks
  - MedianPruner is the current default pruning algorithm in Optuna. HyperbandPruner is a pruner for which committers have high expectations. The results showed that, with default arguments, MedianPruner performed better in some cases and HyperbandPruner in others. Changing the default pruner was therefore postponed. It’s worth pointing out that different pruner configurations are likely to yield different results.
The following figure shows how the best value varies with the number of trials for each sampler with default arguments in a problem called HPO bench (naval). More than 170 other such figures were obtained in this experiment.
You can reproduce partial benchmark experiments on GitHub Actions. If you would like to examine the behavior of a particular algorithm in more detail, we encourage you to try it for yourself. If you are interested in detailed results of our benchmark experiments, please check #2964 and #2906.
Thank you for reading this far. In the second half of this blog, we will discuss various new features added to Optuna v3.0, as well as development items that were on the roadmap but, after some twists and turns, were never actually tackled. Enjoy!