Significant Speed Up of Multi-Objective TPESampler in Optuna v4.0.0

Shuhei Watanabe
Published in Optuna · 7 min read · Aug 7, 2024

TL;DR

  • We continually improve the speed of Optuna, and TPESampler for multi-objective optimization has been sped up in Optuna v4.0.0.
  • As a result of this update, TPESampler for multi-objective optimization is consistently faster, and our experiments confirmed that TPESampler for tri-objective optimization with 200 Trials is now 300 times faster than in Optuna v3.6.1.

Note that since Optuna v4.0 is currently in beta, users need to specify the beta version at installation time to enjoy the speedup described in this article:

pip install optuna==4.0.0b0

Speed Enhancement in Optuna

User experience is one of the most important aspects we care about in Optuna. For this reason, we are not only adding new features but also further enhancing usability, for example by reducing dependencies and increasing speed. In the past, we introduced lazy module imports and more efficient storage processing to speed up Optuna.

These speedups are not necessarily critical for all users, but as Optuna becomes more widely used, they may matter in various ways. For example, if the objective function, e.g., the training of a neural network, takes a long time, the sampling overhead barely affects the optimization routine. On the other hand, if the objective function takes from a few seconds to a few minutes and we would like to evaluate thousands of trials, the speed improvement may directly affect the usability and the number of iterations possible during optimization.

In this article, we discuss the speedup of TPESampler for multi-objective optimization in Optuna v4.0.0, which makes it capable of efficiently sampling thousands of trials.

Why Speedup of TPESampler for Multi-Objective Optimization?

Users sometimes want to jointly optimize multiple objective functions with Optuna. For example, hyperparameter optimization of a machine learning model for a translation task involves both translation quality, quantified by a metric such as BLEU, and response speed. Multi-objective optimization is employed for such problems. Since multi-objective optimization considers the trade-offs between objectives, multiple optimal solutions are commonly found. Hence, multi-objective optimization often requires many more trials than single-objective optimization to cover diverse trade-offs.

To reduce the number of trials needed for multi-objective optimization as much as possible, a better-performing sampler is needed. MOTPE [1] has been reported to outperform NSGA-II, the default multi-objective sampler in Optuna, for optimizations with 1000–10000 Trials, cf. Figs. 8–10 in Ozaki et al. [2]. To exploit this sample efficiency, Optuna introduced MOTPE in v2.3.0 (its usage is described later). Although MOTPE is known to be performant, the previous Optuna implementation incurred a non-negligible sampling overhead, compared to the objective function evaluation, after hundreds of trials.

TPESampler, into which MOTPE was integrated, has various advantages over NSGAIISampler, the default multi-objective sampler, besides performance, such as the handling of dynamic search spaces and categorical distances. That said, if users would like to evaluate more than hundreds of trials in multi-objective optimization, TPESampler has been practically unusable due to its sampling overhead. To address this problem, we tackled the speedup of TPESampler for multi-objective optimization so that users can enjoy it even with thousands of trials. For brevity, we omit the technical details of TPE and MOTPE; interested readers are kindly referred to the MOTPE papers [1, 2] and the TPE tutorial [3].

The Modifications Made for the Speedup

The sampling overhead of MOTPE came from WFG [4], a hypervolume calculation algorithm needed to quantify the coverage of optimal trade-offs, and from the non-dominated sort and HSSP (Hypervolume Subset Selection Problem) [5], which are used to rank each observation. We do not discuss the technical details here, but in a nutshell, the speedup was achieved by (1) improving the constant factors and the order of the time complexity of these algorithms and (2) refactoring with NumPy. For more information, please refer to the corresponding PRs (a small illustrative sketch of the 2D hypervolume idea follows the list):

  • #5302: the constant factor improvement in non-dominated sort, cf. fast-pareto introduced by Watanabe [6]
  • #5303: the time complexity order improvement of the 2D hypervolume calculation
  • #5346: the time complexity order improvement of 2D HSSP
  • #5355: the constant factor improvement of HSSP by lazy evaluation of hypervolume [7]
  • #5424: the refactoring of WFG by NumPy
  • #5433, #5591: the constant factor improvement of WFG
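
To give a flavor of the kind of change involved, here is a minimal NumPy sketch of the idea behind the 2D case. This is only an illustration, not Optuna's actual implementation: once the points are sorted by the first objective, the 2D hypervolume of a non-dominated front can be accumulated in a fully vectorized way, so the whole computation takes O(n log n) time.

import numpy as np


def hypervolume_2d(loss_values: np.ndarray, reference_point: np.ndarray) -> float:
    # loss_values has shape (n_trials, 2); both objectives are minimized.
    # Keep only the points that strictly dominate the reference point.
    points = loss_values[np.all(loss_values < reference_point, axis=1)]
    if points.size == 0:
        return 0.0

    # Sort by the first objective and drop dominated points: a point stays on
    # the front only if its second objective beats every earlier point's.
    points = points[np.argsort(points[:, 0])]
    prefix_min = np.minimum.accumulate(np.r_[np.inf, points[:-1, 1]])
    points = points[points[:, 1] < prefix_min]

    # Sum the axis-aligned slabs between consecutive front points.
    widths = np.diff(np.r_[points[:, 0], reference_point[0]])
    heights = reference_point[1] - points[:, 1]
    return float(np.sum(widths * heights))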

How Can We Use TPESampler for Multi-Objective Optimization?

The simplest way to use MOTPE is the following:

import optuna


def objective(trial):
    x = trial.suggest_float("x", -5, 5)
    y = trial.suggest_float("y", -5, 5)
    objective_1 = x**2 + y**2
    objective_2 = (x - 2) ** 2 + (y - 2) ** 2
    return objective_1, objective_2


sampler = optuna.samplers.TPESampler()
study = optuna.create_study(sampler=sampler, directions=["minimize"] * 2)
study.optimize(objective, n_trials=100)

Notice that since the default multi-objective sampler in Optuna is NSGAIISampler, NSGAIISampler will be used in the optimization if users do not pass an instantiated TPESampler to optuna.create_study.
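
After the optimization finishes, the trials on the resulting Pareto front can be inspected via study.best_trials. For example, continuing the example above:

# Print the Pareto-optimal trials found by the study above.
for trial in study.best_trials:
    print(trial.number, trial.values, trial.params)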

Benchmarking Results

To see the speedup effect of Optuna v4.0.0 (v4.0.0b0 was used in the experiments since v4.0.0 had not been released yet), we compare its speed with TPESampler in Optuna v3.6.1, the latest release at the time. The code used in the experiments can be found on Gist. Furthermore, the visualization function used in this section is available in OptunaHub and can be loaded with optunahub.load_module(package="visualization/plot_sampling_speed"). For more details, please check the individual package page.

In all the setups, we used a two-dimensional search space and ran each experiment with three different random seeds. Each experiment was terminated when either 10,000 Trials or 1 hour was reached. We used Ubuntu 20.04, a 12th Gen Intel(R) Core(TM) i7-1255U CPU, Python 3.9.13, and NumPy v2.0.0 as the execution environment.

Figure 1 shows the results using 1 to 3 objectives. TPESampler for multi-objective optimization cannot, in principle, be quicker than that for single-objective optimization, so the improvement we aim for is to bring the lines obtained in the multi-objective setups as close as possible to the line obtained in the single-objective setup, which is the improvement limit. Additionally, we plotted the runtime prediction at each trial number as dashed lines, using a linear regression of the form log(n_trials) = a * log(elapsed_time) + b, where n_trials is the mean trial count over the seeds and a and b are the fitted coefficients.
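
As an aside, such a prediction line boils down to a single log-log linear fit. The snippet below is only an illustration with made-up placeholder arrays (the actual benchmarking and plotting code is in the Gist linked above):

import numpy as np

# Placeholder measurements: cumulative runtime in seconds and the
# corresponding numbers of sampled trials (averaged over seeds).
elapsed_time = np.array([1.0, 10.0, 100.0, 1000.0])
n_trials = np.array([60.0, 200.0, 700.0, 2400.0])

# Fit log(n_trials) = a * log(elapsed_time) + b ...
a, b = np.polyfit(np.log(elapsed_time), np.log(n_trials), deg=1)

# ... and invert it to predict the runtime needed for a target trial count.
runtime_for_1000_trials = np.exp((np.log(1000.0) - b) / a)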

As shown in Figure 1 (dark blue lines), there is no big difference between Optuna v3.6.1 and v4.0.0 for single-objective optimization. On the other hand, the runtime of the bi-objective setup with Optuna v4.0.0 (yellow-green line marked by ★) is comparable to the single-objective setup, i.e., the improvement limit. Additionally, the tri-objective setup (yellow line marked by ★) also achieved an enormous speedup, although it is still not comparable to the single-objective setup.

For example, the runtime required for 200 Trials in the tri-objective setup is now about 3 seconds, while it took about 1,000 seconds with Optuna v3.6.1. This implies that Optuna v4.0.0 is roughly 300 times faster under this condition. Moreover, based on the linear-regression prediction, Optuna v3.6.1 would require more than a year to sample 1,000 Trials in the tri-objective setup, while Optuna v4.0.0 can do so in about 100 seconds. As demonstrated, TPESampler can now be used for multi-objective setups even with more than 1,000 Trials. Furthermore, Optuna v4.0.0 works at approximately the same speed even with 3–5 objectives, as can be seen in Figure 2.

Figure 1. The elapsed time of TPESampler at each number of sampled trials for 1 to 3 objectives. The horizontal axis shows the number of trials suggested by TPESampler, and the vertical axis shows the mean runtime over 3 independent runs. The ▲ and ★ markers are the results obtained by v3.6.1 and v4.0.0, respectively. The dashed lines are the predictions obtained by linear regression. Note that lower lines mean faster implementations.
Figure 2. The elapsed time of TPESampler at each number of sampled trials for 3 to 5 objectives. The figure is read in the same way as Figure 1. It demonstrates that sampling with Optuna v4.0.0 (marked by ★) is faster than with Optuna v3.6.1.

Conclusion

We are continually working on speeding up Optuna, and this article focused on the speedup of TPESampler for multi-objective optimization. The benchmarking results demonstrated that TPESampler for tri-objective optimization is now 300 times faster than in v3.6.1. As explained above, TPESampler has various advantages over NSGAIISampler, so we would be very happy if many users took this opportunity to try the new TPESampler for multi-objective optimization. Last but not least, we will keep working to further speed up Optuna.

We will hold an Optuna development sprint, the first in about two years, on August 17 in Tokyo. At the sprint, participants will develop features for OptunaHub. Please join us!

References

[1] Yoshihiko Ozaki, Yuki Tanigaki, Shuhei Watanabe, and Masaki Onishi (2020). Multiobjective Tree-Structured Parzen Estimator for Computationally Expensive Optimization Problems. Proceedings of Genetic and Evolutionary Computation Conference (GECCO) 2020.

[2] Yoshihiko Ozaki, Yuki Tanigaki, Shuhei Watanabe, Masahiro Nomura, and Masaki Onishi (2022). Multiobjective Tree-Structured Parzen Estimator. Journal of Artificial Intelligence Research (JAIR) 2022.

[3] Shuhei Watanabe (2023). Tree-Structured Parzen Estimator: Understanding Its Algorithm Components and Their Roles for Better Empirical Performance. arXiv Preprint arXiv:2304.11127.

[4] Lyndon While, Lucas Bradstreet, and Luigi Barone (2011). A Fast Way of Calculating Exact Hypervolumes. Evolutionary Computation 16.1: 86–95.

[5] Andreia P. Guerreiro, Carlos M. Fonseca, and Luís Paquete (2016). Greedy Hypervolume Subset Selection in Low Dimensions. Evolutionary Computation 24.3: 521–544.

[6] Shuhei Watanabe (2023). Python Tool for Visualizing Variability of Pareto Fronts over Multiple Runs. arXiv Preprint arXiv:2305.08852.

[7] Weiyu Chen, Hisao Ishibuchi, and Ke Shang (2020). Lazy Greedy Hypervolume Subset Selection from Large Candidate Solution Sets. Congress on Evolutionary Computation (CEC) 2020.
