Introducing Optuna’s Native GPSampler

shuheiwatanabe
Published in Optuna
Apr 22, 2024

Introduction

In this blog article, we introduce GPSampler, a novel sampler in Optuna v3.6 that employs Gaussian process-based Bayesian optimization.

Optuna’s previous versions used BoTorchSampler for Gaussian process-based Bayesian optimization. BoTorchSampler utilized a framework for Bayesian optimization research called BoTorch to ease the implementation burden, but it faced some issues due to conflicts with Optuna’s design.

First, BoTorchSampler’s execution speed was significantly slower than other samplers. This not only impacted user experience in terms of speed, but also increased the time for Optuna’s unit testing and CI processes, resulting in reduced development efficiency.

Additionally, Optuna’s dependency on BoTorch for all Gaussian process-dependent algorithms posed challenges when BoTorch could not be installed, preventing the use of BoTorchSampler and the regret bound-based terminator, a function for automatically ending the optimization.

GPSampler aims to address these issues by implementing the Gaussian processes directly within Optuna, along with enhanced discrete optimization. BoTorchSampler has been moved to optuna-integration, so you can continue using its features through the same interface as before by installing optuna-integration alongside Optuna v3.6 or later.

Exploring the Background and Advantages of Introducing the New Feature

The primary goal of adding the new feature was to boost execution speed and development efficiency by replacing BoTorchSampler with a Gaussian process-based sampler created in-house. It also aims to lower the difficulty of incorporating new features that utilize Gaussian processes. In this section, we outline the previous challenges and discuss how the new functionality addresses them.

Why Was BoTorchSampler Slower?

BoTorch is a library designed for researchers in Bayesian optimization, offering a flexible interface to experiment with various Bayesian optimization methods. This flexibility stems from BoTorch’s abstract design, intended to support a variety of methods. However, this flexibility amounted to over-engineering for Optuna, where only the simplest features of Gaussian processes are used.

BoTorch’s design philosophy is to make extensive use of batching and to leverage GPUs for fast optimization. Optuna’s design, however, does not account for GPU-based sampling, so BoTorch’s GPU acceleration went largely unused. Additionally, because trials are not generated in batches, BoTorch’s strengths in batch optimization do not translate well to Optuna, diminishing its speed benefits.

To address this, we implemented our own Gaussian processes optimized for Optuna’s design, keeping the functionality to a bare minimum. This accelerated the algorithms in Optuna that use Gaussian processes.
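To give a feel for how little machinery the simplest use of Gaussian processes requires, here is a hedged, standalone sketch of plain GP regression (posterior mean and variance) in NumPy. This is not Optuna’s actual implementation: GPSampler uses a Matern kernel with learned hyperparameters on top of PyTorch, whereas this toy fixes an RBF kernel and a jitter term.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential (RBF) kernel; purely illustrative — the real
    # sampler uses a Matern kernel with fitted hyperparameters.
    d = a[:, None, :] - b[None, :, :]
    return np.exp(-0.5 * np.sum(d**2, axis=-1) / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    # Standard GP regression equations:
    #   mean = K*^T (K + noise*I)^-1 y
    #   var  = diag(K**) - K*^T (K + noise*I)^-1 K*
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_train, x_test)
    alpha = np.linalg.solve(k, y_train)
    mean = k_star.T @ alpha
    v = np.linalg.solve(k, k_star)
    var = rbf_kernel(x_test, x_test).diagonal() - np.sum(k_star * v, axis=0)
    return mean, var
```

A sampler then only needs the posterior mean and variance to score candidate points with an acquisition function; nothing here requires BoTorch’s abstractions.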

How Has the Dependency Structure Changed?

BoTorchSampler depends on multiple large libraries such as BoTorch, GPyTorch, Pyro, and PyTorch, which can be problematic if any of these libraries cannot be installed due to compatibility issues with the version of Python, OS, and so on. In contrast, the newly implemented GPSampler only requires PyTorch and SciPy.

Has Performance Been Affected by the New Implementation?

We have confirmed that GPSampler performs as well as, if not better than, BoTorchSampler in most of the experiments we have conducted. In particular, BoTorchSampler struggled with mixed spaces containing both continuous and discrete parameters, where it underperformed even RandomSampler. In developing the new feature, we improved mixed-space performance by reworking the optimization of acquisition functions in such spaces.

Optimizing the acquisition function alternates repeatedly between gradient-based updates for the continuous variables and local optimization of each discrete variable, starting from initial solutions drawn by a quasi-Monte Carlo method. The previous implementation did not account for the fact that discrete variables can only take discrete observed values, which hurt optimization in discrete spaces. The latest modifications show clear improvements in discrete-space performance; some of the experimental results are described later in this article.
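The alternating scheme can be sketched as follows. This is a rough illustration on a toy acquisition function, not Optuna’s actual code: it uses finite-difference gradient steps for the continuous block and exhaustive evaluation of the attainable values for each discrete variable, restarted from several plain-random initial points (the real sampler uses quasi-Monte Carlo initialization and analytic gradients).

```python
import numpy as np


def optimize_mixed_acq(acq, cont_dim, disc_choices, n_starts=4, n_iters=15, seed=0):
    # acq(xc, xd) -> float to maximize; xc is a continuous vector in
    # [-1, 1]^cont_dim, xd a list of discrete values, one per variable.
    rng = np.random.default_rng(seed)
    eps, lr = 1e-4, 0.2
    best_xc, best_xd, best_val = None, None, -np.inf
    for _ in range(n_starts):
        xc = rng.uniform(-1.0, 1.0, cont_dim)  # QMC in the real sampler
        xd = [int(rng.choice(c)) for c in disc_choices]
        for _ in range(n_iters):
            # Continuous step: central finite differences stand in for
            # the analytic gradients used in practice.
            grad = np.zeros(cont_dim)
            for i in range(cont_dim):
                e = np.zeros(cont_dim)
                e[i] = eps
                grad[i] = (acq(xc + e, xd) - acq(xc - e, xd)) / (2 * eps)
            xc = np.clip(xc + lr * grad, -1.0, 1.0)
            # Discrete step: evaluate only the attainable discrete values,
            # never intermediate continuous relaxations.
            for j, choices in enumerate(disc_choices):
                cand = [acq(xc, xd[:j] + [c] + xd[j + 1:]) for c in choices]
                xd[j] = choices[int(np.argmax(cand))]
        val = acq(xc, xd)
        if val > best_val:
            best_xc, best_xd, best_val = xc, xd, val
    return best_xc, best_xd, best_val
```

Restricting the discrete step to attainable values is exactly the fix described above: the optimizer never scores a discrete variable at a point it could not actually observe.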

How to Use GPSampler

In Optuna v3.6, GPSampler does not support the following features:

  • Multi-objective optimization,
  • Constrained optimization, and
  • Batch optimization.

If you need these features, continue using BoTorchSampler.

Since GPSampler depends on PyTorch and SciPy, you need to manually install these libraries after installing Optuna.

$ pip install "optuna>=3.6.0"

# You need additional dependencies such as torch and scipy.
$ pip install scipy torch

Once installed, it is easy to use GPSampler in Optuna. Simply define a sampler and pass it when using optuna.create_study.

import optuna


def objective(trial):
    x = trial.suggest_float("x", -5, 5)
    return x**2


if __name__ == "__main__":
    sampler = optuna.samplers.GPSampler()
    study = optuna.create_study(sampler=sampler)
    study.optimize(objective, n_trials=100)

If your objective function is deterministic, i.e., if it returns the same values given the same parameters as the above example, specifying deterministic_objective=True may improve performance.

sampler = optuna.samplers.GPSampler(deterministic_objective=True)

Please note that starting with v3.6, BoTorchSampler lives in the optuna-integration package. You can still import it as optuna.integration.BoTorchSampler, but you will need to install the optuna-integration package to do so.

# You need to install optuna-integration and botorch to use BoTorchSampler from v3.6.
$ pip install optuna-integration botorch

Speed and Performance of GPSampler

Finally, we conducted tests using simple benchmark functions to evaluate the execution speed improvements and performance of GPSampler against BoTorchSampler.

In our experiment, we used two functions: (1) a 5-dimensional Rotated Ellipsoid function and (2) the Naval Propulsion benchmark from HPOLib [1] (a 9-dimensional mixed discrete and categorical space). The 5-dimensional Rotated Ellipsoid function looks like this:

import numpy as np
import optuna


cos30 = np.cos(np.pi / 6)
sin30 = np.sin(np.pi / 6)
DIM = 5
ROTATION = np.identity(DIM)
for i in range(DIM - 1):
    rotate = np.identity(DIM)
    # 2D rotation by 30 degrees applied to each pair of adjacent axes.
    rotate[i : i + 2, i : i + 2] = np.array([[cos30, -sin30], [sin30, cos30]])
    ROTATION = ROTATION @ rotate


def rotated_ellipsoid(trial):
    X = np.array([trial.suggest_float(f"x{i}", -5, 5) for i in range(DIM)])
    RX = ROTATION @ X
    weights = np.array([5**i for i in range(DIM)])
    return weights @ ((RX - 2) ** 2)
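As a standalone sanity check on this benchmark (our addition, not part of the original experiment), and assuming the intended construction uses standard 2D rotation blocks [[cos, -sin], [sin, cos]]: the product of rotations is orthogonal, so the global minimum of 0 is attained at X* = ROTATIONᵀ(2, …, 2), which stays inside the search box because rotations preserve the norm (2√5 ≈ 4.47 < 5).

```python
import numpy as np

# Standalone re-creation of the benchmark's rotation, assuming standard
# 2D rotation blocks on each pair of adjacent axes.
cos30, sin30 = np.cos(np.pi / 6), np.sin(np.pi / 6)
DIM = 5
ROTATION = np.identity(DIM)
for i in range(DIM - 1):
    rotate = np.identity(DIM)
    rotate[i : i + 2, i : i + 2] = np.array([[cos30, -sin30], [sin30, cos30]])
    ROTATION = ROTATION @ rotate

# ROTATION is orthogonal, so the minimum 0 is attained at
# X* = ROTATION.T @ (2, ..., 2); its norm 2*sqrt(5) < 5 keeps it in the box.
x_star = ROTATION.T @ np.full(DIM, 2.0)
weights = np.array([5.0**i for i in range(DIM)])
value_at_optimum = weights @ ((ROTATION @ x_star - 2.0) ** 2)
```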

For our experiment, we utilized a Core i7-10700 8-core Ubuntu 18.04 machine and BoTorch v0.10.0.

Figure 1 shows a comparison of execution times. The results demonstrate that GPSampler is more than twice as fast as BoTorchSampler in both continuous and discrete spaces. Figure 2 illustrates the optimization performance. As mentioned earlier, BoTorch is a library designed for continuous search spaces, which explains why RandomSampler outperforms BoTorchSampler in the discrete space, as shown in Figure 2 (right). On the other hand, GPSampler has demonstrated improved performance in the discrete space. Also, Figure 2 (left) shows that GPSampler’s performance does not deteriorate in continuous spaces even with these changes.

Figure 1. Comparison of execution times for each experiment. The horizontal axis represents the number of trials evaluated, while the vertical axis represents the time needed to evaluate the corresponding number of trials. Each result represents the average of 10 independent studies, with the translucent band indicating the standard error of these 10 studies. Left: Execution time comparison in the experiment with the 5-dimensional Rotated Ellipsoid function. This indicates more than a 2-fold improvement in the execution speed for continuous space. Right: Execution time comparison in the HPOLib experiment, showing a more than 5-fold reduction in execution time in a mixed discrete-categorical space.
Figure 2. Performance comparison for each experiment. The horizontal axis represents the number of trials evaluated, while the vertical axis represents the best objective function value for the corresponding number of trials evaluated. Each result represents the average of 10 independent studies, with the translucent band indicating the standard error of these 10 studies. Left: Performance comparison in the experiment with the 5D Rotated Ellipsoid function. The results demonstrate that optimization performance in continuous space remains on par with BoTorch. Right: Performance comparison in the HPOLib experiment. The results indicate that GPSampler significantly outperforms RandomSampler in the mixed discrete and categorical space due to its support for discrete optimization, whereas BoTorchSampler underperforms RandomSampler.

Conclusion

In this blog, we provided a historical perspective on the introduction of GPSampler and highlighted the benefits it offers to users. By simplifying the internal Gaussian process implementation, we managed to increase speed and reduce dependencies compared to BoTorchSampler. Currently, only the most basic features are supported, while advanced features such as multi-objective optimization and constrained optimization are not yet available. If you have specific requests for these features, please let us know through Issues or Discussions.

In Optuna v3.6, we have added various other features as well. Please take a look at our release blog. Also, be sure to check out our blogs about our ambitious recent endeavors using Rust and our pruning function using statistical tests.

[1] Klein, A. and Hutter, F. (2019). Tabular benchmarks for joint architecture and hyperparameter optimization. arXiv:1905.04970.
