Introducing Optuna’s Native GPSampler
Introduction
In this blog article, we introduce GPSampler, a novel sampler in Optuna v3.6 that employs Gaussian process-based Bayesian optimization.
Optuna’s previous versions used BoTorchSampler for Gaussian process-based Bayesian optimization. BoTorchSampler relied on BoTorch, a framework for Bayesian optimization research, to ease the implementation burden, but it faced some issues due to conflicts with Optuna’s design.
First, BoTorchSampler’s execution speed was significantly slower than that of other samplers. This not only hurt the user experience, but also lengthened Optuna’s unit tests and CI runs, reducing development efficiency.
Additionally, Optuna depended on BoTorch for all Gaussian process-based algorithms, so whenever BoTorch could not be installed, neither BoTorchSampler nor the regret bound-based terminator, a function for automatically ending the optimization, could be used.
GPSampler aims to address these issues by implementing Gaussian processes directly within Optuna, along with enhanced discrete optimization. BoTorchSampler has been moved to optuna-integration, so you can continue using its features through the same interface as before by installing optuna-integration alongside Optuna v3.6 or later.
Exploring the Background and Advantages of Introducing the New Feature
The primary goal of the new feature was to boost execution speed and development efficiency by replacing BoTorchSampler with a Gaussian process-based sampler created in-house. It also aims to lower the barrier to adding new features that utilize Gaussian processes. In this section, we outline the previous challenges and discuss how the new functionality addresses them.
Why Was BoTorchSampler Slower?
BoTorch is a library designed for researchers in Bayesian optimization, offering a flexible interface to experiment with various Bayesian optimization methods. This flexibility stems from BoTorch’s abstract design, intended to support a variety of methods. However, for Optuna, which uses only the simplest features of Gaussian processes, this flexibility amounted to over-engineering.
BoTorch’s design philosophy is to make extensive use of batching and to leverage GPUs for fast optimization. Optuna’s design, on the other hand, does not account for GPU-based sampling, so it cannot take full advantage of BoTorch’s GPU acceleration. Likewise, because trials are not generated in batches, BoTorch’s strengths in batch optimization do not carry over to Optuna, diminishing its speed benefits.
To address this, we implemented our own Gaussian processes optimized for Optuna’s design, keeping the functionality to a bare minimum. This accelerated the algorithms in Optuna that use Gaussian processes.
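To give a sense of how little machinery is actually needed, here is a minimal, illustrative sketch (not Optuna’s actual implementation) of the core of GP-based Bayesian optimization: the posterior mean and variance of a Gaussian process regressor with an RBF kernel. The kernel choice and length scale here are placeholder assumptions for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential (RBF) kernel between row vectors in a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    # Standard GP regression equations, solved via Cholesky for stability.
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_test, x_train)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_star @ alpha
    v = np.linalg.solve(chol, k_star.T)
    var = rbf_kernel(x_test, x_test).diagonal() - (v**2).sum(0)
    return mean, var


x = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 4.0])
mean, var = gp_posterior(x, y, np.array([[1.0]]))
# At a training point, the posterior mean matches the observation
# and the posterior variance is near zero.
```

A sampler built on these posteriors only needs to maximize an acquisition function computed from `mean` and `var`, which is why a bare-minimum implementation can stay this small.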
How Has the Dependency Structure Changed?
BoTorchSampler depends on multiple large libraries such as BoTorch, GPyTorch, Pyro, and PyTorch, which can be problematic if any of them cannot be installed due to compatibility issues with the Python version, OS, and so on. In contrast, the newly implemented GPSampler requires only PyTorch and SciPy.
Has Performance Been Affected by the New Implementation?
We have confirmed that GPSampler performs as well as, if not better than, BoTorchSampler in most of the experiments we have conducted. In particular, BoTorchSampler struggled with mixed spaces that contain both continuous and discrete parameters, underperforming even RandomSampler. In developing the new feature, we improved mixed-space performance by reworking the optimization of acquisition functions in such spaces.
Acquisition function optimization starts from an initial solution drawn with a quasi-Monte Carlo method and then alternates between gradient-based updates for the continuous variables and local optimization for each discrete variable. The previous implementation did not account for the fact that only discrete values can be observed for discrete variables, which hurt optimization in discrete spaces. The latest modifications show clear improvements in discrete-space performance; some of the experimental results are described later in this article.
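The alternating scheme described above can be sketched as follows. This is a simplified illustration, not Optuna’s actual code: the acquisition function `acq` is a toy stand-in, and the single continuous and single discrete variable are assumptions for brevity. The key point is that the discrete variable is only ever evaluated at its admissible values, never at a continuous relaxation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import qmc

# Admissible values of the (toy) discrete variable.
DISCRETE_CHOICES = np.array([0.0, 1.0, 2.0, 3.0])


def acq(x_cont, x_disc):
    # Toy acquisition function (to minimize); its optimum is at (0.5, 2.0).
    return (x_cont - 0.5) ** 2 + (x_disc - 2.0) ** 2


# Quasi-Monte Carlo initial solution for the continuous part.
x_cont = qmc.Sobol(d=1, seed=0).random(1)[0, 0]
x_disc = DISCRETE_CHOICES[0]

for _ in range(3):  # a few alternating rounds
    # Continuous step: gradient-based local optimization with the
    # discrete variable held fixed.
    res = minimize(lambda x: acq(x[0], x_disc), x0=[x_cont])
    x_cont = res.x[0]
    # Discrete step: exhaustive local search over admissible values only.
    x_disc = DISCRETE_CHOICES[
        np.argmin([acq(x_cont, d) for d in DISCRETE_CHOICES])
    ]
```

Evaluating the discrete variable only on its grid is exactly what the previous implementation missed, and it is what the improvement described above addresses.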
How to Use GPSampler
In Optuna v3.6, GPSampler does not support the following features:
- Multi-objective optimization,
- Constrained optimization, and
- Batch optimization.
If you need these features, continue using BoTorchSampler.
Since GPSampler depends on PyTorch and SciPy, you need to install these libraries yourself after installing Optuna.
$ pip install "optuna>=3.6.0"
# You need additional dependencies, namely torch and scipy.
$ pip install scipy torch
Once installed, GPSampler is easy to use: simply define a sampler and pass it to optuna.create_study.
import optuna


def objective(trial):
    x = trial.suggest_float("x", -5, 5)
    return x**2


if __name__ == "__main__":
    sampler = optuna.samplers.GPSampler()
    study = optuna.create_study(sampler=sampler)
    study.optimize(objective, n_trials=100)
If your objective function is deterministic, i.e., it returns the same value given the same parameters, as in the above example, specifying deterministic_objective=True may improve performance.
sampler = optuna.samplers.GPSampler(deterministic_objective=True)
Please note that starting with v3.6, BoTorchSampler lives in the optuna-integration package. You can still import it as optuna.integration.BoTorchSampler, but you will need to install the optuna-integration package to do so.
# You need to install optuna-integration and botorch to use BoTorchSampler from v3.6.
$ pip install optuna-integration botorch
Speed and Performance of GPSampler
Finally, we ran tests on simple benchmark functions to evaluate the execution speed and optimization performance of GPSampler against BoTorchSampler.
In our experiment, we used two functions: (1) a 5-dimensional Rotated Ellipsoid function and (2) HPOLib’s Naval Propulsion benchmark [1], a 9-dimensional space mixing discrete and categorical parameters. The 5-dimensional Rotated Ellipsoid function looks like this:
import numpy as np
import optuna

cos30 = np.cos(np.pi / 6)
sin30 = np.sin(np.pi / 6)
DIM = 5
ROTATION = np.identity(DIM)
for i in range(DIM - 1):
    rotate = np.identity(DIM)
    # 2D rotation by 30 degrees applied to each adjacent pair of axes.
    rotate[i : i + 2, i : i + 2] = np.array([[cos30, -sin30], [sin30, cos30]])
    ROTATION = ROTATION @ rotate


def rotated_ellipsoid(trial):
    X = np.array([trial.suggest_float(f"x{i}", -5, 5) for i in range(DIM)])
    RX = ROTATION @ X
    weights = np.array([5**i for i in range(DIM)])
    return weights @ ((RX - 2) ** 2)
For our experiment, we used a machine with an 8-core Core i7-10700 CPU running Ubuntu 18.04, and BoTorch v0.10.0.
Figure 1 shows a comparison of execution times. The results demonstrate that GPSampler is more than twice as fast as BoTorchSampler in both the continuous and discrete spaces. Figure 2 illustrates optimization performance. As mentioned earlier, BoTorch is a library designed for continuous spaces, which explains why RandomSampler outperforms BoTorchSampler in the discrete space in Figure 2 (right). GPSampler, on the other hand, shows improved performance in the discrete space, and Figure 2 (left) shows that its performance in the continuous space does not deteriorate despite these changes.
Conclusion
In this blog, we provided a historical perspective on the introduction of GPSampler and highlighted the benefits it offers to users. By simplifying the internal Gaussian process implementation, we increased speed and reduced dependencies compared to BoTorchSampler. Currently, only the most basic features are supported; advanced features such as multi-objective optimization and constrained optimization are not yet available. If you have specific requests for these features, please let us know through Issues or Discussions.
In Optuna v3.6, we have added various other features as well. Please take a look at our release blog. Also, be sure to check out our blogs about our ambitious recent endeavors using Rust and our pruning function using statistical tests.
[1] Klein, A. and Hutter, F. (2019). Tabular benchmarks for joint architecture and hyperparameter optimization. arXiv:1905.04970.