OptunaHub Benchmarks: A New Feature to Use/Register Various Benchmark Problems
Introduction
Performance benchmarking is essential in both research and algorithm development: it enables direct comparison of an algorithm under development against baseline algorithms, whether to verify an idea or to claim the superiority of a proposed method. Because real-world problem setups are so diverse, a correspondingly wide variety of benchmark problems exists.
OptunaHub Benchmarks is a new feature introduced in optunahub v0.2.0, the latest release, designed to make benchmarking convenient. It offers the following benefits:
- Usage of various benchmark problems through a unified API simply by switching package names.
- A quick analysis of benchmark results using the optuna.visualization module or optuna-dashboard.
- Usage of the benchmark problems even from optimization frameworks other than Optuna through a general API.
- Easy registration of new benchmark problems to OptunaHub.
Below is an example code using OptunaHub Benchmarks. In this way, users can easily load benchmark functions, execute optimizations, and visualize results (for more practical examples, please refer to Appendix: Practical Benchmark Code Example).
import optuna
import optunahub

bbob = optunahub.load_module("benchmarks/bbob")
sphere2d = bbob.Problem(function_id=1, dimension=2, instance_id=1)
study = optuna.create_study(sampler=optuna.samplers.TPESampler(seed=42), directions=sphere2d.directions)
study.optimize(sphere2d, n_trials=20)
optuna.visualization.plot_optimization_history(study).show()
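If you prefer to browse the results in optuna-dashboard (one of the benefits listed above), a common pattern is to persist the study to a storage backend. The snippet below is a minimal sketch and not part of the original example; the SQLite file name and study name are placeholders:
import optuna
import optunahub

bbob = optunahub.load_module("benchmarks/bbob")
sphere2d = bbob.Problem(function_id=1, dimension=2, instance_id=1)
# Persist trials to SQLite so that optuna-dashboard can read them (file name is a placeholder).
study = optuna.create_study(
    storage="sqlite:///bbob_sphere2d.db",
    study_name="tpe-on-sphere2d",
    sampler=optuna.samplers.TPESampler(seed=42),
    directions=sphere2d.directions,
)
study.optimize(sphere2d, n_trials=20)
# Afterwards, run `optuna-dashboard sqlite:///bbob_sphere2d.db` in a terminal to inspect the study.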
This article provides a detailed introduction to how to use OptunaHub Benchmarks.
OptunaHub Benchmarks
The primary roles of OptunaHub Benchmarks fall into the following categories:
- Utilizing benchmark problems published on OptunaHub
- Registering benchmark problems to OptunaHub
Using OptunaHub Benchmarks requires the latest version of optunahub, v0.2.0. Please ensure that you have installed or upgraded optunahub before trying the contents of this article.
pip install optunahub==0.2.0 --upgrade
Utilization of Published Benchmark Problems
We now explain the usage of benchmark problems using the sample code we provided earlier. Note that coco-experiment is required to execute the sample code.
pip install coco-experiment # sample code's dependency
Benchmark problems are registered under the benchmarks category of the optunahub-registry packages. Therefore, like packages in other categories, they can be easily loaded using the load_module function.
In line 4 of the sample code in the introduction, we used the "benchmarks/bbob" package (hereafter referred to as the bbob package). The bbob (blackbox optimization benchmarking) package provides 24 benchmark functions widely used in the research community; it is an Optuna wrapper around the original COCO (COmparing Continuous Optimizers) experiment library.
bbob = optunahub.load_module("benchmarks/bbob")
In line 5, the problem object for the 2-dimensional Sphere function is instantiated. Note that function_id=1 refers to the Sphere function in bbob.
sphere2d = bbob.Problem(function_id=1, dimension=2, instance_id=1)
In lines 6–7, a study is created and the optimization is run. The problem object conveniently provides a directions attribute that tells us the optimization direction, as well as a __call__(trial) method that serves as an objective function API for Optuna. Together, they make the optimization script much simpler.
study = optuna.create_study(sampler=optuna.samplers.TPESampler(seed=42), directions=sphere2d.directions)
study.optimize(sphere2d, n_trials=20)
As Figure 1 shows, the bbob package from OptunaHub Benchmarks roughly halves the amount of user code compared to defining Optuna's objective function yourself directly on top of COCO.
Figure 2 visualizes the 24 benchmark functions available in the bbob package. Users can easily switch between them by passing the corresponding function_id to Problem. Additionally, the dimension and the location of the optimal solution can be changed via dimension and instance_id, respectively. For more details, please refer to the documentation of the bbob package.
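To make the switching concrete, here is a small sketch (not from the original article) that reuses the same optimization loop for a few arbitrarily chosen function_id values; see the bbob documentation for the mapping from IDs to functions:
import optuna
import optunahub

bbob = optunahub.load_module("benchmarks/bbob")
# The function_id values below are arbitrary examples; any of 1-24 works the same way.
for function_id in (1, 8, 24):
    problem = bbob.Problem(function_id=function_id, dimension=2, instance_id=1)
    study = optuna.create_study(
        sampler=optuna.samplers.TPESampler(seed=42), directions=problem.directions
    )
    study.optimize(problem, n_trials=20)
    print(function_id, study.best_value)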
In addition to the bbob package, OptunaHub Benchmarks currently includes bbob-constrained (54 constrained optimization problems), the WFG benchmark (a widely used test suite for multi-objective optimization), and HPOBench (a collection of benchmark problems for hyperparameter optimization). These can all be used in the same way, so please feel free to try them out. We also plan to keep expanding the available benchmarks in the future.
Can We Use OptunaHub Benchmarks for Other Optimization Frameworks? (General API)
The bbob.Problem class provides two APIs: __call__(trial), the Optuna-compatible API, and evaluate(params), a more general API that takes a dictionary mapping each variable name to its parameter value. Thanks to the general API, other optimization frameworks can easily build on OptunaHub Benchmarks.
sphere2d.evaluate({"x0": 0, "x1": 0}) # You can evaluate the objective function for the given dictionary.
Here is an example of optimizing the 2-dimensional Sphere function using scipy.optimize.minimize instead of Optuna. The objective function is prepared with a simple lambda expression that internally calls the evaluate method. Furthermore, attributes provided by COCO, the original BBOB implementation, are also accessible from the bbob package's Problem class. Here, for example, the initial_solution, lower_bounds, and upper_bounds properties are used to obtain x0 and bounds, respectively.
import scipy.optimize

result = scipy.optimize.minimize(
    fun=lambda x: sphere2d.evaluate({f"x{d}": x[d] for d in range(sphere2d.dimension)}),
    x0=sphere2d.initial_solution,
    bounds=scipy.optimize.Bounds(
        lb=sphere2d.lower_bounds, ub=sphere2d.upper_bounds
    ),
)
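The returned object is a standard scipy.optimize.OptimizeResult, so the solution can be inspected as usual (this short snippet is an illustrative addition, not part of the original example):
print(result.x)    # best parameters found by scipy
print(result.fun)  # objective value at result.x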
Registering Benchmark Problems to OptunaHub
OptunaHub Benchmarks not only provides benchmark problems but also enables developers to publish their benchmark problems as new packages. Here, we will explain how to create and publish benchmark problems.
A new problem can be simply created by inheriting optunahub.benchmarks.BaseProblem and defining search_space, directions, and evaluate.
- search_space(self) -> dict[str, optuna.distributions.BaseDistribution]: Implement a method that returns the search space, i.e., the variable names and their distributions, for the benchmark problem.
- directions(self) -> list[optuna.study.StudyDirection]: Implement a method that returns a list of optimization directions. If it's a minimization problem, return [optuna.study.StudyDirection.MINIMIZE]; if it's a maximization problem, return [optuna.study.StudyDirection.MAXIMIZE]. For multi-objective optimization problems, return a list with the direction of each objective function.
- evaluate(self, params: dict[str, float]) -> float | Sequence[float]: Implement a method that takes a parameter dictionary as an argument and returns the objective function value(s).
Below is an example implementation of the 2-dimensional Sphere function.
class Sphere2D(optunahub.benchmarks.BaseProblem):
    @property
    def search_space(self) -> dict[str, optuna.distributions.BaseDistribution]:
        return {
            "x0": optuna.distributions.FloatDistribution(low=-5, high=5),
            "x1": optuna.distributions.FloatDistribution(low=-5, high=5),
        }

    @property
    def directions(self) -> list[optuna.study.StudyDirection]:
        return [optuna.study.StudyDirection.MINIMIZE]

    def evaluate(self, params: dict[str, float]) -> float:
        return params["x0"] ** 2 + params["x1"] ** 2
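A problem defined this way can then be passed to study.optimize just like the bbob package's Problem objects. The following usage sketch is an illustrative addition rather than part of the original article:
sphere2d = Sphere2D()
study = optuna.create_study(directions=sphere2d.directions)
study.optimize(sphere2d, n_trials=30)
print(study.best_params, study.best_value)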
The above implements the minimal components for creating an original problem, but you can add any additional properties or methods as desired. For example, you can make the dimension of the problem configurable by defining an additional argument in the __init__() method yourself (see SphereND in the tutorial; a rough sketch is also given below). The previously mentioned bbob package is also implemented by inheriting optunahub.benchmarks.BaseProblem, so if you are interested, please take a look at its implementation.
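As a hint of what such an extension might look like, here is a minimal sketch of an N-dimensional Sphere problem with a configurable dimension. It follows the same idea as the tutorial's SphereND but is not necessarily identical to that implementation:
class SphereND(optunahub.benchmarks.BaseProblem):
    def __init__(self, dimension: int) -> None:
        self._dimension = dimension  # number of variables, chosen by the user

    @property
    def search_space(self) -> dict[str, optuna.distributions.BaseDistribution]:
        # One float variable per dimension.
        return {
            f"x{i}": optuna.distributions.FloatDistribution(low=-5, high=5)
            for i in range(self._dimension)
        }

    @property
    def directions(self) -> list[optuna.study.StudyDirection]:
        return [optuna.study.StudyDirection.MINIMIZE]

    def evaluate(self, params: dict[str, float]) -> float:
        return sum(params[f"x{i}"] ** 2 for i in range(self._dimension))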
The benchmark problems you implement can be registered with OptunaHub by creating a pull request to the optunahub-registry repository as a package. For more details, please see the various tutorials available.
- How to Implement Your Benchmark Problems with OptunaHub (Basic)
- How to Implement Your Benchmark Problems with OptunaHub (Advanced: Conditional Parameters, Constrained Problems)
- How to Register Your Package with OptunaHub
Conclusion
In this article, we introduced OptunaHub Benchmarks, a new feature for benchmarks added in optunahub v0.2.0.
OptunaHub Benchmarks allows users to easily benchmark algorithms on a wide variety of problems. It is also convenient for preliminary experiments that precede real-world applications and for learning how optimization algorithms behave. We hope OptunaHub Benchmarks makes your work more efficient.
Last but not least, we plan to expand the list of available benchmark packages, so we welcome any requests for specific benchmark problems or pull requests from benchmark developers to the optunahub-registry!
Appendix: Practical Benchmark Code Example
Here, we will introduce a more practical example of benchmark code, including visualization of results.
The experimental setups in this section are the following:
- Use the 24 BBOB functions (function_id from 1 to 24).
- Use a function dimensionality of 2 (dimension=2) and an instance ID of 1 (instance_id=1).
- Compare TPESampler and CmaEsSampler.
- Plot the mean and standard error of each sampler on each problem over 10 different random seeds.
Here is the code to generate Figure 3:
import itertools
import matplotlib.pyplot as plt
import numpy as np
import optuna
import optunahub
import pandas as pd
plt.rcParams["font.family"] = "Times New Roman"
plt.rcParams["mathtext.fontset"] = "stix" # The math font setup.
plt.rcParams["text.usetex"] = True
samplers = [optuna.samplers.TPESampler, optuna.samplers.CmaEsSampler]
def collect_results(
    dimension: int = 2, instance_id: int = 1, n_seeds: int = 10, n_trials: int = 100
) -> pd.DataFrame:
    bbob = optunahub.load_module("benchmarks/bbob")
    results = []
    # Compare TPESampler and CmaEsSampler using BBOB 24 problems over 10 random seeds.
    for sampler_class, function_id, seed in itertools.product(
        samplers, range(1, 25), range(n_seeds)
    ):
        sampler = sampler_class(seed=seed)
        objective = bbob.Problem(function_id, dimension, instance_id)
        study = optuna.create_study(sampler=sampler, directions=objective.directions)
        study.optimize(objective, n_trials=n_trials)
        results.append(
            {
                "sampler_name": sampler_class.__name__,
                "seed": seed,
                "function_id": function_id,
                "values": np.minimum.accumulate([t.value for t in study.trials]),
            }
        )
    return pd.DataFrame(results)
def plot_results(df: pd.DataFrame, dimension: int = 2, instance_id: int = 1) -> None:
    sampler_names = [sampler_class.__name__ for sampler_class in samplers]
    n_seeds = len(df["seed"].unique())
    x_axis = np.arange(len(df["values"].iloc[0])) + 1
    fig, axes = plt.subplots(4, 6, figsize=(20, 10), sharex=True, tight_layout=True)
    lines = [None] * len(sampler_names)
    for sampler_name, function_id in itertools.product(sampler_names, range(1, 25)):
        target_rows = df[(df["sampler_name"] == sampler_name) & (df["function_id"] == function_id)]
        mean = np.mean([vs for vs in target_rows["values"]], axis=0)
        sem = np.std([vs for vs in target_rows["values"]], axis=0) / np.sqrt(n_seeds)
        ax = axes[(function_id - 1) // 6, (function_id - 1) % 6]
        ax.set_title(f"{function_id=}")
        lines[sampler_names.index(sampler_name)], = ax.plot(x_axis, mean, label=sampler_name)
        ax.fill_between(x_axis, mean - sem, mean + sem, alpha=0.3)
        ax.grid(True)
    fig.suptitle(f"{instance_id=} in {dimension}D", fontsize=20)
    fig.supxlabel(r"\# of Trials", fontsize=20)
    fig.supylabel("Objective", fontsize=20)
    fig.legend(handles=lines, labels=sampler_names, ncol=len(sampler_names), loc="upper right", fontsize=18)
    plt.show()
df = collect_results(n_seeds=10, n_trials=100)
plot_results(df)
By running the code above, you can obtain a graph like Figure 3.
Real-world applications may require some tweaks to the visualization, as well as additional in-depth analyses such as problem-specific discussion and statistical tests. We hope this example serves as a useful reference for your own projects.