CMA-ES with Margin: New Variant of CMA-ES for Mixed-Integer Black-Box Optimization Introduced in Optuna v3.1.0
This article introduces CMA-ES with Margin, which is implemented in Optuna v3.1.0. CMA-ES with Margin [1] is a variant of CMA-ES for mixed-integer black-box optimization (MI-BBO), where the search space contains both continuous and integer variables, as is common in hyperparameter optimization.
This article explains CMA-ES with Margin by covering the following topics:
- Introduction of the original CMA-ES.
- Challenges in adapting CMA-ES to MI-BBO.
- Main ideas of CMA-ES with Margin.
- Advantages of CMA-ES with Margin over other methods.
What is CMA-ES in the first place?
Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [2] is one of the most promising methods for continuous black-box optimization, where the objective function is not given in an analytical form. The CMA-ES proceeds with the optimization in the following steps:
1. Sample several solutions (continuous vectors) from a multivariate Gaussian distribution (MGD).
2. Evaluate the solutions with the objective function.
3. Update the distribution parameters of the MGD based on the ranking of the solutions.
4. Repeat steps 1 to 3.
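The sample-evaluate-update loop above can be sketched in a few lines of NumPy. Note that this is a heavily simplified illustration, not the actual CMA-ES update: the real algorithm also adapts the full covariance matrix with evolution paths and the step size with cumulative step-size adaptation, whereas this sketch only moves the mean toward the better half of the population and decays the step size geometrically.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def f(x):
    # Sphere function as a toy objective.
    return float(np.sum(x ** 2))

dim, popsize, n_parents = 2, 8, 4
mean = np.array([3.0, 3.0])   # distribution mean
sigma = 1.0                   # global step size
cov = np.eye(dim)             # covariance matrix (kept fixed in this sketch)

for generation in range(50):
    # 1. Sample candidate solutions from the multivariate Gaussian.
    solutions = rng.multivariate_normal(mean, sigma ** 2 * cov, size=popsize)
    # 2. Evaluate the solutions with the objective function.
    values = np.array([f(x) for x in solutions])
    # 3. Update the distribution parameters based on the ranking:
    #    move the mean toward the best half of the population.
    parents = solutions[np.argsort(values)[:n_parents]]
    mean = parents.mean(axis=0)
    sigma *= 0.95  # crude decay instead of real step-size adaptation

print(mean)  # close to the optimum [0, 0]
```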
The CMA-ES has two attractive properties for users. First, it has several invariance properties, such as invariance to strictly monotonic transformations of the objective function and to affine transformations of the search space. Second, it is a quasi-parameter-free algorithm, which allows users to apply it without tuning its hyperparameters. The CMA-ES has been used in many real-world applications.
Applying CMA-ES to Mixed-Integer Black-Box Optimization
The CMA-ES is often applied to mixed-integer black-box optimization (MI-BBO), including hyperparameter optimization [3]. Since the CMA-ES is a method for continuous black-box optimization, part of the sampled continuous vector must be discretized to adapt it to MI-BBO. However, because of the plateau caused by the discretization, an integer variable of the candidate solutions can become fixed to a single value. The following animation shows the behavior of the CMA-ES when minimizing f(x) = x_0^2 + x_1^2, where x_0 is a continuous variable and x_1 is an integer variable.
The integer variable x_1 is obtained by rounding the continuous variable sampled from the MGD. The dashed ellipse represents the confidence ellipse of the MGD, and the red star indicates the position of the optimal solution. We observe that the value of x_1 for the candidate solutions sampled from the MGD is fixed at 3. This fixation occurs when the variance of the marginal distribution becomes small relative to the width of the rounding interval. When the fixation occurs in an early stage of the optimization, it becomes difficult to improve the evaluation value further.
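The fixation is easy to reproduce in isolation. In the hypothetical snippet below, the marginal distribution of the integer coordinate has mean 3.2 and standard deviation 0.05, far smaller than the rounding interval of width 1, so every sampled candidate rounds to the same integer and the ranking carries no information about that coordinate:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Marginal distribution of the integer coordinate x_1: mean 3.2,
# standard deviation 0.05 (much smaller than the rounding interval
# of width 1 around the integer 3).
samples = rng.normal(loc=3.2, scale=0.05, size=1000)
rounded = np.round(samples).astype(int)

# Every candidate yields the same integer value, so the objective
# cannot distinguish candidates along this coordinate any more.
print(np.unique(rounded))  # [3]
```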
CMA-ES with Margin
CMA-ES with Margin (CMA-ESwM) [1] is a CMA-ES variant recently proposed for MI-BBO. The CMA-ESwM introduces a lower bound on the marginal probability, called the margin, so that samples are not fixed to a single integer value. The margin is a common technique in estimation of distribution algorithms (EDAs) for binary domains, where it addresses the problem of bits becoming fixed to 0 or 1. The following describes how the margin idea is incorporated into the CMA-ES.
As shown in the figure above, consider the case of discretizing a continuous variable sampled from the MGD into a binary variable. On the left side of the figure, the probability of sampling a continuous variable in the region that is discretized to 0 is high, so the resulting binary variable is fixed at 0, i.e., the value 1 is rarely sampled. To eliminate this fixation, the MGD is slightly modified, as shown on the right side of the figure. The correction increases the probability that the value 1 is sampled, thereby eliminating the fixation. The same idea can be applied to the fixation of general integer variables, but the details are omitted in this article.
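The binary case of this correction can be sketched in one dimension. The snippet below is a simplified illustration, not the exact update from the paper (which corrects the mean vector and a diagonal transformation jointly across coordinates): if the probability of sampling the rare binary value drops below the margin, the mean is shifted toward the discretization threshold until that probability equals the margin. The function name `correct_mean`, the threshold 0.5, and the margin value 0.05 are all hypothetical choices for illustration.

```python
from statistics import NormalDist

def correct_mean(mean, std, threshold=0.5, margin=0.05):
    """Simplified 1-D margin correction (illustrative only).

    Continuous samples below `threshold` are discretized to 0,
    samples at or above it to 1. The mean is shifted so that both
    binary outcomes keep probability >= margin.
    """
    p_one = 1.0 - NormalDist(mean, std).cdf(threshold)
    if p_one < margin:
        # Move the mean toward the threshold until P(bit = 1) == margin.
        mean = threshold - std * NormalDist().inv_cdf(1.0 - margin)
    elif p_one > 1.0 - margin:
        # Symmetric case: P(bit = 0) fell below the margin.
        mean = threshold - std * NormalDist().inv_cdf(margin)
    return mean

# Before correction, the binary value 1 is almost never sampled ...
mean, std = -0.4, 0.2
print(1.0 - NormalDist(mean, std).cdf(0.5))  # ~3e-6
# ... after correction, its probability is exactly the margin.
mean = correct_mean(mean, std)
print(1.0 - NormalDist(mean, std).cdf(0.5))  # ~0.05
```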
The following animation shows the behavior when the CMA-ESwM minimizes f(x) = x_0^2 + x_1^2, where x_0 is a continuous variable and x_1 is an integer variable.
Introducing the margin correction keeps a lower bound on the variance of the MGD so that multiple integer values remain likely to be sampled. As a result, the margin correction prevents the MGD from converging to a region that does not contain the optimal solution.
The strength of the margin correction is controlled by a single hyperparameter. The closer this hyperparameter is to zero, the closer the behavior of the CMA-ESwM is to that of the original CMA-ES. The sensitivity of this hyperparameter has been investigated in [1], and recommended values have been obtained experimentally.
Advantages of CMA-ES with Margin over Other Methods
If you are considering applying the CMA-ES to MI-BBO, such as hyperparameter optimization, I recommend using the CMA-ESwM instead. In particular, the larger the proportion of integer variables, the greater the advantage of the CMA-ESwM over the CMA-ES. You can see a comparison between the CMA-ES and the CMA-ESwM on HPO-bench problems here. Additionally, a comparison with other black-box optimization methods is presented in [4], which shows that the CMA-ESwM outperforms the other methods in higher dimensions.
How to use CMA-ES with Margin in Optuna
In Optuna, you can easily use CMA-ES with Margin just by specifying the with_margin argument of CmaEsSampler as follows:

import optuna
from optuna.samplers import CmaEsSampler

... # Include discrete search space.
study = optuna.create_study(sampler=CmaEsSampler(with_margin=True))
This article introduced CMA-ES with Margin, which is implemented in Optuna v3.1.0. CMA-ES with Margin extends the CMA-ES to mixed-integer black-box optimization and solves the problem of the search stalling due to discrete variables. We recommend using CMA-ES with Margin instead of the CMA-ES when the search space contains integer variables.
[1] Ryoki Hamano, Shota Saito, Masahiro Nomura, and Shinichi Shirakawa. CMA-ES with Margin: Lower-Bounding Marginal Probability for Mixed-Integer Black-Box Optimization. In Genetic and Evolutionary Computation Conference (GECCO ’22). 2022. [arXiv]
[2] Nikolaus Hansen. The CMA Evolution Strategy: A Tutorial. arXiv:1604.00772. 2016.
[3] Ilya Loshchilov and Frank Hutter. CMA-ES for Hyperparameter Optimization of Deep Neural Networks. In International Conference on Learning Representations (ICLR 2016). Workshop Track. 2016.
[4] Ryoki Hamano, Shota Saito, Masahiro Nomura, and Shinichi Shirakawa. Benchmarking CMA-ES with Margin on the bbob-mixint Testbed. In Genetic and Evolutionary Computation Conference Companion (GECCO ’22 Companion). 2022.