Introduction to CMA-ES Based Sampler

c-bata · Published in Optuna · Mar 10, 2021 · 5 min read

Hi, I’m @c-bata, the author of the cmaes library. Optuna uses a univariate TPE for single-objective optimization by default, but you can easily switch to other samplers.

In this article, I introduce CmaEsSampler, which may perform better than Optuna’s default sampler. After reading this article, you will know the following:

  • What is CMA-ES?
  • How to use CmaEsSampler.
  • The variants of CMA-ES.
  • When you should NOT use CmaEsSampler.

What is CMA-ES?

Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [1] is one of the most promising methods for black-box optimization, where objective functions cannot be described using an explicit representation in general. CMA-ES has shown the best performance out of over 100 black-box optimization methods for various benchmark problems [2].

CMA-ES samples solutions (which correspond to hyperparameter sets if you perform hyperparameter optimization) from a multivariate Gaussian distribution. After all solutions are evaluated, they are sorted by their evaluation values, and the distribution parameters (the mean vector and the covariance matrix) are updated based on this ranking. The following animation illustrates the behavior of CMA-ES.

Left, the optimal solutions (yellow stars) and the solutions sampled by CMA-ES (red points); Right, the update process of the multivariate Gaussian distribution.
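
The same sample-and-update loop can be written directly with the cmaes library. The sketch below uses the library's ask-and-tell style API; the quadratic function is purely illustrative.

```python
import numpy as np
from cmaes import CMA


def quadratic(x1, x2):
    # Illustrative objective with its optimum at (3, -2).
    return (x1 - 3) ** 2 + (10 * (x2 + 2)) ** 2


optimizer = CMA(mean=np.zeros(2), sigma=1.3)
for generation in range(50):
    solutions = []
    for _ in range(optimizer.population_size):
        x = optimizer.ask()            # sample a solution from the Gaussian
        value = quadratic(x[0], x[1])  # evaluate it
        solutions.append((x, value))
    optimizer.tell(solutions)          # rank solutions and update mean/covariance
```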

CmaEsSampler was added in v1.3.0 and stabilized in v2.0.0. This sampler uses cmaes under the hood. The usage is like this:
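
The snippet below is a minimal sketch; the two-parameter quadratic objective is only a placeholder for your own training and evaluation code.

```python
import optuna


def objective(trial):
    # Placeholder objective; in practice this would train and evaluate a model.
    x = trial.suggest_float("x", -10, 10)
    y = trial.suggest_float("y", -10, 10)
    return (x - 3) ** 2 + (y + 1) ** 2


sampler = optuna.samplers.CmaEsSampler()
study = optuna.create_study(sampler=sampler)
study.optimize(objective, n_trials=100)
```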

The only thing you need to know when using CmaEsSampler is to pass the sampler object via the sampler argument. This is enough for most objective functions, but CmaEsSampler also provides a number of options for more efficient optimization.

Warm Starting CMA-ES

Warm Starting CMA-ES (WS-CMA-ES) [3] significantly improves optimization performance by transferring the optimization results of similar HPO tasks as prior knowledge. This algorithm was proposed by @nmasahiro, a co-maintainer of the cmaes library, and accepted at AAAI 2021.

Warm Starting CMA-ES is a very powerful technique in the following situations:

  • You have an optimization history of your objective function on a subset of the dataset (e.g., a history of training your model with 10% of the full dataset).
  • You have an optimization history of your objective function on a different dataset.
  • Your system runs hyperparameter optimization weekly or monthly; in this case, you can use the previous optimization history to perform the optimization for the next week/month.

Here is the result of an experiment to optimize hyperparameters of LightGBM for Kaggle’s Toxic Comment Classification Challenge data.

An experiment with warm starting optimization using a result of the HPO for a subset of the dataset. The horizontal axis represents the number of evaluations. Source code is available at GitHub (URL: https://github.com/c-bata/benchmark-warm-starting-cmaes)

In this experiment, we use 10% of the full dataset as the source task. The usage of Warm Starting CMA-ES is like this:
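
The snippet below is a minimal sketch; the toy source and target objectives stand in for, e.g., training on a 10% subset versus training on the full dataset.

```python
import optuna


def source_objective(trial):
    # Hypothetical source task, e.g. HPO on 10% of the full dataset.
    x = trial.suggest_float("x", -10, 10)
    y = trial.suggest_float("y", -10, 10)
    return (x - 3) ** 2 + (y + 1) ** 2


def target_objective(trial):
    # Hypothetical target task, e.g. HPO on the full dataset.
    x = trial.suggest_float("x", -10, 10)
    y = trial.suggest_float("y", -10, 10)
    return (x - 2) ** 2 + (y + 0.5) ** 2


# Run (or load) the optimization history of the source task.
source_study = optuna.create_study()
source_study.optimize(source_objective, n_trials=20)

# Generate a sampler object with the source_trials option to warm-start.
sampler = optuna.samplers.CmaEsSampler(source_trials=source_study.trials)
study = optuna.create_study(sampler=sampler)
study.optimize(target_objective, n_trials=100)
```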

Note the line that creates the sampler object with the `source_trials` option. To transfer a similar HPO task, you just pass a list of the source task’s FrozenTrials.

Restarting Strategy for CMA-ES

IPOP-CMA-ES (“IPOP” is short for “Increasing POPulation size”) [4] is a method to restart CMA-ES with an increased population size when CMA-ES converges to a local minimum. By increasing the population size, the search characteristic becomes more global after each restart.

CMA-ES is restarted when it converges to a local minimum.

This algorithm is useful when you have a large evaluation budget and the objective function is expected to have multiple local minima. The usage is like this:
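
The snippet below is a minimal sketch; Himmelblau’s function, which has several minima, stands in for a multimodal objective, and the trial count is illustrative.

```python
import optuna


def objective(trial):
    # Placeholder multimodal objective (Himmelblau's function).
    x = trial.suggest_float("x", -10, 10)
    y = trial.suggest_float("y", -10, 10)
    return (x**2 + y - 11) ** 2 + (x + y**2 - 7) ** 2


sampler = optuna.samplers.CmaEsSampler(
    restart_strategy="ipop",
    inc_popsize=2,  # multiplier for the population size before each restart
)
study = optuna.create_study(sampler=sampler)
study.optimize(objective, n_trials=1000)
```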

To use IPOP-CMA-ES, you need to set restart_strategy="ipop". There is an additional parameter, inc_popsize (default: 2), which is the multiplier applied to the population size before each restart. According to the paper, factors between 2 and 3 show similar performance.

Separable CMA-ES

Separable CMA-ES (sep-CMA-ES) [5] is an algorithm that constrains the covariance matrix to be diagonal. Because the model complexity is reduced, the learning rate for the covariance matrix can be increased. Consequently, this algorithm outperforms CMA-ES when the hyperparameters are not correlated. Here is the benchmark result on the six-hump camel function.

As this benchmark result shows, sep-CMA-ES outperforms the CMA-ES algorithm, especially with a small budget. The usage is like this:
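
The snippet below is a minimal sketch; the separable quadratic objective is only a placeholder for an objective whose hyperparameters are not strongly correlated.

```python
import optuna


def objective(trial):
    # Placeholder objective with independent (uncorrelated) parameters.
    x = trial.suggest_float("x", -10, 10)
    y = trial.suggest_float("y", -10, 10)
    return x**2 + y**2


sampler = optuna.samplers.CmaEsSampler(use_separable_cma=True)
study = optuna.create_study(sampler=sampler)
study.optimize(objective, n_trials=100)
```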

To use sep-CMA-ES, you need to enable the use_separable_cma option. Please note that it cannot be used together with the source_trials option, since CmaEsSampler currently does not support warm starting for sep-CMA-ES.

When you should NOT use CmaEsSampler.

With a moderate evaluation budget, CMA-ES achieves attractive performance for continuous optimization. However, CmaEsSampler does not necessarily outperform Bayesian optimization methods in the following cases:

  • Categorical parameters: CmaEsSampler does not support categorical parameters, so I recommend using TPESampler or SkoptSampler instead if your search space contains categorical parameters.
  • Small evaluation budget: CMA-ES is particularly useful when the evaluation budget is moderate to large (> 100 × the number of hyperparameters) [6]. On the other hand, I cannot recommend CMA-ES when the evaluation budget is small.
  • Distributed optimization: The concurrency model of Optuna is asynchronous parallelization. In this model, CmaEsSampler cannot use some trials when updating the parameters of the multivariate normal distribution. You can overcome this problem with the ask-and-tell interface (see the sketch after this list), although you need to be familiar with the design of Optuna and CmaEsSampler.
  • Dynamic search space: CmaEsSampler does not support dynamic search spaces (e.g., examples/sklearn_simple.py). In that case, the parameters are sampled by independent_sampler (default: random search).
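
For the distributed case, the sketch below shows one way a synchronous batch loop with the ask-and-tell interface could look; the batch size, the thread pool, and the toy objective are all illustrative assumptions, and in practice you would want the batch size to be aligned with the CMA-ES population size.

```python
import concurrent.futures

import optuna


def objective(trial):
    # Placeholder objective; imagine an expensive training run here.
    x = trial.suggest_float("x", -10, 10)
    y = trial.suggest_float("y", -10, 10)
    return (x - 3) ** 2 + (y + 1) ** 2


sampler = optuna.samplers.CmaEsSampler()
study = optuna.create_study(sampler=sampler)

batch_size = 4  # illustrative; ideally aligned with the CMA-ES population size
for _ in range(25):
    # Ask for a batch of trials, evaluate them in parallel, then report every
    # result back before asking again, so no trial is lost to asynchrony.
    trials = [study.ask() for _ in range(batch_size)]
    with concurrent.futures.ThreadPoolExecutor() as executor:
        values = list(executor.map(objective, trials))
    for trial, value in zip(trials, values):
        study.tell(trial, value)
```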

Conclusion & Future Work

In this article, I introduced CMA-ES, its variants, how to use CmaEsSampler in Optuna, and when you should NOT use CmaEsSampler. As future work, I’m planning to support the following algorithms:

  • WS-sep-CMA-ES [3], which applies a warm starting method for sep-CMA-ES.
  • DTS-CMA-ES [7], which uses Gaussian Process as a surrogate model.
  • BIPOP-CMA-ES [2], which applies two interlaced restart strategies, one with an increasing population size and one with varying small population sizes.

Finally, the entire code of CmaEsSampler can be found here. I hope you will dive in and open pull requests for further improvements.

References

  • [1] N. Hansen. The CMA Evolution Strategy: A Tutorial, arXiv:1604.00772, 2016.
  • [2] I. Loshchilov, M. Schoenauer, and M. Sebag. Bi-population CMA-ES Algorithms with Surrogate Models and Line Searches, GECCO Workshop, 2013.
  • [3] M. Nomura, S. Watanabe, Y. Akimoto, Y. Ozaki, M. Onishi. Warm Starting CMA-ES for Hyperparameter Optimization, AAAI, 2021.
  • [4] A. Auger, N. Hansen. A restart CMA evolution strategy with increasing population size, CEC, 2005.
  • [5] R. Ros, N. Hansen. A Simple Modification in CMA-ES Achieving Linear Time and Space Complexity, PPSN, 2008.
  • [6] F. Hutter, H. Hoos, and K. Leyton-Brown, An evaluation of sequential model-based optimization for expensive blackbox functions, GECCO Workshop, 2013.
  • [7] L. Bajer, Z. Pitra, J. Repicky, and M. Holeňa. Gaussian process surrogate models for the CMA evolution strategy. Evolutionary computation, 2019.
