Introduction to CMA-ES sampler.

Hi, I’m @c-bata, the author of the cmaes library. Optuna uses univariate TPE for single-objective optimization by default, but you can easily switch to other samplers.

In this article, I introduce CmaEsSampler, which may perform better than Optuna’s default sampler. After reading this article, you will know the following:

  • What is CMA-ES?

What is CMA-ES?

Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [1] is one of the most promising methods for black-box optimization, where objective functions cannot be described using an explicit representation in general. CMA-ES has shown the best performance out of over 100 black-box optimization methods for various benchmark problems [2].

CMA-ES samples solutions (solutions correspond to hyperparameters when you perform hyperparameter optimization) from a multivariate Gaussian distribution. After all solutions are evaluated, they are sorted by their evaluation values, and the distribution parameters (i.e., the mean vector and the covariance matrix) are updated based on that ranking. The following animation illustrates the behavior of CMA-ES.

Left: the optimal solutions (yellow stars) and the solutions sampled by CMA-ES (red points). Right: the update process of the multivariate Gaussian distribution.

CmaEsSampler was added in v1.3.0 and stabilized in v2.0.0. This sampler uses the cmaes library under the hood. The usage is like this:

The only thing you need to know when using CmaEsSampler is to pass the sampler object via the sampler argument. This is enough for most objective functions, but CmaEsSampler also provides several options for more efficient optimization.

Warm Starting CMA-ES

Warm Starting CMA-ES (WS-CMA-ES) [3] significantly improves optimization performance by transferring the optimization results of similar HPO tasks as prior knowledge. This algorithm was proposed by @nmasahiro, a co-maintainer of the cmaes library, and was accepted at AAAI 2021.

Warm Starting CMA-ES is a very powerful technique when you are in the following situations:

  • You have an optimization history of your objective function on a subset of the dataset (e.g., a history of training your model with 10% of the full dataset).

Here is the result of an experiment to optimize hyperparameters of LightGBM for Kaggle’s Toxic Comment Classification Challenge data.

An experiment warm-starting the optimization using the result of HPO on a subset of the dataset. The horizontal axis represents the number of evaluations. Source code is available on GitHub: https://github.com/c-bata/benchmark-warm-starting-cmaes

In this experiment, we use 10% of the full dataset as the source task. The usage of Warm Starting CMA-ES is like this:

Note the line that creates the sampler object with the `source_trials` option. To transfer knowledge from similar HPO tasks, you just pass a list of the source task's FrozenTrials.

Restarting Strategy for CMA-ES

IPOP-CMA-ES (“IPOP” is short for “Increasing POPulation size”) [4] is a method that restarts CMA-ES with an increased population size when it converges to a local minimum. By increasing the population size, the search becomes more global after each restart.

CMA-ES is restarted when it converges to a local minimum.

This algorithm is useful when the evaluation budget is large and the objective function is expected to have multiple local minima. The usage is like this:

To use IPOP-CMA-ES, you need to set restart_strategy="ipop". There is an additional parameter, inc_popsize (default: 2), which is the multiplier applied to the population size before each restart. According to the paper, performance is similar for factors between 2 and 3.

Separable CMA-ES

Separable CMA-ES (sep-CMA-ES) [5] is an algorithm that constrains the covariance matrix to be diagonal. Because the model complexity is reduced, the learning rate for the covariance matrix can be increased. Consequently, this algorithm outperforms CMA-ES when the hyperparameters are not correlated. Here is the benchmark result on the six-hump camel function.

As this benchmark result shows, sep-CMA-ES outperforms the CMA-ES algorithm, especially on a small budget. The usage is like this:

To use sep-CMA-ES, you need to enable the use_separable_cma option. Please note that it cannot be used together with the source_trials option, since CmaEsSampler currently does not support warm starting for sep-CMA-ES.

When you should NOT use CmaEsSampler

With a moderate evaluation budget, CMA-ES achieves attractive performance for continuous optimization. However, CmaEsSampler does not necessarily outperform Bayesian optimization methods in the following cases:

  • Categorical parameters: CmaEsSampler does not support categorical parameters, so I recommend using TPESampler or SkoptSampler instead if your search space contains categorical parameters.

Conclusion & Future Work

In this article, I introduced CMA-ES and its variants, how to use CmaEsSampler in Optuna, and when you should NOT use CmaEsSampler. As future work, I’m planning to support the following algorithms:

  • WS-sep-CMA-ES [3], which applies the warm starting method to sep-CMA-ES.

Finally, the entire code of CmaEsSampler can be found here. I hope you will dive into it and open pull requests for further improvements.


  • [1] N. Hansen. The CMA Evolution Strategy: A Tutorial, arXiv:1604.00772, 2016.


