# An introduction to MOEA/D and examples of multi-objective optimization comparisons

# TL;DR

Optuna, widely used for solving optimization problems, has recently been updated to version 4, introducing a new feature called OptunaHub. Taking this opportunity, we have published an implementation of the multi-objective optimization algorithm “MOEA/D” through OptunaHub.

In this article, we introduce the characteristics of MOEA/D and present an example comparison with other optimization methods. We hope this article will be useful for those who are interested in evolutionary multi-objective optimization and those who are looking for more efficient optimization methods.

# Introduction

Hi to all Optuna users!

I’m Hiroaki Natsume, an engineer at NatureArchitects Inc.

In our company, we often use Optuna[1], a Python optimization framework, and Tunny[2], a plugin that uses Optuna for shape exploration in 3D CAD, when dealing with black-box optimization problems.

Optuna was recently updated to version 4. As part of this update, a new service called OptunaHub was released, which serves as a hub for publishing implementations that extend Optuna’s functionality.

Until now, Optuna has implemented many samplers that excel at single-objective optimization, but there weren’t many samplers specifically designed for multi-objective optimization, which our company frequently performs.

Therefore, to increase our options for multi-objective optimization, improve efficiency, and contribute to the Optuna community, we implemented the “MOEA/D” algorithm for multi-objective optimization and published it through OptunaHub.

https://hub.optuna.org/samplers/moead/

In this article, we will introduce the characteristics of the implemented MOEA/D and present an example comparison with multi-objective optimization results using other methods.

# Overview of MOEA/D

MOEA/D stands for “Multiobjective Evolutionary Algorithm Based on Decomposition” and is a type of Evolutionary Multiobjective Optimization (EMO) algorithm.

It is a frequently referenced method in the context of multi-objective optimization, and many advanced versions of this method have been proposed.

I will briefly introduce the algorithm based on the paper [3].

As the name suggests with “Decomposition,” MOEA/D is a method that aims to obtain a uniform Pareto front by decomposing a multi-objective optimization problem into multiple single-objective optimization problems.

Each decomposed optimization problem is called a subproblem.

When decomposing a multi-objective optimization problem, a scalarization function using weight vectors for each objective function is used.

As many weight vectors are generated as there are individuals in one generation; in the first generation, each individual is randomly assigned one weight vector, and each individual–vector pair forms one subproblem.
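For two objectives, such a set of weight vectors can be sketched as follows. This is an illustrative helper, not the actual code of the OptunaHub sampler:

```python
import numpy as np


def uniform_weight_vectors_2d(n: int) -> np.ndarray:
    """Evenly spaced weight vectors on the 2-D unit simplex.

    Each row (w, 1 - w) weights the two objectives of one subproblem.
    """
    w1 = np.linspace(0.0, 1.0, n)
    return np.stack([w1, 1.0 - w1], axis=1)
```

With a population of 100, this yields 100 weight vectors running from (0, 1) to (1, 0), one per subproblem.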

While the paper presents the following three scalarization functions, only the first two are implemented in the version published on OptunaHub:

- Weight Sum
- Tchebycheff
- Penalty-based Boundary Intersection (PBI)

Among these, the Tchebycheff approach is the most commonly used scalarization function, and it is the default in our implementation.

The Tchebycheff approach scalarizes each subproblem as follows:

g^te(x | λ, z*) = max_{1 ≤ i ≤ m} λ_i · |f_i(x) − z_i*|

In the case of a minimization problem, each subproblem is solved by minimizing this weighted Tchebycheff distance g^te. Here, λ is the weight vector, f(x) is the vector of objective values, and z* is the reference point (in the paper, the best value found so far for each objective).
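In code, the Tchebycheff scalarization can be written compactly; this is an illustrative function, and the OptunaHub implementation may organize it differently:

```python
import numpy as np


def tchebycheff(f: np.ndarray, weight: np.ndarray, z_star: np.ndarray) -> float:
    """g^te(x | lambda, z*) = max_i lambda_i * |f_i(x) - z*_i|."""
    return float(np.max(weight * np.abs(f - z_star)))
```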

To create children in subproblems, individuals from the neighborhood of the subproblem are selected, and crossover and mutation are performed on them, similar to genetic algorithms (GA).

This part is implemented similarly to Optuna’s NSGA-II, allowing the use of various crossover methods already implemented in Optuna.

Here, the neighborhood of a subproblem refers to individuals belonging to subproblems with weight vectors that have close Euclidean distances to the weight vector assigned to the subproblem in question.

The parameter that determines how many individuals from neighboring subproblems to include as parent candidates is called `T` in the paper and corresponds to the `n_neighbors` argument in the OptunaHub implementation. The larger `T` is, the wider the range of parent individuals from which children are generated, leading to exploration over a broader area; conversely, a smaller `T` restricts exploration to a narrower range. `T` is therefore an important parameter that determines the performance of MOEA/D.
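Given an array of weight vectors, one per subproblem, the neighborhood computation can be sketched as follows (an illustration of the idea, not the sampler's actual code):

```python
import numpy as np


def neighborhoods(weights: np.ndarray, t: int) -> np.ndarray:
    """For each subproblem, return the indices of the T subproblems whose
    weight vectors are closest in Euclidean distance (including itself)."""
    # Pairwise Euclidean distances between all weight vectors.
    dist = np.linalg.norm(weights[:, None, :] - weights[None, :, :], axis=2)
    # Keep the T nearest indices per row.
    return np.argsort(dist, axis=1)[:, :t]
```

Parents for a subproblem are then drawn from its row of this index table.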

# Comparison

Let’s use Optuna to see what results we get when actually using these methods.

We used the following five methods for comparison, running each with the same random seed:

- Random sampling (`RandomSampler`)
- NSGA-II (`NSGAIISampler`)
- NSGA-III (`NSGAIIISampler`)
- MOEA/D (the `MOEADSampler` published on OptunaHub)
- TPE (`TPESampler`)

For the comparison of results, we examined not only the final Pareto front shape but also the change in hypervolume and the optimization execution time.

# Objective Function

We used ZDT1[4], which is frequently used as a benchmark for multi-objective optimization, as our objective function.

This is a bi-objective optimization problem with objective functions f1 and f2, defined as:

f1(x) = x1

g(x) = 1 + 9 · (x2 + … + xn) / (n − 1)

f2(x) = g(x) · (1 − √(f1(x) / g(x)))

with each xi ∈ [0, 1]. In this case, we used it as a 30-variable problem by setting n = 30.
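Written directly in Python, ZDT1 looks like this (the full comparison script in the Supplemental section embeds the same formula in its objective function):

```python
import numpy as np


def zdt1(x: np.ndarray) -> tuple[float, float]:
    """ZDT1: f1 = x1, g = 1 + 9 * sum(x2..xn) / (n - 1),
    f2 = g * (1 - sqrt(f1 / g))."""
    n = len(x)
    f1 = float(x[0])
    g = 1 + 9 * np.sum(x[1:]) / (n - 1)
    f2 = float(g * (1 - np.sqrt(f1 / g)))
    return f1, f2
```

The Pareto front of ZDT1 is f2 = 1 − √f1, attained when x2 = … = xn = 0.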

# Optimization Results

# Pareto Front Plot

The Pareto front plots resulting from the optimization are shown below.

Darker colored points represent results from later trials, and red points indicate individuals on the Pareto front.

Looking at the color intensity, TPE finds individuals on the Pareto front at an early stage.

However, after a certain number of trials, the distribution becomes similar to that of random sampling.

This can also be observed from the Hypervolume plot that will be presented later.

NSGA-II and NSGA-III are methods that discover better individuals by crossing over individuals on the Pareto front in each generation.

The results show a gradient of point intensity from top to bottom, indicating that the Pareto front gradually improves as generations progress.

In MOEA/D, since it solves subproblems using weight vectors, we can observe a distribution of individuals corresponding to the weight vectors, unlike the NSGA methods.

The figure below shows the plot of only the final Pareto front individuals for each method.

All methods significantly outperform the random approach.

NSGA-II and NSGA-III produced similar Pareto fronts.

In the settings of this example, MOEA/D achieved the best Pareto front.

# Hypervolume

The change in Hypervolume, with the number of trials on the horizontal axis, is as follows:

Hypervolume is one of the metrics used to evaluate multi-objective optimization.

The Hypervolume is the area (volume in 3D, hypervolume in 4D and above) of the figure formed by a specific reference point and the points on the Pareto front.

The larger this value, the better the Pareto front that has been obtained.
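For the bi-objective minimization case, the Hypervolume can be computed with a simple sweep. The following is an illustrative implementation for mutually non-dominated points, not the algorithm Optuna uses internally:

```python
import numpy as np


def hypervolume_2d(pareto: np.ndarray, ref: tuple[float, float]) -> float:
    """Area dominated by a set of mutually non-dominated points
    (minimization), bounded by the reference point `ref`."""
    # Sort ascending by f1; for a non-dominated set, f2 then descends.
    pts = pareto[np.argsort(pareto[:, 0])]
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        # Each point contributes a horizontal slab between f2 levels.
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv
```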

After a certain number of trials, all methods achieve a Hypervolume that surpasses random sampling.

TPE initially performs random sampling for the first 500 trials, then creates a surrogate model before using TPE for subsequent sampling.

As a result, we can see a sharp increase in Hypervolume around the 500-trial mark.

On the other hand, TPE’s Hypervolume plateaus at around 2,500 trials.

Regarding this plateau, while there may be room for improvement in the settings, it likely reflects a characteristic of Bayesian optimization, which is designed to explore good results within a limited number of trials rather than conducting an excessive number of trials.

Among the comparisons made in this study, MOEA/D consistently yields good Hypervolume results.

# Computation Time

The graph of computation time is as follows:

TPE shows a nonlinear increase in time with respect to the number of trials, while other methods exhibit a linear increase in time.

The graph on the right shows the computation time excluding TPE. NSGA and MOEA/D take more time than Random sampling, but they show almost the same trend.

Comparing the Hypervolume results against computation time, in this configuration TPE achieves the largest Hypervolume per unit of optimization time for roughly the first 1,000 trials.

However, beyond that point, other methods yield better results.

This result aligns with the recommended budget (#trials) of 100–1000 for TPE as stated in the official Optuna documentation.

It’s worth noting that, apart from the computation-time graphs, the last two figures are not generated by Optuna itself; they use visualization modules published on OptunaHub.

# Comparison in Higher Dimensions

In the previous section, we presented the results for 2 objective functions. In this section, we will examine the performance of MOEA/D for cases with a larger number of objective functions.

# Objective Function

For the objective functions, we will use an evaluation function called DTLZ1.

For this function, the Pareto front is a plane where the sum of the objective functions equals 0.5. For example, in the case of three objective functions, it looks like this:
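For reference, DTLZ1 for m objectives and n variables can be sketched as follows, following the standard definition of the DTLZ test suite (an illustrative implementation, not the benchmark code used for the experiments):

```python
import numpy as np


def dtlz1(x: np.ndarray, m: int) -> np.ndarray:
    """DTLZ1 with m objectives; the last len(x) - m + 1 variables feed g."""
    k = len(x) - m + 1
    xm = x[-k:]
    g = 100 * (k + np.sum((xm - 0.5) ** 2 - np.cos(20 * np.pi * (xm - 0.5))))
    f = np.empty(m)
    for j in range(m):
        f[j] = 0.5 * (1 + g) * np.prod(x[: m - 1 - j])
        if j > 0:
            f[j] *= 1 - x[m - 1 - j]
    return f
```

On the Pareto front, g = 0 (all g-variables equal 0.5) and the objective values sum to 0.5, which is exactly the plane described above.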

# 3 Objectives with 30 Variables

First, we examine the results for the case with 3 objective functions with 30 variables, which is one more objective function than in the previous section’s example.

The maximum number of trials remains at 10,000, but unlike the 2 objective cases, we set the maximum optimization time to 3 minutes. This is because, as we learned from the 2 objective study, TPE’s Hypervolume tends to plateau after a certain time, so we set this limit to terminate the optimization before reaching 10,000 trials.

The Pareto front diagram of the optimization results is as follows:

The trend didn’t change significantly from the 2 objective function cases.

MOEA/D was able to obtain the best Pareto front, while NSGA-II and NSGA-III performed similarly to each other. A notable difference is that, comparing TPE and random search, TPE appears to have more scattered points and seems to perform worse.

The Hypervolume results were as follows.

From the Hypervolume, we can see that MOEA/D was able to search more efficiently compared to other methods. The reason why TPE has fewer trials compared to others is that the number of trials is determined by the 3-minute optimization time limit.

Based on these results, the apparent lower performance of TPE compared to random search in the Pareto front diagram can be attributed to TPE only being able to perform about 1/10 of the sampling compared to random search, thus not achieving a sufficient number of samples.

# 4 Objectives

For the 4 objective cases, we examined two patterns: one with 10 variables and another with 30 variables.

As it’s difficult to visualize the Pareto front in four dimensions, we’ll only present the Hypervolume results here.

The optimization results differed significantly depending on the number of variables. For the 10-variable case, all methods showed similar results without much difference. On the other hand, for the 30-variable case, the trend was similar to previous results, with MOEA/D achieving the best Hypervolume.

It’s worth noting that despite the difference in the number of variables, this problem should ultimately reach the same Hypervolume as they converge to the same objective function values. Therefore, we can see that in the 10-variable case, a Hypervolume comparable to what MOEA/D ultimately reached in the 30-variable case was achieved with fewer trials.

Below is an enlarged view of the first 1000 trials for the 10-variable case. When magnified, it becomes apparent that each optimization method achieves a larger Hypervolume compared to the Random method.

# Summary

In this article, we presented results that highlight the strengths of MOEA/D, which we implemented.

As the performance of each method varies depending on the problem and settings, we encourage you to experiment with different optimization approaches.

As mentioned earlier, the performance of MOEA/D is greatly influenced by the weight vectors. In the implementation published on OptunaHub, for dimensions of 3 or higher, we create weight vectors using quasi-Monte Carlo methods. As a result, the distribution is not always perfectly uniform. However, when examining the Hypervolume results in the comparison presented in this article, we confirmed that for the problems addressed in this example, sufficient performance was achieved even without a completely uniform distribution.
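The idea of quasi-Monte Carlo weight vectors can be sketched as follows. This is illustrative only; the published sampler may construct its weight vectors differently:

```python
import numpy as np
from scipy.stats import qmc


def qmc_weight_vectors(n: int, m: int, seed: int = 42) -> np.ndarray:
    """Quasi-random points in [0, 1]^m from a Sobol sequence,
    normalized row-wise so each weight vector sums to 1."""
    w = qmc.Sobol(d=m, seed=seed).random(n)
    return w / w.sum(axis=1, keepdims=True)
```

The Sobol sequence fills the unit cube more evenly than pseudo-random sampling, so the normalized vectors cover the weight simplex reasonably well, though, as noted above, not perfectly uniformly.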

For those interested in the evolutionary multi-objective optimization introduced in this article, we recommend “進化型多数目的最適化の現状と課題” (Current Status and Challenges of Evolutionary Many-Objective Optimization)[5], which provides a chronological overview of such multi-objective optimization methods.

MOEA/D is classified as an evolution-based algorithm within metaheuristic methods.

Other classifications include physics-based methods (e.g., simulated annealing) and swarm-based methods (e.g., particle swarm optimization).

As you can see, there are various optimization methods, each with its own strengths and weaknesses.

Most of these are implemented in existing optimization libraries[6],[7] or commercial software, so it’s often unnecessary to implement them yourself when using them.

However, there’s much to be learned from actually implementing and comparing their behaviors hands-on.

We encourage you to not just use these methods, but also to challenge yourself with such implementation attempts.

Note: The implementation of MOEA/D is published on OptunaHub under the MIT license. Please make sure you understand the license before using it.

# Reference

[1]：Akiba, Takuya and Sano, Shotaro et al., Optuna: A Next-generation Hyperparameter Optimization Framework, The 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2623–2631, 2019

[2]：Natsume, H. (2024). Tunny, The Grasshopper optimization tool (Version 0.12.0) [Computer software]. https://github.com/hrntsm/Tunny

[3]：Q. Zhang and H. Li, MOEA/D: A Multiobjective Evolutionary Algorithm Based on Decomposition, in IEEE Transactions on Evolutionary Computation, vol. 11, no. 6, pp. 712–731, Dec. 2007

[4]：Eckart Zitzler, Kalyanmoy Deb, and Lothar Thiele, Comparison of Multiobjective Evolutionary Algorithms: Empirical Results, Evolutionary Computation, vol. 8, no. 2, pp. 173–195, 2000

[5]：佐藤寛之, 石渕久生, “進化型多数目的最適化の現状と課題” (Current Status and Challenges of Evolutionary Many-Objective Optimization), オペレーションズリサーチ, March 2017 issue

[6]：J. Blank and K. Deb, pymoo: Multi-Objective Optimization in Python, IEEE Access, vol. 8, pp. 89497–89509, 2020

[7]：Juan J. Durillo, Antonio J. Nebro, jMetal: A Java Framework for Multi-objective Optimization, Advances in Engineering Software, vol. 42, no. 10, pp. 760–771, 2011

# Supplemental

For reference, the execution code for the comparison in the case of bi-objective functions is shown below.

```python
import numpy as np
import optuna
import optuna.storages.journal
import optunahub


def objective(trial: optuna.Trial) -> tuple[float, float]:
    # ZDT1
    n_variables = 30
    x = np.array([trial.suggest_float(f"x{i}", 0, 1) for i in range(n_variables)])
    g = 1 + 9 * np.sum(x[1:]) / (n_variables - 1)
    f1 = x[0]
    f2 = g * (1 - (f1 / g) ** 0.5)
    return f1, f2


def tpe_gamma(x: int) -> int:
    return min(int(np.ceil(0.1 * x)), 50)


mod_sampler = optunahub.load_module("samplers/moead")

seed = 42
population_size = 100
n_trials = 10000
crossover = optuna.samplers.nsgaii.BLXAlphaCrossover()

samplers = [
    optuna.samplers.RandomSampler(seed=seed),
    optuna.samplers.NSGAIISampler(
        seed=seed,
        population_size=population_size,
        crossover=crossover,
    ),
    optuna.samplers.NSGAIIISampler(
        seed=seed,
        population_size=population_size,
        crossover=crossover,
    ),
    mod_sampler.MOEADSampler(
        seed=seed,
        population_size=population_size,
        n_neighbors=population_size // 5,
        scalar_aggregation_func="tchebycheff",
        crossover=crossover,
    ),
    optuna.samplers.TPESampler(
        seed=seed,
        n_startup_trials=500,
        n_ei_candidates=50,
        multivariate=True,
        gamma=tpe_gamma,
    ),
]

# Run the same bi-objective study once per sampler.
studies = []
for sampler in samplers:
    study_name = f"{sampler.__class__.__name__}"
    study = optuna.create_study(
        sampler=sampler,
        study_name=study_name,
        directions=["minimize", "minimize"],
    )
    study.optimize(objective, n_trials=n_trials)
    studies.append(study)
    optuna.visualization.plot_pareto_front(study).show()

# Multi-study Pareto front comparison (OptunaHub visualization module).
mod_pfm = optunahub.load_module("visualization/plot_pareto_front_multi")
fig = mod_pfm.plot_pareto_front(studies)
fig.show()

# Multi-study hypervolume history (OptunaHub visualization module).
reference_point = [1.0, 7.5]
mod_vhm = optunahub.load_module("visualization/plot_hypervolume_history_multi")
fig = mod_vhm.plot_hypervolume_history(studies, reference_point)
fig.show()
```

The required dependencies can be installed as follows:

`pip install optuna optunahub scipy plotly`