Metaheuristic optimization with the Differential Evolution algorithm

Francesco Cannarile
Published in Eni digiTALKS
9 min read · Aug 30, 2022

Learn the theory of the Differential Evolution algorithm, its Python implementation, and how and why it can help you solve complex real-world optimization problems.

This article has been written with Salvatore Guastella.

Optimization is a pillar of data science. No doubt about that.

If you think about it, under the hood of every machine learning algorithm (from basic linear regression to the most complex neural network architectures), an optimization problem is being solved.

Moreover, in many real-world problems the goal is to find the values of one or more decision variables that minimize (or maximize) a quantity of interest while satisfying certain constraints. A few examples are portfolio optimization in finance, profit maximization of ad campaigns, energy efficiency in energy plants and shipment cost minimization in logistics (refer to this Medium article [1] on our Eni digiTALKS channel for an interesting example).

This article focuses on solving non-trivial optimization problems where classical gradient-based optimization methods cannot be applied. An approach to solve those kinds of problems is via the metaheuristic Differential Evolution (DE) algorithm.

First, we will focus on the theory underlying this algorithm and then on a practical example in Python.

Differential Evolution

The Differential Evolution (DE) algorithm belongs to the class of evolutionary algorithms and was originally proposed by Storn and Price in 1997 [2]. As the name suggests, it is a bio-inspired algorithm which relies on Darwinian evolution concepts such as mutation, crossover (or recombination) and selection. DE solves an optimization problem by iteratively attempting to improve a set of candidate solutions with respect to a given objective function to be minimized (or maximized). DE is one possible choice in the realm of the optimization algorithms [3] [4], but what are its main features and peculiarities?

  • Ability to handle non-differentiable, non-linear, and multimodal cost functions. This is a key point: gradient-based and quasi-Newton methods require differentiable and continuous functions, which are rare in real-world problems. DE uses a stochastic direct search that does not require any gradient information;
  • ability to solve Single Objective (SO) and Multi Objective (MO) optimization problems (possibly constrained) with both continuous and discrete decision variables;
  • easy parallelization: useful when dealing with objective functions which are computationally intensive to evaluate;
  • few hyperparameters to tune (in its basic version, these are only three).

How does DE work? At each iteration (also known as generation) after an initialization step, the following steps are performed:

  • mutation;
  • crossover (or recombination);
  • selection;
  • stopping criterion evaluation.

Do not be scared off by this biological jargon; we will go into the details of these steps in the next paragraphs!

Slight differences in the mutation and crossover steps define different strategies for the generation of a population of candidate solutions (for more details, the interested reader can refer to [2]); for the purpose of this article, we consider the best1bin strategy.

Without loss of generality, we focus on a SO minimization problem with continuous decision variables u ∈ ℝⁿ:

min F(u), u ∈ ℝⁿ (1)

where F: ℝⁿ → ℝ is a cost function (or objective function) that we seek to minimize. We make no assumptions on F (finally, we are free of assumptions!) except that we can evaluate it for any vector of decision variables u.

At the pᵗʰ generation, the DE algorithm proposes a population of NP candidate solutions (also known as target vectors) Uₚ = [uₚ¹, …, uₚʲ, …, uₚᴺᴾ], where each uₚʲ, j ∈ {1, …, NP}, is a vector of n decision variables whose kᵗʰ entry is denoted by uₚ,ₖʲ, k ∈ {1, …, n}.

DE basic strategy at the pᵗʰ generation can be described as follows.

Initialization

Only for the first generation, the population U₁ is randomly generated or, if any prior knowledge of the problem is available, target vectors can be manually initialized.

Mutation

In this step, for each target vector uₚʲ a mutant vector mₚʲ is generated through a linear transformation of the best vector uₚᵇᵉˢᵗ (hence the term best in the best1bin strategy), i.e., the vector in the current population that reaches the lowest value of the cost function in Eq. (1) among all the candidate solutions. The mutant candidate mₚʲ is obtained by adding to uₚᵇᵉˢᵗ the difference (hence the name differential evolution) between two candidate solutions uₚʳ¹ and uₚʳ², randomly selected from the current population and different from the vector that is currently mutating, multiplied by the mutation constant M > 0:

mₚʲ = uₚᵇᵉˢᵗ + M · (uₚʳ¹ − uₚʳ²) (2)

Crossover

To increase the diversity of the mutant vectors, crossover is introduced. For each target vector, a trial vector zₚʲ is formed according to Eq. (3):

zₚ,ₖʲ = mₚ,ₖʲ if rₖ,ₚ ≤ CR or k = kₚ,ᵣ, otherwise zₚ,ₖʲ = uₚ,ₖʲ (3)

where:

  • rₖ,ₚ is a random number sampled from a uniform distribution in the interval [0,1];
  • kₚ,ᵣ is a random index in {1,…,n} that guarantees that at least one decision variable originates from the mutant vector (this binomial crossover scheme is what the bin in best1bin stands for);
  • CR is the crossover parameter of the DE algorithm. Notice that rₖ,ₚ and kₚ,ᵣ are resampled at each generation p.

Selection

In the selection step, the new population is formed by comparing each target vector with its corresponding trial vector with respect to the objective function F. In detail, at the (p+1)ᵗʰ generation the new jᵗʰ candidate solution is given by

uₚ₊₁ʲ = zₚʲ if F(zₚʲ) ≤ F(uₚʲ), otherwise uₚ₊₁ʲ = uₚʲ (4)

i.e., if zₚʲ is a better solution, it replaces uₚʲ.

Stopping criterion evaluation

The algorithm is stopped if any of the following two conditions is met:

  • the maximum number of generations is reached;
  • the standard deviation of the objective function values of the candidate solutions is less than a defined threshold (tol). This means that the new target vectors no longer improve the objective function and a minimum has been reached.
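Putting the pieces together, a minimal NumPy sketch of a single best1bin generation could look like the following. This is only an illustration of the steps described above under our notation (it is not scipy's implementation), the values of M and CR are placeholders, and bound handling is omitted for brevity:

```python
import numpy as np

def de_generation(U, F, M=0.8, CR=0.7, rng=None):
    # One best1bin generation: U has shape (NP, n), F maps a vector to a scalar
    rng = np.random.default_rng() if rng is None else rng
    NP, n = U.shape
    costs = np.array([F(u) for u in U])
    best = U[np.argmin(costs)]            # current best target vector
    U_next = U.copy()
    for j in range(NP):
        # Mutation: best vector plus the scaled difference of two random members
        r1, r2 = rng.choice([i for i in range(NP) if i != j], size=2, replace=False)
        m = best + M * (U[r1] - U[r2])
        # Binomial crossover: mix target and mutant component-wise
        cross = rng.random(n) <= CR
        cross[rng.integers(n)] = True     # at least one component from the mutant
        z = np.where(cross, m, U[j])
        # Selection: keep the trial vector only if it does not worsen the cost
        if F(z) <= costs[j]:
            U_next[j] = z
    return U_next
```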

Finally, Figure 1 pictorially shows the best1bin DE algorithm.

Figure 1 — The Differential evolution algorithm.

Okay, enough theory… let’s see a practical example!

Differential Evolution in action: a toy example

To see the DE algorithm in action, a toy example based on the Van Der Pol oscillator [5] is shown here.

The goal is to find the control function u(t) that minimizes an objective function F(u) in a dynamic environment governed by a system of three differential equations with state variables x₁, x₂ and x₃. The control function can take values in the interval [-0.3, 1], whereas x₁, x₂ and x₃ are constrained to assume fixed values at the starting and ending times t₀ and tN, respectively.

Figure 2: Van-der-Pol control problem.

Now, you might wonder… how does DE come into play? We focus on a discretized formulation of our control problem: we consider equally spaced times [t₁, …, tₖ, …, tN] with uₖ = u(tₖ). Our optimization problem then reduces to the one reported in Figure 3… but the cost function F is not differentiable… no problem! We can easily solve it with our DE algorithm!

Figure 3: Discretized Van-der-Pol control problem.

We are ready to solve our optimization problem in Python! DE is implemented in the optimize module of the scipy library; more details can be found in the official documentation [6]. Let's start by importing the function differential_evolution together with odeint, which is used to numerically solve our system of ordinary differential equations [7].
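A minimal sketch of these imports (the original snippet is not reproduced in the text; numpy is imported as well, since it is used in the snippets below):

```python
import numpy as np
from scipy.optimize import differential_evolution
from scipy.integrate import odeint
```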

Now, let’s define our array of equispaced timesteps with fixed step delta_T, and final time final_T.

Then, we need to define a function that, for any timestep t, returns the corresponding value of the control function u(t) (represented by the array u_val):
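A possible implementation, assuming a piecewise-constant control defined on the time grid (the exact signature in the original snippet may differ):

```python
def u_function(t, u_val, time_vector):
    # Piecewise-constant control: return the value of u associated with the
    # time-grid point closest to t
    idx = np.argmin(np.abs(time_vector - t))
    return u_val[idx]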

The Van Der Pol system is defined as a function of the array x (that contains x₁, x₂ and x₃), the timestep t and the control function (u_function), and returns a list of three elements.
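A sketch of such a function is given below. The dynamics follow the classical Van Der Pol control benchmark and are an assumption on our side, since the original snippet is not reproduced in the text; for simplicity, the control values and the time grid are passed explicitly instead of the control function object:

```python
def van_der_pol(x, t, u_val, time_vector):
    # Van Der Pol dynamics with control term u(t)
    x1, x2, x3 = x
    u = u_function(t, u_val, time_vector)
    dx1 = (1.0 - x2 ** 2) * x1 - x2 + u
    dx2 = x1
    dx3 = x1 ** 2 + x2 ** 2 + u ** 2   # running cost accumulated in x3
    return [dx1, dx2, dx3]
```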

objective_function takes as arguments the variables u_val and time_vector. The first line of the code below defines the initial condition x0; then, the odeint function is used to numerically solve the Van Der Pol system of differential equations. odeint takes as arguments the system to be solved van_der_pol, the initial condition x0 and the array of timesteps time_vector.
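A possible version of this function is sketched below. The initial condition values and the use of x₃ at the final time as the cost follow the classical benchmark formulation and are assumptions here; any terminal constraints are not handled in this sketch:

```python
def objective_function(u_val, time_vector):
    # Initial condition (assumed values, following the classical benchmark)
    x0 = [0.0, 1.0, 0.0]
    # Numerically integrate the system under the candidate control u_val
    sol = odeint(van_der_pol, x0, time_vector, args=(u_val, time_vector))
    # Return the cost accumulated in x3 at the final time
    return sol[-1, 2]
```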

In the snippet shown after the following list, we first define the bounds of our decision variables uₖ and then run the differential_evolution function itself. Let's take a look at its arguments:

  • bounds: a list of (min, max) bounds for the decision variables;
  • func: the objective function to be minimized;
  • maxiter: maximum number of generations (default value is 1000);
  • popsize: a multiplier for setting the total population size, such that NP = n * popsize (default value is 15);
  • workers: number of CPU cores working in parallel (default value is 1; -1 uses all the available cores);
  • disp: a Boolean variable that prints the value of the cost function at each iteration if True (default is False).
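Putting it together, a sketch of the setup and the call might look like this (the keyword values below mostly match scipy's defaults and are otherwise assumptions):

```python
# One (min, max) pair per decision variable u_k
bounds = [(-0.3, 1.0)] * len(time_vector)

result = differential_evolution(
    func=objective_function,
    bounds=bounds,
    args=(time_vector,),   # extra argument forwarded to objective_function
    strategy="best1bin",
    maxiter=1000,
    popsize=15,
    workers=-1,            # evaluate candidate solutions in parallel
    disp=True,
)
```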

differential_evolution returns a scipy.optimize.OptimizeResult object [7] that has many attributes including:

  • x: the solution of the optimization problem;
  • fun: the value of the objective function at the optimal solution;
  • success: a Boolean that indicates if the optimization ended successfully or not;
  • message: a description of the cause of termination;
  • nit: the number of iterations of the optimizer.
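A minimal sketch of how the solution could then be extracted and plotted (the plotting details are assumptions, since the original snippet is not reproduced here):

```python
import matplotlib.pyplot as plt

# Extract the optimal discretized control and report the outcome
u_opt = result.x
print("Best objective value:", result.fun)
print("Success:", result.success, "-", result.message)
print("Number of generations:", result.nit)

# Plot the piecewise-constant control over the time grid
plt.step(time_vector, u_opt, where="post")
plt.xlabel("t")
plt.ylabel("u(t)")
plt.show()
```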

After running the lines of code above, let's extract and print our solution (Figure 4). By qualitatively comparing this result with the benchmark solution in Figure 5, we can see that they have a similar trend, with the latter being smoother. Better results can be obtained simply by reducing the step delta_T and by tuning, for example, the mutation and crossover parameters M and CR (you can try it!). We can also see that each uₖ correctly lies in the interval [-0.3, 1]. Great job, well done!

Figure 4 — Discretized control function u(t) obtained solving the Van Der Pol problem with the Differential Evolution algorithm and the best1bin strategy.
Figure 5 — Benchmark solution u(t) of the Van Der Pol control problem.
Source: http://tomdyn.com/examples/vanDerPol.html

Wrap Up

We have seen how you can solve complex optimization problems with the Differential Evolution algorithm. It naturally handles non-differentiable and non-continuous functions, which arise in many real-world problems, thus overcoming the limitations of gradient-based and quasi-Newton methods. Differential Evolution is in fact a metaheuristic method and does not require the calculation of the gradient of the cost function.

We also covered the theory of the algorithm and its main steps, inspired by Darwinian evolution. The Python implementation inside the scipy library makes DE easy to set up, configure and use, as we saw in the Van Der Pol example.

Thanks for reading, we hope you enjoyed the article and consider using DE when dealing with complex optimization problems!
