DEAP, a Python evolutionary computation framework.

7 min readNov 4, 2019

Distributed Evolutionary Algorithms in Python (DEAP) is described as an evolutionary computation framework for rapid prototyping and testing of ideas [1]. It incorporates tools and data structures to easy implement genetic algorithms, genetic programmings, evolution strategies, and particle swarm optimization. It is developed at Université Laval since 2009.

The easiest way to install it is through pip:

pip install deap

Once it is installed, building an EA is very straightforward.

EA basics

An EA uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and survival selection. Possible solutions to the optimization problem are considered the individuals in a population, and their quality is determined by a fitness function. Sequential application of the operators drives the evolution of the population. In other words this can be written as:

Step One: Generate the initial population of individuals randomly. (First generation)
Step Two: Evaluate the fitness of each individual in that population (time limit, sufficient fitness achieved, etc.)
Step Three: Repeat the following regenerational steps until termination:
a: Select the best-fit individuals for reproduction. (Parents)
b: Breed new individuals through crossover and mutation operations to give birth to offspring.
c: Evaluate the individual fitness of new individuals.
d: Replace least-fit population with new individuals.

DEAP implementation

Four important steps are necessary before running the program: the type of the individuals must be decided, to encode the problem in the smartest way. How to initialise the individual, and how many of them we want to apply is another important decision we have to take. How to combine them is very important, a wrong operator can block evolution at all, wasting a lot of time. Finally, the type of algorithm we want to use has to be decided. All these steps are explained in the following lines.

Types

Before to generate the initial population, the type of the individual must be defined. For example, if we want to codify the individual with float values, we just create it with the “list” tag and that is it. A fitness function has to be added, therefore the FitnessMin class is perfect for a minimization problem. A minimizing fitness is built using negatives weights, while a maximizing fitness has positive weights. Other types are available to be implemented and a list with examples is available at [2].

from deap import base, creator
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)

Initialization

How to implement Step One? DEAP provides a useful container called Toolbox exactly for this task. It is used to create the initializers for a population and its individuals (implemented with random floating-point numbers, as shown before).

import random
from deap import tools

INDIVIDUAL_SIZE = 50

toolbox = base.Toolbox()
toolbox.register("attribute", random.random)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attribute, n=INDIVIDUAL_SIZE)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

The Attribute registration to the toolbox defines how the individuals are initialised, “random” in this case. This is passed to the individual registration together with a number to define the number of individuals to generate. The individual registration is then specified into the population registration in order to be able to generate it through the following line:

toolbox.population()

Operators

Another important step to define is the Operators to use in Step Three B. They are going to drive the evolution, therefore attention must be paid to define them correctly. DEAP already implements several of them, making our life a little bit easier. Their declaration is very similar to the way we used to declare individuals.

In the next example, the mate operator executes a uniform crossover that modifies in place the two sequence individuals. The mutation operator applies a gaussian mutation of mean mu and standard deviation sigma on the input individual. The indpb argument is the probability of each attribute to be mutated. The parent selection is done by selecting the best individual among tournsize randomly chosen individuals, k times.

toolbox.register("mate", tools.cxUniform)
toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=1, indpb=0.1)
toolbox.register("select", tools.selTournament, tournsize=5)

There is a great variety of mutation operators in the deap.tools module. If you want to implement your own operator, remember that they only mutate, therefore an independent copy must be made prior to mutating the individual.

mutant = toolbox.clone(ind1)
ind2, = tools.mutGaussian(mutant, mu=0.0, sigma=0.2, indpb=0.2)
del mutant.fitness.values

There is also a big variety of crossover operators implemented in the deap.tools module. Be aware to select the right operator for the representation chosen. If you want to implement your own operator, remember that they only mate individuals, therefore an independent copy must be made prior to mate the individual.

child1, child2 = [toolbox.clone(ind) for ind in (ind1, ind2)]
tools.cxTwoPoint(child1, child2)
del child1.fitness.values
del child2.fitness.values

Selection is made among a population by the selection operators that are available in the deap.tools module. The selection operator usually takes as first argument a container of individuals and the number of individuals to select.

Algorithm

There are two ways to write the main loop of the evolutionary algorithm. Either we write our own implementation or we use one provided by thealgorithms module. The following algorithms are available:

eaSimple: This algorithm reproduce the simplest evolutionary algorithm as presented in chapter 7 of [3].
eaMuPlusLambda: This is the “mu+lambda” evolutionary algorithm.
eaMuCommaLambda: This is the “mu,lambda” evolutionary algorithm.
eaGenerateUpdate: This is algorithm implements the ask-tell model proposed in [4], where ask is called generate and tell is called update.
different Variations: Variations are smaller parts of the algorithms that can be used separately to build more complex algorithms.

The algorithms are very easy to extend through subclassing in order to add new features that are problem-dependent. For example, in the following algorithm, everything discussed so far is presented.

def main():
    pop = toolbox.population(n=100)
    mate_probability = 0.5
    mutation_probability = 0.2
    number_generations = 40

    # Evaluate the entire population
    fitnesses = map(toolbox.evaluate, pop) 
    # Set fitnesses to individuals
    for ind, fit in zip(pop, fitnesses):
        ind.fitness.values = fit
 
    # loop for the number of generation we need
    for g in range(number_generations):
        # Select the next generation individuals
        offspring = toolbox.select(pop, len(pop))
        # Clone the selected individuals
        offspring = map(toolbox.clone, offspring)

        # Apply crossover and mutation on the offspring
        for child1, child2 in zip(offspring[::2], offspring[1::2]):
            if random.random() < mate_probability:
                toolbox.mate(child1, child2)
                del child1.fitness.values
                del child2.fitness.values

        for mutant in offspring:
            if random.random() < mutation_probability:
                toolbox.mutate(mutant)
                del mutant.fitness.values

        # Evaluate the individuals with an invalid fitness
        invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
        fitnesses = map(toolbox.evaluate, invalid_ind)
        for ind, fit in zip(invalid_ind, fitnesses):
            ind.fitness.values = fit

        # The population is entirely replaced by the offspring
        pop[:] = offspring

    return pop

Checkpointing

As we all know, thousand of bad things can happen when a program is running. For this, persistence is needed to be able to restart from the point the program crashed. To do so, a si a simple dict and a serialization method are required. Important data will be inserted in the dictionary and serialized to a file so that if something goes wrong, the evolution can be restored from the last saved checkpoint. This can be also used to continue the evolution after a pre-fixed termination criterion.

Checkpointing is not offered in the standard algorithms. Either you create your own algorithm, or you copy one of the implemented algorithms and you add the feature by yourself. Only a few lines have to be added: you read the checkpoint using pickle and assigned the saved stated to the variables for the population, hall of fame, the random state (very important for reproducibility), number of generation reached and the logbook.

if checkpoint:
    # A file name has been given, then load the data from the file
    with open(checkpoint, "r") as cp_file:
        cp = pickle.load(cp_file)
    population = cp["population"]
    start_gen = cp["generation"]
    halloffame = cp["halloffame"]
    logbook = cp["logbook"]
    random.setstate(cp["rndstate"])
else:
    # Start a new evolution
    population = toolbox.population(n=300)
    start_gen = 0
    halloffame = tools.HallOfFame(maxsize=1)
    logbook = tools.Logbook()

To save the states another piece of code is required. It just fills a dictionary with all the data needed to start the evolution from this point.

if gen % every_how_many_gen_save_state== 0:
    # Fill the dictionary using the dict(key=value[, ...]) constructor
    cp = dict(population=population, generation=gen, halloffame=halloffame,logbook=logbook, rndstate=random.getstate())

    with open("checkpoint_name.pkl", "wb") as cp_file:
         pickle.dump(cp, cp_file)

LogBook & Statistics

In the previous paragraph, the logbook class came out. What does it do? It records evolution as a chronological list of dictionaries. For example, you could record the number of generation together with the number of evaluation done during it plus all the statistics that we defined with the command Statistics. In the following example, the statistics defined are the mean, standard deviation, min, and max of the fitness. DEAP automatically computes them at every generation.

stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("avg", np.mean, axis=0)
stats.register("std", np.std, axis=0)
stats.register("min", np.min, axis=0)
stats.register("max", np.max, axis=0)

History

DEAP provides also a way to visualise the graph about the genealogy of all the individuals produced in the evolution. The produced genealogy tree is compatible with the NetworkX library, and on the DEAP website, there is the code on how to produce it.

Genealogy of all the individuals in the evolution with the colour of the nodes indicate their fitness, blue is low and red is high.[5]

References

[0] https://www.inverse.com/topic/evolution

[1]https://en.wikipedia.org/wiki/DEAP_(software)

[2]https://deap.readthedocs.io/en/master/tutorials/basic/part1.html

[3]Back, Fogel and Michalewicz, “Evolutionary Computation 1 : Basic Algorithms and Operators”, 2000.

[4] Collette, Y., N. Hansen, G. Pujol, D. Salazar Aponte and R. Le Riche (2010). On Object-Oriented Programming of Optimizers — Examples in Scilab. In P. Breitkopf and R. F. Coelho, eds.: Multidisciplinary Design Optimization in Computational Mechanics, Wiley, pp. 527–565;

[5] https://deap.readthedocs.io/en/master/api/tools.html#deap.tools.History