Distributed Genetic Algorithms Part 1: Background & Basic Principles

A Genetic Algorithm is a means for which to cover a large search space of configurable optimisation. For example, suggest we are trying to optimise a schedule such as a school timetable or train schedule, a genetic algorithm can be used to quickly find an optimal schedule, without trying each and every possibility.

Fundamentally, a GA works by using a chromosome (Let’s simply think of a JSON object containing a few configurable variables). Given a population size, a seed function will create an array of individuals for generation 0, matching the chromosome with given random generation function(s).

Visualising the chromosome as a JSON object containing discrete values is one form, there are two others. The first is combinatorial, where the structure is like that of a binary string (“01101001”). The second is an array of real number vectors.

After creating the initial generation, the GA will evaluate the fitness score for each individual. The fitness score is then used for selection (Tournament) by ensuring that individuals selected for mutation and crossover are within the upper bounds of the population (Although still randomly selected). Crossover takes two individuals and randomly mixes parts from one parent, and parts from another parent to create two children with components from both. Mutate takes one individual and alters certain characteristics, usually with mathematical operators to create a new individual.

Once the new generation has been populated, we repeat the fitness evaluation and keep creating more generations, exploiting the results of the fitness scores to find an individual close enough to the target solution. Other exit conditions can include a timeout, a generation cap or a lack of change in fitness for X generations.

Fig 1. A potential GA search space

This method, unlike an exhaustive search (Iterating through all possible solutions within the search space, one by one), has issues of falling into a local optima, instead of finding the global optimal, and the desired solution.

As you can see in Figure 1, the system is correctly improving on its results and moving up the curve of a local optima, but the trade-off between exploration and exploitation mean that it hasn’t found the global optimal and subsequently isn’t exploiting it. This is outside the scope of this article and there are many academic papers on how to avoid this so I will not cover it here.


The Bottleneck

Fundamentally, the bottle-neck for most conventional GAs is single-threaded processing. When evaluating the fitness for each individual of a generation, each individual is calculated iteratively (In fact, the same is done for the generational operators: Seed, Mutate & Crossover). This means that depending on the size of the chromosome, or the computational difficulty (Time taken) of the fitness function, the time for discovery of a solution could take an exponentially long time.

Given that whatever time taken per individual (Genetic operator + Fitness) is multiplied by the number of individuals that are in that generation (250), for each generation before the “answer” (Let’s say 2500), then an individual that takes 20 minutes to solve (Think of training a Neural Network, or iterating through several megabytes of data, hundreds of thousands of rows in a database) could take (20 minutes * 250 * 2500) 24 years to find a solution.

All is not lost, however, since there are two ways to optimise this problem. The first is to optimise the chromosome or the fitness function, i.e. distributing the problem itself into smaller pieces. Where one individual, distributes its problem across an entire GPU to solve it in a fraction of the time, or the chromosome is optimised to remove unnecessary variables, or more efficiently modelled. Optimisers exist for large scale combinatorial problems and real number vectors so for this example I will continue working on the premise of a discrete problem, such as that of the configuration of a Neural Network which includes the number of start nodes, hidden layers, etc.

The second way to optimise this problem is to introduce distribution. There is a great deal of literature on structured populations and the introduction of population migration between clusters but the basis of this optimisation can be seen within a master/worker setup.


Parallelisation

The master/worker GA allows for a centralised population to distribute certain functionality that may be CPU blocking to the worker nodes to compute in parallel. Where above we looked at 24 years on a single-threaded problem, suggest we have 16, 8-core machines available to us, and reduce the population size down to 128 (with one additional machine for the master node), 24 years is suddenly cut down to 35 days.

As you can see now, we have two solutions to our bottle-neck. Firstly, we optimise the time taken per individual of the population (By distributing to a GPU or improving the calculation), but once we’ve reached the bottom of that, the second is to optimise the time per generation of evolution, down to the minimum time taken for one individual to be calculated.

Thereby one individual takes 20 minutes, one generation takes 20 minutes to calculate, too. Further trade-offs can be made to lower the time to get to a useful answer, or removing complexity from the fitness functions, adding abstractions or even short-term estimations as well as caching so that identical individuals aren’t re-calculated (Such as when elitism is true, ensuring the fittest individual survives).


In part two, we will cover how to design and implement this in Node.js but for now, feel free to check out the inspiration for my work: Genetic.js. This library simplified for me, with examples that brought about my understanding of how powerful a GA was.