Genetic Algorithms Explained By Example

Waheed Abbas · Published in tiket.com · May 2, 2023

Were you aware that it’s possible to simulate Evolution on your computer? By doing so, you can tackle extremely challenging problems that may seem impossible to solve otherwise! Today, we’re going to delve into this exciting realm!

The Problem:

Imagine you’re going on a trip and you’re only allowed to bring 3kg of hand luggage with you. You want to make the most out of it, so you carefully weigh each item and assign a value to it. You have the option to choose between a laptop (value 500, 2.2kg), headphones (value 150, 160g), a coffee mug (value 60, 350g), a water bottle (value 30, 192g), and a notepad (value 40, 333g). Your goal is to find the combination of items that maximizes the total value while staying within the weight limit of 3kg.

If you’d like, you can take a moment to pause and figure out the solution.

After careful consideration, the optimal solution is to bring the laptop, headphones, coffee mug, and water bottle, with a total value of 740 and a combined weight of 2.9kg. This way, you can make the most of your 3kg hand luggage allowance and bring the items that will provide the highest overall value for your trip.

Indeed, it wasn’t too difficult to find the optimal solution with just a small number of items. It’s fascinating how our brains are naturally adept at quickly identifying the best possible solution when presented with limited options. Our ability to optimize and make efficient choices is truly remarkable!

Even for a computer, finding the optimal solution is relatively straightforward, even with a naive implementation that tries out every possible combination. In this case, with just 5 items, there are only 32 different combinations to consider. My MacBook completes this task in about 3 microseconds (0.000003 seconds)!
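For reference, here is a minimal brute-force sketch of that naive approach (my own illustration, not code from the article; it assumes the same item values and weights, in grams, that appear in the code later on):

from itertools import product

# each item is (name, value, weight in grams); the weight limit is 3kg = 3000g
items = [
    ("Laptop", 500, 2200),
    ("Headphones", 150, 160),
    ("Coffee Mug", 60, 350),
    ("Notepad", 40, 333),
    ("Water Bottle", 30, 192),
]
weight_limit = 3000

best_value, best_bits = 0, None
# enumerate all 2^n include/exclude decisions
for bits in product([0, 1], repeat=len(items)):
    weight = sum(item[2] for item, bit in zip(items, bits) if bit)
    value = sum(item[1] for item, bit in zip(items, bits) if bit)
    if weight <= weight_limit and value > best_value:
        best_value, best_bits = value, bits

print(best_value, best_bits)  # 740 and (1, 1, 1, 0, 1) for this item list

Every additional item doubles the number of iterations of this loop, which is exactly the blow-up described next.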

So, let’s raise the stakes and add more items to the mix, shall we?

If you’d like, you can take a moment to pause and try to figure out the optimal solution after adding five more items — in the code example later on, these are mints, a sock, a tissue, a phone, and a baseball cap.

However, I must admit, it may be quite challenging to compute the optimal solution in your head this time. With a total of 1024 possible combinations, it could take a considerable amount of time and patience.

Interestingly, even for my MacBook, it took significantly longer this time, needing 0.000807 seconds, which is a whopping 269 times longer than before, due to the increased number of combinations to consider (1024 in total).

As the number of items increases, the time it takes to find the optimal solution using a naive approach grows exponentially. For example, with 20 items, it took 1 second, and the number of possible combinations grew to 1,048,576. With 21 items, it took 1.94 seconds, with 22 items 3.91 seconds, and so on — the runtime roughly doubles with every added item.

This clearly illustrates how the complexity of the problem grows exponentially with the number of items, making it impractical to find the optimal solution using brute force or exhaustive search methods for large sets of items.

At a certain point, the number of items becomes so large that it would not only cause you to miss your flight, but it would also take an astronomical amount of time for your MacBook to find an optimal solution. To put it into perspective, just 77 items would keep your MacBook occupied for 5 billion years before it comes up with the optimal solution!
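That 5-billion-year figure is easy to sanity-check. Here is a quick back-of-envelope sketch (my own, assuming the machine keeps evaluating roughly a million combinations per second, as the 1-second timing for 2^20 combinations above suggests):

combinations = 2 ** 77            # every subset of 77 items
combos_per_second = 2 ** 20       # ~1 million per second, from the 20-item timing
seconds = combinations / combos_per_second
years = seconds / (60 * 60 * 24 * 365)
print(f"{years:.2e} years")       # ~4.6e9 years, i.e. on the order of the 5 billion quoted above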

You may be wondering if there is a more intelligent and efficient way to calculate this, and I assure you that there is.

The Knapsack Problem:

The challenge we’ve come across is commonly referred to as the Knapsack problem, a well-known conundrum that has been studied for over a century. Its decision version — can some combination of items reach a given value without exceeding the weight limit? — is NP-complete. Whether an algorithm exists that solves it efficiently (in polynomial time) for every input is still unknown; answering that would settle the famous P versus NP question, one of the most significant open problems in computer science.

What are Genetic Algorithms?

But you may wonder, how does this relate to evolving something on my computer? Is this just more clickbait? Fear not: today we will delve into the realm of genetic algorithms, a class of algorithms that can provide solutions for problems where exact computation is not feasible.

Genetic algorithms are part of a larger family of algorithms known as Evolutionary Algorithms, which draw inspiration from the process of natural selection in living organisms.

The core concept behind a genetic algorithm is to create a population of potential solutions to a problem and use genetic operators such as selection, crossover, and mutation to evolve the population over time towards an optimal solution.

This approach proves to be particularly useful in situations where even the most powerful computing systems would take an impractical amount of time (possibly years) to solve a problem. Genetic algorithms can efficiently generate usable near-optimal solutions in a relatively short timeframe.

For example, you can use genetic algorithms to generate a packing list for your backpack, or even to design an antenna — as NASA demonstrated in 2006, when it used a genetic algorithm to evolve an antenna shape that met its mission’s radiation-pattern requirements.

Evolved Antenna: https://en.wikipedia.org/wiki/Evolved_antenna

After all, who am I to argue with evolution? Genetic algorithms offer a fascinating and practical approach to problem-solving in the realm of computer science.

Terms in Genetic Algorithms:

To effectively grasp the concept, it’s essential to familiarise oneself with the common terms used in genetic algorithms.

Population: Set of potential solutions.

Chromosome/Genome: One of the potential solutions in the population.

Genotype: The genetic makeup of a chromosome, consisting of its elements.

Phenotype: The expression of the genotype as a specific value or characteristic.

Fitness function: A function that assigns a weight to each chromosome based on its performance.

Fitness value: The value returned by the fitness function, reflecting the quality of a chromosome.

Encoding and Decoding: Encoding converts a solution (phenotype) into the representation the algorithm operates on — such as a binary, real-valued, permutation, or integer string — and decoding is the reverse conversion.

Generation: The number of iterations or cycles in the genetic algorithm process, representing the evolution of the population over time.

How does it work?

Population and Genome

A genetic algorithm involves utilizing a population of potential solutions, which in this case pertains to various combinations of items that can be packed in a backpack.

Genome

In our population, each specimen possesses a genome that serves as the encoding for its solution, represented by binary values. Specifically, this genome encodes the contents of the backpack, where a value of 1 indicates the presence of an item from the list inside the backpack, while a value of 0 denotes its omission.
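To make that concrete, here is a tiny illustration (mine, not part of the article’s later code) of how such a binary genome maps onto the packing list:

item_names = ["Laptop", "Headphones", "Coffee Mug", "Notepad", "Water Bottle"]
genome = [1, 0, 1, 1, 0]  # a hypothetical specimen: Laptop, Coffee Mug and Notepad are packed

packed = [name for name, bit in zip(item_names, genome) if bit == 1]
print(packed)  # ['Laptop', 'Coffee Mug', 'Notepad']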

Generation

The collection of all existing solutions at a particular moment in the algorithm is referred to as a generation. Generation 0, also known as the initial generation, is characterized by a random assortment of potential solutions. Thus, the evolutionary process commences with a state of complete disorder or chaos.

Fitness Function

Now, we initiate the process of natural selection to mimic the principle of survival of the fittest. To assess the quality of a particular solution, we utilize a fitness function. In our scenario, the fitness function sums the value of the packed items as long as their total weight stays within the weight limit. If the weight limit is exceeded, the solution is invalid and its fitness is assigned a value of 0.

Parent Selection

Afterwards, we proceed with selecting parents for the next generation of solutions. Typically, solutions with higher fitness scores are more likely to be chosen for reproduction, while those with very low fitness scores are less likely. Like in real life, buff dudes get the girls :).

Single Point Crossover

The process involves selecting two parent genomes, randomly cutting them in half at a chosen spot, and swapping the ends to create two new offspring genomes. This technique, known as single point crossover, generates two fresh solutions that become part of the next generation.

Next Generation

We continue to repeat the process until we have an adequate number of specimens in the next generation.

Elitism

When you observe the crossing of two solutions, you’ll notice that it often results in improved ones. It’s truly incredible to see how natural selection not only operates in nature but also in computer simulations, and generates better solutions across generations.

However, it’s important to note that due to the randomness involved in the selection and crossover functions, there’s no assurance that our best solutions won’t be inadvertently destroyed.

This is where the concept of elitism comes into play.

Elitism entails selecting the N top solutions and directly copying them into the next generation. In this particular case, we will retain and propagate our top 2 solutions.

Mutations

The next stage of evolution involves the introduction of mutations, which enable the discovery of new solutions that may not have been possible with the original gene pool. In the mutation phase, a random bit of the genome is simply changed with a certain probability. And that’s it! Behold, our new generation of solutions.

This algorithm continues to iterate until a satisfying solution is found or until a predetermined maximum number of generations is reached.

Now that you have a high-level understanding of genetic algorithms, it’s important to note that despite the various ways they can be implemented, they all share a common set of ingredients:

  • A genetic representation of a solution
  • A function to generate new solutions
  • A fitness function to evaluate solutions
  • A selection function to choose solutions for the next generation
  • A crossover function
  • A mutation function

It’s worth mentioning that each of these ingredients can be vastly different from what is commonly used today. For example, some problems may require solutions that cannot be expressed as simple strings of 1s and 0s, but rather as integers or floating-point values. In some cases, genetic algorithms may even utilize graphs as a means to represent solutions. Additionally, a single-point crossover may not always be sufficient, and multiple crossovers at different locations, or with different parents, may be necessary. Interestingly, research has shown that such multi-parent recombination can improve the quality of generated solutions (Paper: Genetic Algorithm with Multi-Parent Recombination). There’s still much to learn, and we have only just begun to scratch the surface of this fascinating field.
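As one concrete example of such a variation, here is a sketch of a two-point crossover (my own illustration, not taken from the paper above): instead of a single cut, two cut points are chosen and the segment between them is swapped between the parents.

from random import randint
from typing import List, Tuple

Genome = List[int]

def two_point_crossover(a: Genome, b: Genome) -> Tuple[Genome, Genome]:
    if len(a) != len(b) or len(a) < 3:
        return a, b
    # choose two cut points and swap the segment between them
    p, q = sorted((randint(1, len(a) - 1), randint(1, len(a) - 1)))
    return a[:p] + b[p:q] + a[q:], b[:p] + a[p:q] + b[q:]

The same idea generalizes to uniform crossover (deciding for every position individually which parent it comes from) and to recombination with more than two parents.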

Genetic Algorithm from Scratch in Python:

Let’s begin implementing a genetic algorithm in Python! We’ll start by putting together a basic framework to experiment with and solve the knapsack problem we discussed earlier. What’s our first step?

# important libraries to import

from typing import List, Optional, Callable, Tuple
from random import choices, randint, randrange, random

from collections import namedtuple
from functools import partial

Genome / Population:

We need to generate a genome to represent a single solution. In our case, a genome is essentially a list of ones and zeros: a one denotes an item being included in the backpack, and a zero indicates that the item at that index is left out. I will define type aliases for better readability, even though Python is dynamically typed and treats them only as hints. To generate a genome, we create a random list of ones and zeros of length k, which should equal the number of items we want to choose from.

What comes next is the creation of a function to generate new solutions. A population is essentially a collection of genomes represented as a list. To create this population, we can call our genome function multiple times until the population reaches the desired size.

Genome = List[int]
Population = List[Genome]

# generate representation of a solution
def generate_genome(length: int) -> Genome:
    return choices([0, 1], k=length)

# function to generate new solutions
def generate_population(size: int, genome_length: int) -> Population:
    return [generate_genome(genome_length) for _ in range(size)]

Fitness Function:

A fitness function is required to evaluate a solution. The function takes three input parameters: a genome, a list of items to choose from, and a weight limit, and returns a fitness value.

To calculate the fitness, we iterate through all the items and check whether each one is part of the solution by verifying that the genome has a one at the given index. If it does, we add the weight and value of the item to our accumulator variables. As soon as the weight exceeds the given limit, we abort the iteration and return a fitness value of zero, since the solution is invalid. If we manage to iterate through the entire list of items without exceeding the weight limit, we return the accumulated value. A sensible sanity check is to require that the given genome has the same length as our list of things.

FitnessFunc = Callable[[Genome], int]

# fitness function
def fitness(genome: Genome, things: List[Thing], weight_limit: int) -> int:
    if len(genome) != len(things):
        raise ValueError("genome and things must be of the same length")

    weight = 0
    value = 0

    for i, thing in enumerate(things):
        if genome[i] == 1:
            weight += thing.weight
            value += thing.value

            if weight > weight_limit:
                return 0

    return value

Data for Example:

You may be wondering, what exactly is Thing? That’s a good question. Thing is a structure that consists of a name, value, and weight. With this structure, we can put together different lists of things.

Thing = namedtuple("Thing", ["name", "value", "weight"])

# item data: name, value, weight in grams
things = [
    Thing("Laptop", 500, 2200),
    Thing("Headphones", 150, 160),
    Thing("Coffee Mug", 60, 350),
    Thing("Notepad", 40, 333),
    Thing("Water Bottle", 30, 192)
]

more_things = [
    Thing("Mints", 5, 25),
    Thing("Sock", 5, 25),
    Thing("Tissue", 5, 25),
    Thing("Phone", 5, 25),
    Thing("Baseball Cap", 5, 25)
] + things

Selection Function:

To generate the next generation, we need to select the solutions that will act as parents. This function chooses a pair of solutions to become the parents of two new solutions, and solutions with higher fitness should be more likely to be chosen. We can use the choices() function from Python’s random module, which lets us assign a weight to each element it chooses from. By using the fitness of a genome as its weight, the fittest solutions have a higher chance of being selected for reproduction. The parameter k=2 indicates that we draw twice from our population to obtain a pair. However, what does the fitness_func parameter refer to? Why not use our fitness function directly?

This is related to software architecture and the separation of concerns. Our fitness function requires a list of items and a weight limit, which are specific to solving the knapsack problem. Therefore, if we want to use this implementation for a different problem, we need to implement a different selection function. The selection function only requires a fitness function that takes a genome and returns a fitness value to make the correct choices.

# selection function
def selection_pairs(population: Population, fitness_func: FitnessFunc) -> Population:
    return choices(
        population=population,
        weights=[fitness_func(genome) for genome in population],
        k=2
    )

Crossover Function:

Let’s continue. To obtain new solutions for the next generation, we require a crossover function. The single-point crossover function takes two genomes as input parameters and returns two genomes as output. We randomly select an index to cut the genomes in half. The first half of the first genome, a, is combined with the second half of the second genome, b, to create the first new solution. For the second solution, we take the first half of genome b and the second half of genome a. For this process to be effective, the genomes must be of the same length. Furthermore, a crossover function only makes sense if the length of a genome is at least two. Otherwise, there would be no point in splitting the genome in half.

# single point crossover
def single_point_crossover(a: Genome, b: Genome) -> Tuple[Genome, Genome]:
    if len(a) != len(b):
        raise ValueError("Genomes a and b must be of the same length")

    length = len(a)
    if length < 2:
        return a, b

    p = randint(1, length - 1)
    return a[0:p] + b[p:], b[0:p] + a[p:]

Mutation Function:

The final piece of the puzzle is mutation. The mutation function takes a genome and, with a certain probability, flips ones to zeros and zeros to ones at random positions. To achieve this, we choose a random index. If the random value is greater than the mutation probability, we leave the bit unchanged. Otherwise, we flip it by taking the absolute value of the current value minus one. If the genome has a one at the index, we calculate abs(1 - 1), which is zero. If it has a zero, zero minus one is -1, and the absolute value of -1 is 1. Thus, we have flipped a bit in our genome in the most convoluted way possible. While this may make us appear more intelligent, it is not advisable to use such obfuscated code in a professional setting. Nonetheless, I hope you’re enjoying this article! :)

def mutation(genome: Genome, num: int = 1, probability: float = 0.5) -> Genome:
    for _ in range(num):
        index = randrange(len(genome))
        genome[index] = genome[index] if random() > probability else abs(genome[index] - 1)
    return genome
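If you prefer readability over the abs() trick, an equivalent version (my own suggestion, not part of the article’s code) flips the bit explicitly:

from random import random, randrange
from typing import List

Genome = List[int]

def mutation_readable(genome: Genome, num: int = 1, probability: float = 0.5) -> Genome:
    for _ in range(num):
        index = randrange(len(genome))
        if random() <= probability:
            genome[index] = 1 - genome[index]  # flip the bit: 1 -> 0, 0 -> 1
    return genome

Both versions mutate the genome in place and also return it, so they can be dropped into the pipeline below interchangeably.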

Evolutionary Main Loop:

To bring all of these pieces together, we need to implement the actual evolution process.

PopulateFunc = Callable[[], Population]
SelectionFunc = Callable[[Population, FitnessFunc], Tuple[Genome, Genome]]
CrossoverFunc = Callable[[Genome, Genome], Tuple[Genome, Genome]]
MutationFunc = Callable[[Genome], Genome]

To maintain separation between the problem and the algorithm, we will use many functions as parameters:

  • A PopulateFunc takes no arguments and generates new solutions.
  • A SelectionFunc takes a population and a fitness function (FitnessFunc) to select two parents for our next-generation solution. We could generalize this further, but I’ll leave that as homework.
  • A CrossoverFunc takes two genomes and returns two new genomes.
  • The MutationFunc takes one genome and sometimes returns a modified one.

Now, let’s put these functions into action and define the function that runs the evolution process.

def run_evolution(
    populate_func: PopulateFunc,
    fitness_func: FitnessFunc,
    fitness_limit: int,
    selection_func: SelectionFunc = selection_pairs,
    crossover_func: CrossoverFunc = single_point_crossover,
    mutation_func: MutationFunc = mutation,
    generation_limit: int = 100
) -> Tuple[Population, int]:
    population = populate_func()

    for i in range(generation_limit):
        # sort so the fittest solutions come first
        population = sorted(
            population,
            key=lambda genome: fitness_func(genome),
            reverse=True
        )

        if fitness_func(population[0]) >= fitness_limit:
            break

        # elitism: carry the top two solutions over unchanged
        next_generation = population[:2]

        for j in range(int(len(population) / 2) - 1):
            parents = selection_func(population, fitness_func)
            offsprings_a, offsprings_b = crossover_func(parents[0], parents[1])

            offsprings_a = mutation_func(offsprings_a)
            offsprings_b = mutation_func(offsprings_b)

            next_generation += [offsprings_a, offsprings_b]

        population = next_generation

    # final sort in case the generation limit was reached
    population = sorted(
        population,
        key=lambda genome: fitness_func(genome),
        reverse=True
    )

    return population, i

The function to run the evolution process requires several parameters. We start by providing the population function (populate_func) and the fitness function (fitness_func). We also define the fitness limit (fitness_limit) as an end condition. If the best solution’s fitness exceeds this limit, the evolution process terminates as we have achieved our goal. We can use the default implementations for the selection (selection_func), crossover (crossover_func), and mutation (mutation_func) functions, but we can also initialize them with our own functions if needed. The generation limit (generation_limit) defines the maximum number of generations for the evolution process to run if it does not reach the fitness limit before that.

First, we generate the initial generation by calling the populate function. Then, we loop for the number of generations defined in generation_limit. We start by sorting our population by fitness so that the top solutions occupy the first indices of our population list. This makes it easy to check whether we have already reached the fitness limit and to return early from the loop, and it lets us implement elitism by keeping our top two solutions for the next generation.

Now, we generate new solutions for the next generation by selecting two parents and deriving two new child solutions from them. We do this by calling the selection function and passing the selected parents to the crossover function. To increase the variety of solutions, we also apply the mutation function to each offspring. We loop for half the length of a generation to end up with as many solutions in the next generation as before; since we have already copied the top two solutions from the previous generation, we can skip one iteration here.

After generating the new generation, we replace the current population with it and start the next round of the algorithm by sorting the population and checking whether we have reached the fitness limit this time. Once we reach the fitness limit or have looped for the generation limit, we return the current population, which we sort one last time in case we ran out of generations. We also return the number of generations we ran; this second return value makes it easy to tell whether the algorithm exhausted the generation limit or found a solution that meets our fitness criterion.

population, generations = run_evolution(
    populate_func=partial(
        generate_population, size=10, genome_length=len(things)
    ),
    fitness_func=partial(
        fitness, things=things, weight_limit=3000
    ),
    fitness_limit=740,
    generation_limit=100
)

def genome_to_things(genome: Genome, things: List[Thing]) -> List[str]:
    result = []

    for i, thing in enumerate(things):
        if genome[i] == 1:
            result += [thing.name]

    return result

print(f"number of generations: {generations}")

print(f"best solution: {genome_to_things(population[0], things)}")

print(f"best solution value: {fitness(population[0], things, 3000)}")

Let’s put everything together. We call run_evolution and store the final population and the number of generations in two variables. Next, we want to hand over our populate and fitness functions, but as explained earlier, the signatures of these two functions differ from what run_evolution expects. Therefore, we use partial functions to preset the parameters that are specific to our current problem. This way, we can configure our populate function without passing the population size and genome length to run_evolution, and without writing a completely new populate function. Likewise, we hand the list of things to our fitness function and predefine the weight limit to be 3000 grams (3kg). We use the first example from earlier and know that the correct solution is the laptop, headphones, coffee mug, and water bottle, for a value of 740.

We can observe the number of generations required to find a suitable solution and print the solutions using a helper function I have created to display information based on the genome. Let’s execute the algorithm and note that it successfully discovered the appropriate solution after only five generations.
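The code above never actually runs the larger more_things list, so here is a small follow-up sketch (my own, not from the article) showing that the same machinery handles it unchanged; the fitness_limit of 755 is a target I worked out by hand for this data (the 740 solution plus three of the small 5-value items still fits under 3000g), so treat it as an assumption:

population, generations = run_evolution(
    populate_func=partial(
        generate_population, size=10, genome_length=len(more_things)
    ),
    fitness_func=partial(
        fitness, things=more_things, weight_limit=3000
    ),
    fitness_limit=755,  # assumed optimum for this data; see note above
    generation_limit=100
)

print(f"number of generations: {generations}")
print(f"best solution: {genome_to_things(population[0], more_things)}")

If the target is never reached, run_evolution simply returns the best population it found after the generation limit, so the call still produces a usable near-optimal packing list.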

Complete Code:

from typing import List, Optional, Callable, Tuple
from random import choices, randint, randrange, random

from collections import namedtuple
from functools import partial

Genome = List[int]
Population = List[Genome]

FitnessFunc = Callable[[Genome], int]
PopulateFunc = Callable[[], Population]
SelectionFunc = Callable[[Population, FitnessFunc], Tuple[Genome, Genome]]
CrossoverFunc = Callable[[Genome, Genome], Tuple[Genome, Genome]]
MutationFunc = Callable[[Genome], Genome]
PrinterFunc = Callable[[Population, int, FitnessFunc], None]

Thing = namedtuple("Thing", ["name", "value", "weight"])

# item data: name, value, weight in grams
things = [
    Thing("Laptop", 500, 2200),
    Thing("Headphones", 150, 160),
    Thing("Coffee Mug", 60, 350),
    Thing("Notepad", 40, 333),
    Thing("Water Bottle", 30, 192)
]

more_things = [
    Thing("Mints", 5, 25),
    Thing("Sock", 5, 25),
    Thing("Tissue", 5, 25),
    Thing("Phone", 5, 25),
    Thing("Baseball Cap", 5, 25)
] + things

# generate representation of a solution
def generate_genome(length: int) -> Genome:
    return choices([0, 1], k=length)

# function to generate new solutions
def generate_population(size: int, genome_length: int) -> Population:
    return [generate_genome(genome_length) for _ in range(size)]

# fitness function
def fitness(genome: Genome, things: List[Thing], weight_limit: int) -> int:
    if len(genome) != len(things):
        raise ValueError("genome and things must be of the same length")

    weight = 0
    value = 0

    for i, thing in enumerate(things):
        if genome[i] == 1:
            weight += thing.weight
            value += thing.value

            if weight > weight_limit:
                return 0

    return value

# selection function
def selection_pairs(population: Population, fitness_func: FitnessFunc) -> Population:
    return choices(
        population=population,
        weights=[fitness_func(genome) for genome in population],
        k=2
    )

# single point crossover
def single_point_crossover(a: Genome, b: Genome) -> Tuple[Genome, Genome]:
    if len(a) != len(b):
        raise ValueError("Genomes a and b must be of the same length")

    length = len(a)
    if length < 2:
        return a, b

    p = randint(1, length - 1)
    return a[0:p] + b[p:], b[0:p] + a[p:]

# mutation
def mutation(genome: Genome, num: int = 1, probability: float = 0.5) -> Genome:
    for _ in range(num):
        index = randrange(len(genome))
        genome[index] = genome[index] if random() > probability else abs(genome[index] - 1)
    return genome

def run_evolution(
    populate_func: PopulateFunc,
    fitness_func: FitnessFunc,
    fitness_limit: int,
    selection_func: SelectionFunc = selection_pairs,
    crossover_func: CrossoverFunc = single_point_crossover,
    mutation_func: MutationFunc = mutation,
    generation_limit: int = 100
) -> Tuple[Population, int]:
    population = populate_func()

    for i in range(generation_limit):
        # sort so the fittest solutions come first
        population = sorted(
            population,
            key=lambda genome: fitness_func(genome),
            reverse=True
        )

        if fitness_func(population[0]) >= fitness_limit:
            break

        # elitism: carry the top two solutions over unchanged
        next_generation = population[:2]

        for j in range(int(len(population) / 2) - 1):
            parents = selection_func(population, fitness_func)
            offsprings_a, offsprings_b = crossover_func(parents[0], parents[1])

            offsprings_a = mutation_func(offsprings_a)
            offsprings_b = mutation_func(offsprings_b)

            next_generation += [offsprings_a, offsprings_b]

        population = next_generation

    # final sort in case the generation limit was reached
    population = sorted(
        population,
        key=lambda genome: fitness_func(genome),
        reverse=True
    )

    return population, i

def genome_to_things(genome: Genome, things: List[Thing]) -> List[str]:
    result = []

    for i, thing in enumerate(things):
        if genome[i] == 1:
            result += [thing.name]

    return result

population, generations = run_evolution(
    populate_func=partial(
        generate_population, size=10, genome_length=len(things)
    ),
    fitness_func=partial(
        fitness, things=things, weight_limit=3000
    ),
    fitness_limit=740,
    generation_limit=100
)

print(f"number of generations: {generations}")
print(f"best solution: {genome_to_things(population[0], things)}")
print(f"best solution value: {fitness(population[0], things, 3000)}")

Conclusion:

In conclusion, the Knapsack problem highlights the limitations of traditional computation methods when dealing with complex optimization problems. Brute force methods quickly become impractical as the number of items increases, and it becomes impossible to find the optimal solution in a reasonable amount of time. This is where genetic algorithms, a type of evolutionary algorithm, come into play. Genetic algorithms use selection, crossover, and mutation to evolve a population of potential solutions towards an optimal solution over time. This approach proves to be particularly useful when traditional computation methods are not feasible, as genetic algorithms can generate usable near-optimal solutions in a relatively short timeframe. Overall, genetic algorithms offer a fascinating and practical approach to problem-solving in the realm of computer science.
