Genetic Algorithms: An Essential Introduction — From Understanding Nature to Coding

--

A 100-Queen solution — Image from Author

“Nature is the best teacher, for she reveals her secrets to those who are curious and eager to learn.”

Quote by Francis Bacon

Terms

Survival and reproduction are fundamental aspects of life that are essential for the continuation of species, including plants, animals, and humans. While the methods of reproduction may differ, they all follow the same mechanisms and strategies to ensure survival.

In computer science, one of the most influential models inspired by nature is the Genetic Algorithm (GA). GA is a fundamental algorithm used for solving optimization problems by mimicking the main process of survival observed in nature.

To better understand GA, there are four key terms and topics that are essential to grasp: population, fitness function, selection, and mutation. By understanding these concepts, we can gain a clearer understanding of how GA works and how it can be used to solve complex problems.

In this article, we will delve into these concepts and explore how GA has revolutionized problem-solving in various industries.

Gene

A gene is a unit of heredity that is responsible for transmitting genetic information from one generation to the next. It is a specific sequence of DNA that codes for a particular trait or characteristic. In the field of genetic algorithms, genes serve as a representation of potential solutions to a problem. These genes can be manipulated by genetic operators, such as mutation and crossover, to generate new candidate solutions in pursuit of finding an optimal solution.

Chromosome

A chromosome is a structure made of genes that carry information It is the main components in DNA and its patterns determines the traits, attributes and behaviors of existences.

Mutation

Mutation is a fundamental operation in genetic algorithms, which involves the random alteration of a small number of genes in a chromosome. This process occurs by exchanging genetic information within the chromosome. Adjusting and tuning the mutation operator is crucial for optimizing the performance of genetic algorithms and other evolutionary computation methods.

Crossover

Crossover is another fundamental genetic operation in which genetic information is exchanged between two or more chromosomes. It is a process similar to the exchange of genetic material between parents during sexual reproduction. Unlike mutation, which alters genetic information within a single chromosome, crossover combines genetic information from different chromosomes to create new candidate solutions.

N-Queen Problem: A Case Study Problem for GA

One of the classic challenges in chess is the 8-Queen, or the more general N-Queen problem. The objective of the problem is to place N queens on an NxN chessboard, such that no two queens are attacking each other. In other words, no two queens can share the same row, column, or diagonal. This problem is named after the queen chess piece, which can move any number of squares horizontally, vertically, or diagonally. The challenge of the N-Queen problem lies in finding a valid arrangement of queens on the board, and in developing efficient algorithms to solve the problem for larger values of N.

The following image depicts a valid solution to the 8-Queen problem, where none of the queens can attack each other. In the context of solving this problem using genetic algorithms, a “fitness function” is defined to evaluate the quality of each candidate solution, such as the number of non-attacking queen pairs or the total number of attacking pairs. The goal is to find the solution with the highest fitness value that satisfies the problem constraints.

Fitness

The 8-Queen problem is a classic example of a soft problem in chess where the challenge is to place eight queens on a chessboard so that no two queens threaten each other. A fitness function is used to measure the fitness of a chromosome, which determines how close it is to a solution. The fitness score and fitness functions are important concepts in GA and will be explained in more detail in future parts. To solve any GA problem, the following steps are typically followed:

  1. Encode the problem context to a genome/chromosome encoding
  2. Define a population of chromosomes with random genes
  3. Apply GA operations on the chromosomes and evaluate their fitness function
  4. Sort the chromosomes based on their fitness and keep the stronger ones while ignoring the weaker ones, and
  5. Repeat steps 3 and 4 until chromosomes that are close to a solution are obtained.

Encoding

Encoding is the first and crucial step in defining a chromosome and GA coding. For instance, in the N-Queen problem, we can choose a 4-Queen case and represent the chromosome through a number encoding. In the following image, we can see that each queen has a number, representing its position on the board. For example, the chromosome [4,3,2,1] represents the situation where the first queen is in the fourth row, the second queen is in the third row, the third queen is in the second row, and the last queen is in the first row. We can define this encoding as a list, array, tuple, or any other data structure in our code.

The following shows four additional chromosomes, each representing a different 4-Queen situation. While 100 chromosomes may be sufficient for a simple case like the 4-Queen problem, more complex problems with longer chromosomes may require larger populations of 10,000 or even 1,000,000 chromosomes.

For the case of 8-Queen, we need to have chromosomes of length 8; an example of a chromosome have been shown in the following image:

Mutation

For this case, only the mutation operation has been applied, and no crossover was used. The following shows a chromosome before and after a mutation.

In this particular chromosome, we changed two genes (4 and 8) during the mutation operation. Depending on the case, we can perform more gene exchanges to increase the randomness of the operation.

Fitness Score

As mentioned earlier, fitness is a measure of how close a chromosome is to a solution or how far it is from one. However, it is a heuristic measure and can vary depending on how one wants to measure it. For the 4-Queen problem, we can define the fitness as the number of times the queens are attacking each other on the chessboard. In this particular chromosome, the queen in the first column is attacking three queens, the queen in the second column is attacking three other queens, and so on. Therefore, the fitness score for this chromosome would be 12.

Fitness for this chromosome is 3 + 3 + 3 +3 = 12

Following image shows 4 chromosomes with their fitness scores:

The best fitness score belongs to image 4 where it is 0, indicating that it is a valid solution. Any chromosome having a fitness score close to 0 or 0 should be considered a potential solution. Conversely, higher fitness scores such as those in image 1 should be discarded. Fitness scores allow us to separate stronger chromosomes from weaker ones, as shown in the following example:

Results

At the end, I have tested the code on various lengths of chromosomes:

1- A chromosome with 8 genes:

An 8-Queen solution

2- A chromosome with 25 genes:

A 25-Queen solution

3- A chromosome with 100 genes:

A 100-Queen solution

Code Investigation

In the previous introduction, I provided a detailed explanation of the fundamental steps involved in training a Genetic Algorithm (GA). I discussed important concepts such as mutation, genes, chromosomes, and genetic population, and presented a case study on solving the N-Queen problem using a GA.

Following the publication of the article, I proceeded to create a repository and converted my previously written Matlab code into Python code. In this article, my focus is on explaining the different components of the repository and how the main file is structured to set up the GA scenario and find the optimal solution. You can find the repo here.

The main file (n_queen_solver.py) serves as the entry point for setting up the GA model and initiating the training process. It prompts the user to provide essential parameters that are crucial for configuring the GA model. These parameters include:

  1. Chromosome size or Chessboard Size: The size of the chessboard, which represents the number of queens and the dimensions of the board.
  2. Population Size: The number of chromosomes in the population, representing the number of candidate solutions.
  3. Epochs: The number of iterations or generations for which the GA model will be trained.
parser = argparse.ArgumentParser(description='Computation of the GA model for finding the n-queen problem.')
parser.add_argument('chromosome_size', type=int, help='The size of a chromosome')
parser.add_argument('population_size', type=int, help='The size of the population of the chromosomes')
parser.add_argument('epoches', type=int, help='The nmber of iterations to traing the GA model')
args = parser.parse_args()

After obtaining the parameters, the next block of code is responsible for initializing the population. The ‘init_population()’ method generates the population based on the specified number of individuals, using the encoding explained in the previous article. It returns the initialized population back to the main method of the file.

The fitness function and the calculation of the fitness score play a crucial role in selecting the best parents and guiding their reproduction to ensure the program progresses along the optimal path. For the simplicity and clarity of this implementation, I have chosen a straightforward fitness method. The following code block demonstrates the ‘fitness()’ method:

def fitness(chrom,chromosome_size):
q = 0
for i1 in range(chromosome_size):
tmp = i1 - chrom[i1]
for i2 in range(i1+1,chromosome_size):
q = q + (tmp == (i2 - chrom[i2]))
for i1 in range(chromosome_size):
tmp = i1 + chrom[i1]
for i2 in range(i1+1,chromosome_size):
q = q + (tmp == (i2 + chrom[i2]))
return 1/(q+0.001)

The ‘fitness()’ In this block received an individual chromosome and its size as input parameters and returns back its fitness score. In the given code block, the fitness function checks whether two queens in the chromosome are crossing each other or not. If two queens are found to be crossing, the variable ‘q’ is incremented by one. The purpose of this check is to evaluate the chromosome’s fitness based on the number of queen collisions.

The line ‘1 / (q+0.001)’ represents the fitness score based on ‘q’ . By using the reciprocal of the value ‘q + 0.001’, the fitness score will be higher for chromosomes with fewer queen collisions (i.e., a lower value of ‘q’). The addition of ‘0.001’ is to avoid division by zero.

The fitness score is used to assess the quality of each chromosome in the population. The higher the fitness score, the better the chromosome’s performance. In the case of reaching a fitness score of 1000, it signifies that the solution has been found, and the program can terminate without further operations.

def train_population(population,epoches,chromosome_size):
num_best_parents = 2
ft = []
success_booelan = False
population_size = len(population)
for i1 in tqdm(range(epoches)): # 1 should be epoches later
fitness_score = []
for i2 in range(population_size):
fitness_score.append(fitness(population[i2],chromosome_size)) #fitness score initialisation
ft.append(sum(fitness_score)/population_size)
pop = np.concatenate((population, np.expand_dims(fitness_score, axis=1)), axis=1)
sorted_indices = np.argsort(pop[:, -1])
pop_sorted = pop[sorted_indices]
pop = pop_sorted[:, :-1]
best_parents_muted = []
best_parents = pop[-num_best_parents:]
best_parents_muted = [mutation(best_parents[i], chromosome_size) for i in range(num_best_parents)]
pop[0:num_best_parents] = best_parents_muted
population = pop
if ft[-1] == 1000: # this should be calculated accurately. In each case the model might pass the potimum solution, so whenever it is touching the solution's score we should stop the training
print('Woowww, the model could find the solution!!')
print('Here is an example of a solution : ',population[-1])
success_booelan = True
break
return population, ft, success_booelan

The genetic algorithm (GA) employs a selection process to choose parents with higher fitness scores for reproduction through mutation or crossover. The resulting offspring are added to the population, while chromosomes with lower fitness scores are excluded from the next training round. The line ‘if ft[-1] == 1000’ checks if the latest fitness score indicates convergence to a solution. If the condition is true, the loop breaks, and the method terminates, making way for the execution of subsequent blocks.”

HINT: After reaching the solution or finding the global optimum of the solution space, it is possible for the program to continue executing operations. To ensure that the program terminates after finding the best fitness score, a break statement is used. The break statement exits the loop and ensures that no further operations are performed.

In the figure above, we observe the step-wise behavior of the learning curve. The program remains at a fitness score of 0 for the first 28 epochs and then suddenly jumps to a fitness score of 100. During a typical run, the solution is reached after 70 epochs, but there is a brief period where the program gets stuck at a fitness score of 600. You can run the program to generate additional learning curves or check the “repo/images/learning_curve” directory for existing curves. After training the model, two additional methods, namely “fitness_curve_plot” and “n_queen_plot”, are called to display the learning curve and visualize the positions of the queens on the chessboard.

Conclusion and Questions

This article presented various blocks of Python code for solving the N-Queen problem. The code can be modified and enhanced in different ways to improve efficiency or add additional capabilities for finding solutions. If you have any questions or suggestions about the code repository, feel free to share them in the comments.

Additionally, I invite you to consider the following questions and provide your insights:

  1. Can you propose another problem that could be solved using a genetic algorithm?
  2. Please share your thoughts on the encoding process, which is a crucial aspect of genetic algorithms.

In the next article, I plan to explore a more challenging case that goes beyond the N-Queen problem and may require more iterations to find a solution. Stay tuned for more exciting developments.

Conclusion

This article provides a brief introduction to Genetic Algorithms (GA), which involves encoding, mutation, and fitness sorting as its fundamental stages. In the next part, we will delve deeper into the code and provide further explanations. Your thoughts and ideas on this article are most welcome, and I look forward to sharing the next parts with you.

--

--