Genetic Algorithms for an Intractable Optimization Problem in Python

Anthony Schrapffer
11 min read · Jul 22, 2024

--

The traveling salesman problem is a famously intractable optimization puzzle. It challenges us to find the shortest possible route through multiple cities, but its complexity grows so rapidly with each added city that finding the perfect solution becomes practically impossible.

Since perfect solutions are out of reach, nature-inspired algorithms like the genetic algorithm provide a smart alternative, using evolution-like techniques to discover near-optimal routes.

This article presents both the Traveling Salesman Problem (TSP) and a solution using a genetic algorithm. It is accompanied by Python code, available in the following repository: https://github.com/VSCHY/GenAlgo_Salesman

1. Algorithm Time Complexity

Computational complexity is a fundamental concept in computer science that examines the resources required to solve computational problems. These resources can include time (how long an algorithm takes to run), space (how much memory it uses), and other factors such as parallelism and energy consumption.

Time complexity measures the amount of time an algorithm takes to complete as a function of the size of its input. It is often expressed using Big O notation, which provides an upper bound on the growth rate of the algorithm’s running time. For example, an algorithm with a time complexity of O(n) is said to have linear time complexity, meaning its running time increases linearly with the size of the input. Other common complexities include O(log n) for logarithmic time, O(n²) for quadratic time, and O(2^n) for exponential time.

Figure 1: Computational time complexity generated using ChatGPT 4o.
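To make these growth rates concrete, here is a small sketch (input sizes chosen purely for illustration) printing the step counts implied by each class:

```python
import math

# Purely illustrative step counts for each complexity class.
# Big O ignores constants; only the growth trend matters.
for n in (10, 20, 30):
    print(f"n={n:>2}  log2(n)={math.log2(n):5.2f}  n={n:>3}  "
          f"n^2={n**2:>4}  2^n={2**n:>13,}")
```

Already at n = 30, the exponential column dwarfs the others, which is why exponential-time algorithms stop being usable so quickly.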

In addition to categorizing individual algorithms, computational complexity also helps us classify entire problems. Problems are grouped into complexity classes based on the resources needed to solve them. For instance, the class P consists of problems that can be solved in polynomial time, while the class NP includes problems for which a solution can be verified in polynomial time. One of the most famous open questions in computer science, the P vs. NP problem, asks whether every problem whose solution can be verified quickly (in polynomial time) can also be solved quickly.

NP-hard problems represent some of the most challenging issues in computational complexity. An NP-hard problem is at least as difficult as the hardest problems in NP (nondeterministic polynomial time), but it is not necessarily a member of NP itself. Finding solutions to such problems is believed to require super-polynomial time, making them exceptionally difficult to solve efficiently. NP-hard problems encompass a wide range of computational tasks, including optimization problems like the Traveling Salesman Problem.

2. Traveling Salesman Problem

The Traveling Salesman Problem (TSP) is a classic and notoriously challenging optimization problem. Despite extensive research, no polynomial-time algorithm for it is known, and it is NP-hard.

The Traveling Salesman Problem can be summarized as follows: given a list of cities and the distances between each pair of cities, find the shortest possible route that visits each city exactly once. Despite its simplicity, the Traveling Salesman Problem is computationally intractable for large numbers of cities because the number of possible routes increases factorially with the number of cities.

Figure 2: Traveling Salesman Problem generated using ChatGPT 4o.
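The factorial blow-up is easy to demonstrate: a brute-force search that enumerates every permutation is exact, but only feasible for tiny instances. A minimal sketch (random city coordinates and N = 8, so all 8! = 40,320 routes can still be enumerated):

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(0)
N = 8  # small enough that all N! routes can be enumerated

# Random city coordinates; Euclidean distances give a symmetric matrix.
coords = rng.random((N, 2))
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

def route_length(route):
    # Sum of distances between consecutive cities on the route.
    return sum(d[route[k], route[k + 1]] for k in range(len(route) - 1))

best = min(itertools.permutations(range(N)), key=route_length)
print("optimal route:", best)
print("routes examined:", math.factorial(N))
```

Moving from N = 8 to N = 30 multiplies the number of routes by roughly 10^25, which is exactly why the rest of this article turns to approximate, evolution-inspired search instead.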

The history of the Traveling Salesman Problem (TSP) dates back to the 19th century, with its formalization coming in the 1930s and 1940s. In the early 1930s, Karl Menger, a Viennese mathematician, introduced the problem in lectures on the “combinatorial distance problem,” as part of his broader exploration of geometric and graph-theoretical concepts. Around the same time, notable mathematicians like Hassler Whitney and Merrill Flood began discussing the problem in relation to its practical applications in routing and logistics.

In the 1940s, the problem gained more formal attention through the work of Merrill Flood and other researchers at Princeton University; Flood had encountered it while optimizing school bus routes.

Throughout the latter half of the 20th century, TSP continued to be a central problem in optimization research, driving the development of new algorithms and computational techniques.

3. Genetic Algorithms: A Theoretical Overview

Genetic algorithms (GAs) are inspired by the process of natural selection. They are used to find approximate solutions to optimization and search problems, evolving a population of candidate solutions through iterations called generations.

Figure 3: Genetic Algorithm generated using ChatGPT 4o.

Because the number of possible routes grows factorially, brute-force methods, which evaluate every permutation, quickly become impractical for even moderately sized instances. Genetic Algorithms (GAs), however, offer a smart and elegant way to explore the space of possible solutions. Inspired by the principles of natural selection and evolution, GAs iteratively improve solutions, balancing exploration and exploitation in the search space. This project leverages the power of GAs to provide an educational and practical implementation for solving the TSP, demonstrating how nature-inspired algorithms can effectively tackle complex computational problems.

  • Population: the set of candidate solutions
  • Fitness: a measure of the performance of each solution
  • Selection: the fitness is used to select individuals from the population, either to cross them (two solutions producing child solutions) or to keep them for mutation
  • Crossover: the process of crossing two individuals to obtain child solutions
  • Mutation: the process of applying a random variation to a particular solution

The key steps in a GA include:

  1. Initialization: Start with a randomly generated population of candidate solutions.
  2. Selection: Evaluate the fitness of each candidate and select the best-performing candidates for reproduction.
  3. Crossover: Combine pairs of candidates to produce offspring for the next generation.
  4. Mutation: Introduce random changes to some candidates to maintain genetic diversity.
  5. Termination: Repeat the process until a stopping criterion is met, such as a maximum number of generations or a satisfactory fitness level.
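The five steps above can be sketched end-to-end on a toy problem simpler than the TSP: evolving a bit string toward all ones (the classic "OneMax" problem, chosen here purely for illustration; all parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
POP, GENES, GENERATIONS = 50, 20, 60

def fitness(pop):
    # OneMax: number of ones in each bit string (to be maximized).
    return pop.sum(axis=1)

# 1. Initialization: random population of bit strings.
pop = rng.integers(0, 2, size=(POP, GENES))

for gen in range(GENERATIONS):
    fit = fitness(pop)
    # 5. Termination: stop early once a perfect individual appears.
    if fit.max() == GENES:
        break
    # 2. Selection: keep the better half of the population as parents.
    parents = pop[np.argsort(fit)[::-1][: POP // 2]]
    # 3. Crossover: single-point crossover between random parent pairs.
    children = []
    for _ in range(POP - len(parents)):
        p1, p2 = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, GENES)
        children.append(np.concatenate([p1[:cut], p2[cut:]]))
    pop = np.vstack([parents, children])
    # 4. Mutation: flip one random bit in each child (parents kept intact).
    for row in pop[len(parents):]:
        row[rng.integers(GENES)] ^= 1

print(f"stopped at generation {gen}, best fitness {fitness(pop).max()}/{GENES}")
```

The same loop structure carries over to the TSP; only the encoding (permutations instead of bit strings) and the fitness, crossover, and mutation operators change, as the following sections show.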

IMPLEMENTATION

The Traveling Salesman Problem must be described using a distance matrix containing the distance between every pair of cities. In this case, the distance is the same in both directions, so the matrix is symmetric.

Figure 4: Distance matrix in the Traveling Salesman Problem.
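The repository loads its matrix through a load_data_TSP1 helper (not shown here); an equivalent matrix can be built from random city coordinates, which guarantees the symmetry and zero diagonal described above:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 30  # number of cities, as in the article's experiments

# Random city coordinates; Euclidean distances between all pairs
# produce a symmetric distance matrix with a zero diagonal.
coords = rng.random((N, 2)) * 1000  # arbitrary units
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

assert np.allclose(d, d.T)         # symmetric: d[i, j] == d[j, i]
assert np.allclose(np.diag(d), 0)  # zero distance from a city to itself
```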

We consider that each of the N cities is referred to by a number between 1 and N (inclusive). A solution is represented as a vector of size N containing each number between 1 and N exactly once. This will be the “DNA” of each individual. In these illustrations, N has been kept small; in the practical example, N is equal to 30 (cf. the code implementation).

I1: [2,4,3,1,5,...]

In the example above, the traveler starts in city 2, then goes to city 4, then to city 3, and so on.

The fitness of an individual is the total distance of its route; the objective is to minimize it. With N cities to travel through, noting d the distance matrix and I the vector of a given individual, the fitness is the sum of d(I_k, I_{k+1}) for k from 1 to N−1, as shown in Equation 1.

Equation 1: Fitness calculation.
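Equation 1 translates directly into code (the function name is my own; the repository uses a vectorized get_fitness over the whole population, and 0-based indices replace the article's 1-based city numbers):

```python
import numpy as np

def route_fitness(individual, d):
    """Total distance of the route encoded by `individual` (to be minimized)."""
    total = 0.0
    for k in range(len(individual) - 1):
        total += d[individual[k], individual[k + 1]]
    return total

# Tiny 4-city example with a hand-made symmetric distance matrix.
d = np.array([[0., 1., 2., 3.],
              [1., 0., 4., 5.],
              [2., 4., 0., 6.],
              [3., 5., 6., 0.]])
print(route_fitness(np.array([0, 1, 2, 3]), d))  # 1 + 4 + 6 = 11.0
```

Because the matrix is symmetric, a route and its reverse always have the same fitness.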

CROSSOVER

The crossover between two individuals I1 and I2 in a genetic algorithm is defined by the following steps:

  1. Select two cut indices a and b with a ≤ b (the code draws a ∈ [2, 27] and b ∈ [a, 29] for N = 30 cities).
  2. Copy the segment from I1: copy the elements at positions a to b of I1 directly into the child.
  3. Remove the copied elements from I2: eliminate the elements present in I1[a:b] from I2.
  4. Fill the remaining positions: fill the empty positions of the child, in order, with the remaining elements of I2.

For example, with a = 2 and b = 5:

I1 = [1,2,3,4,5,6,7,8]
I2 = [7,6,4,5,8,1,2,3]

After step 2, the child is:

child = [_,2,3,4,5,_,_,_]

After step 3, the remaining elements of I2 are:

I2 = [7, 6, _, _, 8, 1, _, _]

After step 4, the child is complete:

child = [7,2,3,4,5,6,8,1]
import random
import numpy as np
from numba import jit

@jit(nopython=True)
def crossover(parent1, parent2):
    size = parent1.shape[0]
    # Cut points assume a gene size of 30.
    a = random.randint(2, 27)
    b = random.randint(a, 29)
    child1 = create_child(parent1, parent2, size, a, b)

    # A second, independent pair of cut points produces a different child.
    a = random.randint(2, 27)
    b = random.randint(a, 29)
    child2 = create_child(parent1, parent2, size, a, b)
    return child1, child2

MUTATION

For each mutation operation, randomly select two distinct indices within the array. The elements at the selected indices are swapped. This swap changes the arrangement of elements in the array, creating a new variant of the solution.

If multiple mutations are required, repeat the process of random selection and swapping. Each subsequent mutation builds upon the previous one, further diversifying the solution.

@jit(nopython=True)
def calc_mutation(array_in, n=1):
    array_out = array_in.copy()
    for _ in range(n):
        # Pick two distinct positions (gene size of 30 assumed) and swap them.
        i0, i1 = np.random.choice(30, 2, replace=False)
        array_out[i0], array_out[i1] = array_out[i1], array_out[i0]
    return array_out
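A quick check that swap mutation preserves the permutation property (this uses a plain-NumPy version of the function; the name swap_mutation is mine, and unlike the snippet above it works for any route length and avoids numba):

```python
import numpy as np

def swap_mutation(route, n=1, rng=None):
    # Plain-NumPy sketch of calc_mutation above.
    if rng is None:
        rng = np.random.default_rng()
    out = route.copy()
    for _ in range(n):
        # Swap the elements at two distinct random positions.
        i0, i1 = rng.choice(out.shape[0], 2, replace=False)
        out[i0], out[i1] = out[i1], out[i0]
    return out

route = np.random.permutation(30)
mutated = swap_mutation(route, n=3)
# The mutant is still a valid tour: same cities, possibly different order.
assert sorted(mutated) == sorted(route)
```

This invariant matters: unlike bit-flip mutation on binary strings, TSP mutation must never duplicate or drop a city, which is exactly why a swap is used.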

SELECTION PROCESS

In the present implementation, we select 70% of the population to serve as “parents” in the crossing process (each couple of parents has 2 children), while the remaining 30% is kept for the next generation with mutation applied.

There are different ways to select the parents and the individuals to be mutated:

  • UNIFORM: selection over the population using uniform probability
@jit(nopython=True)
def get_uniform(size_population, size_breeding, size_mutation):
    selected_elements = np.random.choice(size_population, size_breeding, replace=False)
    np.random.shuffle(selected_elements)
    #
    parent1 = selected_elements[:size_breeding // 2]
    parent2 = selected_elements[size_breeding // 2:]

    mutations = np.random.choice(size_population, size_mutation, replace=False)
    return parent1, parent2, mutations
  • LOTTERY: random selection over the population, weighted by the performance of each element. Since fitness is to be minimized, the selection probability is defined as (1/fitness) / sum_population(1/fitness).
def get_lottery(size_population, size_breeding, size_mutation, fitness):
    # Probability inversely proportional to fitness (shorter routes are more likely).
    p = (1 / fitness) / np.sum(1 / fitness)
    selected_elements = np.random.choice(size_population, size_breeding, replace=False, p=p)
    np.random.shuffle(selected_elements)
    #
    parent1 = selected_elements[:size_breeding // 2]
    parent2 = selected_elements[size_breeding // 2:]

    mutations = np.random.choice(size_population, size_mutation, replace=False, p=p)
    return parent1, parent2, mutations
  • ELITIST: select the elements with the best (i.e. lowest) fitness.
@jit(nopython=True)
def get_elitist(size_breeding, size_mutation, fitness):
    selected_elements = np.argsort(fitness)
    selected_elements_crossover = selected_elements[:size_breeding]
    np.random.shuffle(selected_elements_crossover)
    #
    parent1 = selected_elements_crossover[:size_breeding // 2]
    parent2 = selected_elements_crossover[size_breeding // 2:]

    mutations = selected_elements[:size_mutation]
    return parent1, parent2, mutations
  • TOURNAMENT: randomly draw samples of the population and, from each sample, keep a certain percentage of the best elements.
@jit(nopython=True)
def get_tournament(size_population, size_tournament, fitness, size_breeding, top_perc=10):
    selected_elements = np.zeros(size_population, dtype=np.int32)
    n = 0
    while n < size_population:
        subselect = np.random.choice(size_population, size_tournament, replace=False)
        #
        b = np.zeros(subselect.shape[0])
        for i in range(subselect.shape[0]):
            b[i] = fitness[subselect[i]]
        b = np.argsort(b)

        # Keep the best top_perc % of each tournament sample.
        for i in range(int(size_tournament * top_perc / 100)):
            if n >= size_population:
                break
            selected_elements[n] = subselect[b[i]]
            n += 1
    np.random.shuffle(selected_elements)

    parent1 = selected_elements[:size_breeding // 2]
    parent2 = selected_elements[size_breeding // 2:size_breeding]
    mutations = selected_elements[size_breeding:]
    return parent1, parent2, mutations
  • RANDOM: full random regeneration of the population at each step (as during initialization), implemented for comparison.

4. Numerical Implementation

The genetic algorithm is implemented in a single class to facilitate its use:


class geneticAlgo:
    def __init__(self, size_population:int, ratio_breeding:float,
                 size_tournament:int, gene_size:int):
        self.size_population = size_population
        self.ratio_breeding = ratio_breeding
        self.size_tournament = size_tournament
        self.gene_size = gene_size
        self.matrix = load_data_TSP1()
        self.L_options = ["uniform", "elitist", "lottery", "tournament", "random"]

        # Reference sums used by check_pop to verify each row is a permutation.
        self.test_1 = np.array([np.sum([i for i in range(self.gene_size)])])
        self.test_2 = np.array([i*self.size_population for i in range(self.gene_size)])

        self.size_breeding = int(size_population*ratio_breeding)
        if self.size_breeding % 2 != 0:
            self.size_breeding = self.size_breeding-1

        self.size_mutation = int(self.size_population-self.size_breeding)

        self.name_indicators = ["best_fitness", "average_fitness", "diversity", "fitness_variance"]
        self.reset_indicators()

    def reset_indicators(self):
        self.indicators = {}
        for n in self.name_indicators:
            self.indicators[n] = []

    def create_population(self, check:bool = False):
        self.reset_indicators()
        self.population = np.array([np.random.permutation(self.gene_size) for _ in range(self.size_population)], dtype=np.int64)
        self.update_fitness()
        self.calculate_indicators()
        self.n_gen = 1
        if check: self.check_pop()

    def check_pop(self):
        pop_sorted = np.sort(self.population, axis=1)
        t_1 = np.sum(pop_sorted, axis=1)
        t_2 = np.sum(pop_sorted, axis=0)
        if (t_1 == self.test_1).all() and (t_2 == self.test_2).all():
            print("Population checked: fine.")
        else:
            print("Population checked: error.")

    def next_gen(self, option:str, check:bool = False):
        if option not in self.L_options:
            print(f"option {option} not available, please select value in {self.L_options}")
            return

        if option == "uniform":
            parent1, parent2, mutations = get_uniform(self.size_population,
                                                      self.size_breeding,
                                                      self.size_mutation)
        elif option == "elitist":
            parent1, parent2, mutations = get_elitist(self.size_breeding,
                                                      self.size_mutation,
                                                      self.fitness)
        elif option == "lottery":
            parent1, parent2, mutations = get_lottery(self.size_population,
                                                      self.size_breeding,
                                                      self.size_mutation,
                                                      self.fitness)
        elif option == "tournament":
            parent1, parent2, mutations = get_tournament(self.size_population,
                                                         self.size_tournament,
                                                         self.fitness,
                                                         self.size_breeding,
                                                         top_perc=10)

        if option == "random":
            self.population = np.array([np.random.permutation(self.gene_size) for _ in range(self.size_population)], dtype=np.int64)
        else:
            self.population = get_new_pop(self.population, parent1=parent1, parent2=parent2, mutations=mutations)

        self.update_fitness()
        self.calculate_indicators()
        self.n_gen += 1
        if check: self.check_pop()

    def update_fitness(self):
        self.fitness = get_fitness(self.population, self.matrix)

    def calculate_indicators(self):
        self.indicators["best_fitness"].append(np.min(self.fitness))
        self.indicators["average_fitness"].append(np.mean(self.fitness))
        self.indicators["fitness_variance"].append(np.std(self.fitness))

5. Results

In the following experiment, the size of the population has been set to 200,000. The initial population is generated randomly.

For each option (uniform, elitist, tournament, lottery, and random), 10 experiments have been run, each covering 20 consecutive generations from the initial one. 70% of the population is generated by crossing (each couple gets 2 children) and 30% comes from mutation of the original population. The size of the population therefore remains stable.

For the tournament, the sample size is set to 20,000 and the best 10% of each sample is selected for crossing and mutation.

Once the genetic algorithm has run for the specified experiments, we can analyze the output to see the quality of the solution.

Different indicators can be considered:

  • Best performance: the evolution of the best fitness found so far, at each generation.
  • Average performance: the average fitness among the population.
  • Diversity: did the population maintain diversity, or did it converge prematurely? (not assessed in this case)

Figure 5: Evaluation of the best fitness, expressed in thousands of km.

Best fitness: Figure 5 shows that the elitist and tournament simulations lead to better results than random generation. Tournament, in particular, shows an impressive difference: random remains above 20,000 km while tournament reaches 10,000 km.

Figure 6: Evaluation of the average fitness, expressed in thousands of km.

Average fitness: Figure 6 shows that the lottery, elitist and tournament simulations lead to a decreasing average fitness, while the random one remains stable. This means that these selection strategies improve the overall quality of the solutions composing the population.

Again, the tournament option shows an impressive difference. In this case, we might wonder whether this reflects overly fast convergence on a subset of good solutions, at the expense of the diversity needed to explore the full solution space.

6. Conclusions

Genetic algorithms are a valuable approach to solving the Traveling Salesman Problem, offering a robust method for finding near-optimal solutions through mechanisms inspired by natural evolution.

However, as the field has advanced, other modern algorithms offer efficient alternatives to genetic algorithms. Notable examples include Ant Colony Optimization (ACO), which mimics the foraging behavior of ants to explore optimal paths, and Particle Swarm Optimization (PSO), which draws inspiration from social behavior patterns in animals.

Additionally, Simulated Annealing (SA) and advanced versions of Local Search algorithms have proven effective in addressing the TSP.

The evolution of this problem-solving domain has also led to practical applications in contemporary logistics, such as coordinating deliveries using combinations of drones and trucks. This hybrid approach optimizes delivery routes by leveraging the speed and flexibility of drones for last-mile delivery while maintaining the efficiency and capacity of trucks for longer distances.

It can also be useful for other applications such as smart-city infrastructure maintenance (scheduling and routing maintenance activities like automated street cleaning, waste collection, and utility inspections), automated urban mobility (in particular ride-sharing services), and food delivery services.


Anthony Schrapffer

Passionate data scientist with a PhD in Climatology. Working on Climate Risks and ESG issues to inform socially and environmentally conscious decisions. 🌱