Reinforcement Learning vs Genetic Algorithm — AI for Simulations

Neelarghya · Published in XRPractices · 7 min read · Jul 26, 2021

While working on a certain simulation-based project…
“Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could”

Well, Robert Frost aside… :P
I had to decide on an optimization approach that would better suit the use case. I had Reinforcement Learning and Genetic Algorithms (the two roads, among many) in mind, but then an epiphany: “Both are nature-inspired AI approaches, so how do the two differ? And more importantly, in which scenarios is one favoured over the other?” And thus, today we will dissect parts of the thought process behind coming to a decision.

What are they..?

Before we start comparing, let’s get a better understanding of what each of these is…

Reinforcement Learning (RL)

Reinforcement Learning structure

Reinforcement learning is the training of machine learning models to make a sequence of decisions. It is structured as an Agent interacting with an Environment.

In reinforcement learning, an artificial intelligence (AI) faces a game-like situation (i.e. a simulation). The AI employs trial and error to come up with a solution to the problem. Slowly but steadily the agent learns to achieve a goal in an uncertain, potentially complex environment, but we can’t expect the agent to blindly stumble upon the perfect solution. This is where the interactions come into play: the Agent is provided with the State of the Environment, and this becomes the input on which the Agent takes an Action. The Action does two things. First, it earns the Agent a Reward (note that rewards can be positive or negative, depending on the fitness function for the problem), and on the basis of this reward the Policy (the ML model inside the Agent) adapts and learns. Second, it impacts the Environment and changes its State, which means the input for the next cycle changes.

This cycle goes on until an optimal Agent is created. The cycle tries to replicate the learning an organism does over its lifetime, as we see in nature. In most cases the Environment is reset after a certain number of cycles or when some condition is met. Note that multiple Agents can be run simultaneously to reach a solution faster, but all Agents run independently.
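
To make the cycle concrete, here is a minimal Q-learning sketch on a toy one-dimensional corridor, where the Agent starts at cell 0 and is rewarded for reaching the last cell. The environment, state encoding and parameters are illustrative inventions, not from any particular library:

```python
import random

N_STATES, ACTIONS = 6, [-1, +1]           # corridor cells; move left / right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.2     # learning rate, discount, exploration

for episode in range(500):                # the Environment is reset each episode
    state = 0
    while state != N_STATES - 1:
        # Epsilon-greedy: usually exploit the current Policy, sometimes explore
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # The Reward feeds back into the Policy (here a plain table of Q-values)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
```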

Genetic Algorithm (GA)

Genetic Algorithm is a search metaheuristic that is inspired by Charles Darwin’s theory of natural evolution. This algorithm reflects the process of natural selection where the fittest individuals are selected for reproduction in order to produce offspring of the next generation.

There are six major phases in a genetic algorithm’s cycle (a minimal sketch follows the list).

  1. Generate Initial population
    A set of “individuals” is called a population, where each individual is characterized by a set of Genes, each represented in binary (i.e. as 0 or 1). The genes strung together as a string/sequence are known as a Chromosome. The population with which we start is called the Initial Population.
  2. Evaluation
    A Fitness function determines how fit an individual is (i.e. its ability to compete with other individuals). It assigns each individual a fitness score, which quantifies that individual’s performance.
    The function is run over the performance of the whole population so that individuals can be compared.
  3. Selection
    The process of picking the “fittest” individuals (based on the fitness scores generated during the Evaluation phase) to produce the next generation, i.e. the new population for the next cycle of evaluation and reproduction. There isn’t a strict cut-off based on the fitness score; selection works on a probability basis (the higher the fitness score, the higher the probability of being selected), and a pair is selected for the next phase.
  4. Crossover
    The process of mixing the genes of the chosen pair of individuals to produce a new pair of individuals is called Crossover, or the genetic operation. This process is repeated until a new population is created. Crossover can be performed with different methods; a few notable ones are:
    a. One-Point Crossover
    b. Two-Point Crossover
    c. Order Crossover
    d. Partially Mapped Crossover
    e. Cycle Crossover

    If you are interested in knowing the details, here’s an article.
  5. Mutation
    In certain new individuals, some genes can be subjected to Mutation with a low random probability. This means some of the Genes (bits) in the Chromosome (the sequence of bits) can be altered (flipped). Mutation helps maintain diversity within the population and prevents premature convergence.
  6. Termination
    The algorithm terminates when the population converges, meaning the individuals no longer differ significantly in their genetic structure. Termination can also occur after a set number of cycles, which normally leads to multiple convergence points.
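
To tie the six phases together, here is a minimal sketch on the classic “OneMax” toy problem, where fitness is simply the number of 1-bits in a Chromosome. All names and parameters below are illustrative choices, not prescriptions:

```python
import random

GENES, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 100, 0.01

def fitness(chromosome):                        # 2. Evaluation
    return sum(chromosome)

def select(population):                         # 3. Selection (roulette wheel)
    weights = [fitness(c) + 1e-9 for c in population]
    return random.choices(population, weights=weights, k=2)

def crossover(a, b):                            # 4. One-point crossover
    point = random.randrange(1, GENES)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome):                         # 5. Mutation: rare bit flips
    return [g ^ 1 if random.random() < MUTATION_RATE else g for g in chromosome]

# 1. Generate the Initial Population of random bit-strings
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):                    # 6. Terminate after a set number of cycles
    next_population = []
    while len(next_population) < POP_SIZE:
        parent_a, parent_b = select(population)
        child_a, child_b = crossover(parent_a, parent_b)
        next_population += [mutate(child_a), mutate(child_b)]
    population = next_population

print(max(population, key=fitness))             # the fittest surviving individual
```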

How do they Compare..?

Operating Principle

GA is, by definition, an inter-life algorithm, which means the approach requires individuals to “die” in order to progress. RL is intended to be an intra-life learning algorithm, with many recently developed methods targeting continual learning and “safe RL”.

Fundamentally, the operating principles of the two approaches are different. RL is built on Markov decision processes, whereas GA is largely based on heuristics. The value-function update in RL is gradient-based, whereas GAs generally don’t use gradients at all.
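
As an illustrative contrast (the names and numbers here are mine, not the article’s): an RL-style value update follows the TD error, a gradient-like signal, while a GA-style step simply perturbs a solution and keeps it only if it scores better:

```python
import random

def td_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    # Q-learning: nudge Q(s, a) toward reward + gamma * max_a' Q(s', a')
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

def evolutionary_step(solution, fitness, sigma=0.1):
    # Mutation plus selection: no gradient information is used at all
    candidate = [w + random.gauss(0.0, sigma) for w in solution]
    return candidate if fitness(candidate) > fitness(solution) else solution
```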

Problems they are suited for

RL is a branch of Machine Learning concerned with a specific type of optimization problem: finding policies (strategies) that maximize the return while an agent interacts with an environment in time steps.
On the other hand, GAs can be applied to any optimization problem where you can encode solutions, define a fitness function that compares them, and stochastically change those solutions. Essentially, GAs can be applied to almost any optimization problem. In principle, you could even use a GA to find policies, as long as you can compare them with a fitness function.

This doesn’t mean GA is better; it means that when no better-suited solution exists, GA is your go-to, while RL remains a potent niche for problems that require sequential decision making in an environment.

Drawbacks

Genetic Algorithm

  • Requires less information about the problem, but designing a fitness function and getting the representations and operators right can be very daunting.
  • It is also computationally expensive.

Reinforcement Learning

  • Too much reinforcement learning can lead to an overload of states, which can diminish the results.
  • This algorithm is not preferable for solving simple problems.
  • This algorithm needs a lot of data and a lot of computation.
  • The curse of dimensionality limits reinforcement learning for real physical systems.

An Ideal Approach..?

As we have already discussed, beyond the fundamental difference in operating principle, both approaches have their uses and shortcomings. While GA is more general-purpose, defining a fitness function that suits the problem, along with the right representation and operators, is very difficult. RL, on the other hand, is best suited to problems that require sequential decision making, but it needs a lot more data and struggles when the dimensionality of the problem is high. RL models also tend to get tunnel-visioned by the early stages of learning.

Given this,

  • GA is favoured when no other solution fits the mould, for obvious reasons.
  • For simpler problems, RL is effective most of the time but generally more time-consuming than GA; on the other hand, the fitness function and representations for GA are easier to come up with, so either RL or GA can work depending on the problem.
  • When we have medium levels of complexities with high available data, RL is favoured.
  • For problems with higher complexity, both GA and RL take a lot of time, require complex representations, or are limited by the number of dimensions that need to be processed. In such cases, a combination of the two is preferable to either individually.

A Combination..?

Yes, a combination of Genetic Algorithms and Reinforcement Learning is possible, because the two approaches aren’t mutually exclusive. Just as the two principles of nature they are derived from coexist, so can these approaches.

Reinforcement Learning enables agents to make decisions based on a reward function. However, the choice of values for the learning algorithm’s parameters can significantly impact the overall learning process. One approach is to use a Genetic Algorithm to find those parameter values, say for Deep Deterministic Policy Gradient (DDPG) combined with Hindsight Experience Replay (HER), to help speed up the learning agent, leading to better performance faster than the original algorithm alone.
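
As a rough sketch of this idea, the snippet below runs a tiny GA over two hyperparameters (learning rate and discount factor). `train_and_evaluate` is a hypothetical stand-in for a full training run (e.g. DDPG + HER) that would return the agent’s mean return; here it is replaced by a placeholder so the sketch runs on its own:

```python
import random

def train_and_evaluate(params):
    # Placeholder fitness: peaks at lr = 0.001, gamma = 0.98. In practice this
    # would train an RL agent with these hyperparameters and return its score.
    lr, gamma = params
    return -(lr - 0.001) ** 2 - (gamma - 0.98) ** 2

def mutate(params, sigma=0.1):
    return [max(p + random.gauss(0.0, sigma * p), 1e-6) for p in params]

# Initial population of (learning rate, discount factor) candidates
population = [[random.uniform(1e-4, 1e-2), random.uniform(0.9, 0.999)]
              for _ in range(10)]

for generation in range(20):
    ranked = sorted(population, key=train_and_evaluate, reverse=True)
    survivors = ranked[:5]                              # keep the fittest half
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(5)]        # refill by mutation

print(max(population, key=train_and_evaluate))
```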
Another approach is to take parts of Reinforcement Learning, like the Agent-Environment relation, and run multiple agents that crossover and mutate, similar to a Genetic Algorithm, as sketched below.
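
A minimal sketch of this second idea treats each Agent’s policy weights as a Chromosome and its episode Reward as the fitness score. `run_episode` is a hypothetical placeholder for the Agent-Environment interaction:

```python
import random

N_WEIGHTS, N_AGENTS = 8, 20

def run_episode(weights):
    # Placeholder: would run the agent in the environment and return its reward
    return -sum((w - 0.5) ** 2 for w in weights)

def crossover(a, b):                       # uniform crossover of policy weights
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(weights, rate=0.1, sigma=0.2):
    return [w + random.gauss(0.0, sigma) if random.random() < rate else w
            for w in weights]

agents = [[random.uniform(-1, 1) for _ in range(N_WEIGHTS)]
          for _ in range(N_AGENTS)]

for generation in range(50):
    ranked = sorted(agents, key=run_episode, reverse=True)
    elite = ranked[: N_AGENTS // 4]        # the fittest agents survive
    agents = elite + [mutate(crossover(*random.sample(elite, 2)))
                      for _ in range(N_AGENTS - len(elite))]
```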

In conclusion, sometimes you can take the road in the middle… :P

The Road Not Taken — Robert Frost
