What are Evolutionary algorithms?

Understanding the genetic evolutionary algorithm with an example

Mehul Gupta
Data Science in your pocket

--

Now, this is something I explored while reading about reinforcement learning, and it turned out to be one of the most fascinating ideas I have come across in recent times.

Do you remember the famous concept of ‘Survival of the Fittest’ by Charles Darwin from your bio classes back in school?

Somewhat related to it is the idea of ‘Evolution’.

Survival of the Fittest describes the mechanism of natural selection: the organisms best adjusted to their environment are the most successful in surviving and reproducing. So the idea of Evolution is simple:

The individuals who adapt best to the environment survive.

They breed to give birth to descendants that retain their core characteristics but also carry certain new features (this is how humans are considered an evolved version of monkeys). Hence, mutation happens at some level.

These descendants are considered an ‘evolved’ version of their parents, with inherited characteristics that help them survive better in their environment.

Back to technology: evolutionary algorithms are a family of algorithms that try to follow Charles Darwin’s theory of evolution described above.

This sounds weird!!

How do Machine Learning algorithms follow the idea of evolution?

So, usually in any known ML approach, be it Neural Networks, Logistic Regression, etc., we:

  • Take a model.
  • Improve the model over time by punishing it for making wrong estimates (via the loss function), which we call training (a minimal sketch of this loop follows below).
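Here is a minimal sketch of that usual training loop, assuming a toy linear model fitted with gradient descent on a mean-squared-error loss (the data and the learning rate are illustrative placeholders):

```python
import numpy as np

def train(X, y, lr=0.01, epochs=100):
    # a toy linear model trained with gradient descent on the MSE loss;
    # X is a 2D feature matrix, y a 1D target vector
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        preds = X @ w
        grad = 2 * X.T @ (preds - y) / len(y)  # gradient of the MSE loss
        w -= lr * grad                         # punish wrong estimates
    return w
```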

A big limitation you might face with this approach at times is:

  • Your loss function has to be differentiable (for the sake of gradient descent)
  • Training can be highly time-consuming.

In evolutionary algorithms, we reverse the whole approach. So instead of improving one model over time, we do the following (a rough sketch of this loop follows the list):

  • We start off with multiple models (consider this to be our population)
  • Select a few best models based on their performance (the fittest individuals are picked)
  • ‘Breed’ is the best model to generate their descendants with slight mutation (the evolved descendants). These descendants become our population now
  • Continue the cycle until the whole population of models is performing similarly (hence no distinct set of best models can be picked, as results have saturated)
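Here is a rough sketch of that loop in Python. `evaluate` and `breed` are placeholder functions of my own: `evaluate` would score a model on sample data, and `breed` would produce a new population from the selected parents (crossover plus mutation, both explained later):

```python
import numpy as np

def evolve(population, evaluate, breed, n_parents=4, n_generations=50, tol=1e-3):
    for _ in range(n_generations):
        scores = np.array([evaluate(model) for model in population])
        if scores.std() < tol:                           # results have saturated
            break
        best = np.argsort(scores)[::-1][:n_parents]      # pick the fittest models
        parents = [population[i] for i in best]
        population = breed(parents, size=len(population))  # evolved descendants
    return population
```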

If you observe the entire flow, we have overcome several limitations:

  • No loss function is required.
  • No gradient descent is required for updating weights. Hence, they are also called gradient-free algorithms.
  • We can get the best models in a comparatively short time span, as all we are doing is making predictions, not training. And if you have ever trained a model, you know the training time is way longer than the inference time.

Wait a minute, I have a few doubts. What do we mean by the breeding of models? And mutation? I will answer everything shortly.

Genetic evolutionary algorithm

One of the best-known members of the evolutionary algorithms family, genetic algorithms work in the following way:

  • Select a model family (say Neural Networks). Call it family X
  • Initialize multiple models from family X, each with a different set of parameters/weights. These weight sets can be taken as an alias for the population in the ‘evolution theory’.
  • Get metrics for each of the models. Say the metrics you got (for some classification task) are F1 = [82%, 80%, 20%, 30%, 60%, 50%, 40%]. Do remember you haven’t trained the models at all; these are results without any training (a small sketch follows this list).
  • Now, following the e-greedy policy (as we do in reinforcement learning), select ‘N’ models. Say we picked the models with F1 = [80%, 82%, 30%, 60%]. Do note that we picked a model with F1 = 30% as well, due to the exploration part of the e-greedy policy.
  • ‘Breed’ these models to get descendants.
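A small sketch of these first few steps, assuming family X is a simple linear classifier whose parameters are just a flat weight vector, and `X_sample`/`y_sample` stand in for some labelled sample data (all of these names are illustrative; no training happens anywhere):

```python
import numpy as np
from sklearn.metrics import f1_score

def init_population(pop_size, n_features):
    # each "model" is just a random weight vector, i.e. an untrained model
    return [np.random.randn(n_features) for _ in range(pop_size)]

def evaluate(weights, X_sample, y_sample):
    # score an untrained weight vector on the sample data
    preds = (X_sample @ weights > 0).astype(int)
    return f1_score(y_sample, preds)

# population = init_population(pop_size=7, n_features=X_sample.shape[1])
# f1_scores = [evaluate(w, X_sample, y_sample) for w in population]
```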

Breeding is actually the interesting part. By breeding, we mean generating new sets of weights using the existing sets of weights. This can be done in multiple ways. One way the genetic algorithm does it is by mixing weights from different models together to form new combinations of weights and then trying these weights with a model of family X.

For example: Model 1 parameters = [1,2,3,4]; Model 2 parameters = [5,6,7,8]. New sets of parameters we can get are [1,2,7,8], [1,6,7,4], [5,2,3,8], etc., i.e. different combinations. Also note that we are not changing the index of the parameters, so if 1 is at index 0, it should remain at the same place in any new combination.

So we will be generating multiple sets of weights by mixing the weights of the selected models in different combinations.
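A minimal sketch of this breeding (crossover) step, assuming each model's parameters are a flat array (the function name is my own):

```python
import numpy as np

def crossover(parent_1, parent_2):
    # for each index, take the weight from one of the two parents,
    # keeping every weight at its original position
    parent_1, parent_2 = np.asarray(parent_1), np.asarray(parent_2)
    mask = np.random.rand(len(parent_1)) < 0.5
    return np.where(mask, parent_1, parent_2)

# e.g. crossover([1, 2, 3, 4], [5, 6, 7, 8]) could give [1, 6, 7, 4]
```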

  • After breeding, we get a set of descendants that we can now treat as our population (as we did in the first step) and again select ‘N’ descendants with the e-greedy policy, based on their metrics on sample data.
  • The cycle continues until the metrics of the whole population are almost similar (no further improvement).
  • Now, you can take the best model from this population group as your final model.

Why do we need an e-greedy policy?

This is done to maintain the diversity of weights among the descendants so that we don't converge to a local optimum. If we went purely greedy, we would always select only the models with the best metrics.
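A small sketch of what such an e-greedy selection could look like, with `epsilon` controlling how often we explore a random model instead of the best remaining one (an illustrative implementation, not a standard library routine):

```python
import numpy as np

def e_greedy_select(population, scores, n, epsilon=0.2):
    ranked = list(np.argsort(scores)[::-1])   # indices of models, best first
    selected = []
    for _ in range(n):
        if np.random.rand() < epsilon:
            idx = ranked[np.random.randint(len(ranked))]  # explore: any model
        else:
            idx = ranked[0]                               # exploit: best remaining
        ranked.remove(idx)
        selected.append(population[idx])
    return selected
```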

What about mutation?

So basically, mutation means slight changes from the core characteristics. Here, it means adding a pinch of noise to the weights we get after breeding the different models.
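A minimal sketch of such a mutation step, adding a little Gaussian noise to the bred weights (the noise scale is an illustrative choice):

```python
import numpy as np

def mutate(weights, noise_scale=0.05):
    # add a pinch of random noise so descendants differ slightly from their parents
    weights = np.asarray(weights, dtype=float)
    return weights + np.random.normal(0.0, noise_scale, size=weights.shape)
```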

So, that’s it. We will try implementing a genetic algorithm in my next post!
