A Comprehensive Guide to Genetic Algorithms (and how to code them)
On the Origin of Genetic Algorithms
Original article by Rishabh Anand
In the mid-19th century, Charles Darwin postulated the theory of evolution and the key role it plays in enabling organisms to adapt to their environments through natural selection: a process in which the fittest individuals in a population survive long enough to pass their traits and characteristics on to future generations, ensuring the survival of those traits.
What’s happening currently?
Presently, Machine Learning (ML) has kicked off a new era of smarter machines capable of making better decisions than their rule-based counterparts from the late 90s and early 2000s.
Harnessing the sheer amount of computational power we now possess, ML algorithms, specifically deep neural networks, have leveraged our large pools of data, both structured and unstructured, to deliver insights, leads, predictions and much more with a high degree of accuracy.
State-of-the-art ML models can now classify, categorise, and generate data from scratch after a few hours or days of training. Deep neural networks have proliferated into multiple domains and can work with data formats ranging from images to audio, with many such networks surpassing human-level performance in their respective areas. A well-known instance is an agent learning to play Atari games.
Enter, Reinforcement Learning
Recently, research organisations like OpenAI and DeepMind have been dabbling with a field of Machine Learning called Reinforcement Learning. It’s a system where an agent learns and improves over time whilst interacting with an environment by making mistakes and collecting the respective rewards (either positive or negative) for the respective states.
Where do Genetic Algorithms fit in?
In this ecosystem of smart agents navigating environments, genetic algorithms form a small subset of the field in which semi-brute-force methods are applied to create a “fit” agent that is able to “survive” (remain on top). Why do I say semi-brute-force? Because the parameters for the agent are randomly generated, there is no pre-defined set of test values for the agent.
Genetic Algorithms (GAs) work on the basic principles of evolution: a GA is a metaheuristic inspired by natural selection and the various subprocesses that occur spontaneously within it. This involves incorporating the 4 fundamental steps of evolution: fitness evaluation, selection, crossover, and mutation.
Together, these 4 processes, when performed on a population of agents, yield the fittest agent for the task being performed.
From this point onwards, when I mention the word “survive” or any variant of the word, I mean to say that the agent remains part of the top few agents that are viable enough to move on to the next generation.
Fitness
This refers to how strong an agent is at completing the task at hand. It quantifies how capable an agent is, and a higher fitness increases the probability that the agent will crossover with another agent in the population to create stronger offspring carrying traits of both parent models.
The fitness function undergoes slight modifications from task to task. In essence, however, its primary role is to act as a differentiator in the population, separating the stronger learners from the weaker ones.
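As a minimal sketch (my toy example, not from the project), assume each agent is a bitstring and the task is simply to maximise the number of 1s. The fitness function then reduces to a count:

```python
def fitness(agent):
    """Score an agent: here, simply its number of 1-bits (the OneMax toy problem)."""
    return sum(agent)

# a tiny population of bitstring agents and their fitness scores
population = [[0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
scores = [fitness(agent) for agent in population]  # → [2, 4, 1]
```

For a real task you would swap in whatever metric quantifies success: game score, accuracy, distance travelled, and so on.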
Selection
This process is largely self-explanatory: the top few models are allowed to remain in the population, while the remaining weaker ones are discarded as they serve no purpose. This leaves empty slots in the population, making room for new, stronger offspring of the remaining parent agents.
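Continuing the bitstring toy example, selection can be sketched as ranking by fitness and keeping a fixed fraction (the fraction itself is a hyper-parameter I chose for illustration):

```python
def selection(population, fitness, keep_fraction=0.5):
    """Rank agents by fitness and keep only the fittest fraction."""
    ranked = sorted(population, key=fitness, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]

# score bitstring agents by their number of 1-bits; keep the top half
population = [[0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 1], [1, 0, 1, 1]]
survivors = selection(population, fitness=sum)
```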
Crossover
This process enables two strong parents to create offspring that have the traits of both, increasing the offspring's chances of survival. It's a best-of-both-worlds scenario.
During crossover, the properties of each agent are split up, or broken down into more fundamental units, and a portion of them is exchanged with those of the other parent agent. Biologically speaking, a segment of genes from one parent is joined to a segment from the other, creating an offspring that carries gene segments from both and, ideally, a better chance of survival in terms of fitness.
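The gene-exchange idea above can be sketched as classic single-point crossover on bitstrings (again my toy representation, not the project's):

```python
import random

def crossover(parent_a, parent_b):
    """Single-point crossover: each child carries one segment from each parent."""
    point = random.randint(1, len(parent_a) - 1)   # position at which genes are split
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

child_1, child_2 = crossover([0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1])
```

Note that no genes are created or lost: between them, the two children hold exactly the genes of the two parents, just recombined.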
Mutation
As in the biological evolutionary process, mutation plays a key role in introducing randomness and stochasticity into the population. Agents that survive better (i.e. score a higher fitness) thanks to a mutation are able to pass that trait on to future generations, ensuring their survival as well.
Like the fitness function, the mutation function needs to be finalised beforehand. A mutation can range from a point mutation at one specific location in the gene sequence (DNA) to changes at multiple locations.
Either way, mutation is an integral component of the evolutionary process, as our model cannot survive for long without it.
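On the bitstring toy representation, a mutation function can be as simple as flipping each gene independently with a small probability (the rate here is an illustrative choice):

```python
import random

def mutate(agent, rate=0.05):
    """Flip each gene independently with a small probability."""
    return [1 - gene if random.random() < rate else gene for gene in agent]
```

With `rate=0.0` the agent is returned unchanged; with `rate=1.0` every gene flips. In practice a small rate keeps the population exploring without destroying good solutions.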
Stopping criteria
What happens when an agent achieves the goal or objective at hand? To cater to such a situation, a stopping criterion is used. This function checks whether any one of the agents has crossed a certain threshold value (a hyper-parameter), or met some criterion that amounts to completing the task.
In most cases it's a single minimum threshold an agent needs to cross in order to complete the activity or task. In others, there can be a range of criteria, or a group of thresholds the model must cross every generation, to be called one of the fittest in the population.
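The single-threshold case can be sketched as a one-line check over the population:

```python
def should_stop(population, fitness, threshold):
    """Stopping criterion: true once any agent's fitness crosses the threshold."""
    return any(fitness(agent) >= threshold for agent in population)

# bitstring agents scored by their number of 1-bits
print(should_stop([[1, 1, 1], [0, 0, 1]], fitness=sum, threshold=3))  # True
```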
Altogether, these 4 processes enable the production of new, stronger, more viable offspring that are capable of surviving in an environment.
Learning through application
Now that we’ve completed the refresher on genetic algorithms, a coding project is the best way to apply whatever you have learnt and strengthen your understanding of the concepts used by GAs.
While studying GAs, I came across loads of interesting projects, ranging from the famous travelling salesman problem to the knapsack problem. I even found myself discovering schedule-building agents that create the optimal schedule for events.
While all this did seem interesting, I wanted to take things to the next level. I created a network-optimisation algorithm that takes an existing neural network architecture's hyper-parameters and enhances them to perform better, with higher accuracy on the testing set. You'd probably call this discount NASNet, but the endeavour was slightly challenging, thereby making the final outcome more wholesome and fulfilling 🎉.
As always, the code for this article can be found here on my GitHub.
Note: I’m experimenting with the Polacode VS Code plugin that allows users to take aesthetic polaroid pictures of the selected code in the editor. If the code in the images is too small or jarring, please do tell me in the comments below so I can go back to using GitHub Gists.
For visibility purposes, I suggest reading the following part of the article on a large display such as a monitor or laptop.
As mentioned above, the primary objective is to take an existing neural network architecture and randomly evolve it over a certain number of generations to improve it. I use the word 'evolve' loosely here: in reality, genetic algorithms basically perform a completely random action and pray that it improves something. I have used the Keras machine learning library for the project. I have set the accuracy threshold (the stopping criterion) to 99.5% (really ambitious on my part) and have taken 1,000 instances from the 60K-example MNIST training and testing sets (so predictable).
Now, for the bigger picture: we have a Network agent that comes up with a bunch of random hyper-parameters. Keras's built-in model.predict() produces predictions over the testing set; comparing these against the true labels yields an accuracy, which is assigned to each Network agent as its fitness score.
Over time, the agent with the best hyper-parameters comes out on top and is identified as the winner.
Functions and Objects involved
Firstly, we begin with the Network agent. Its main function is to randomly generate a set of hyper-parameters, which are later plugged into the model design and tested on the subset of MNIST. The hyper-parameters I chose to randomise here are really basic and not too complex (the learning rate, for example), but keep in mind that more can be done!
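Since the screenshots may be hard to read, here is a minimal sketch of what such an agent might look like. The exact attribute names and ranges are my assumptions, not the repo's code; see the repo for the real class:

```python
import random

class Network:
    """A candidate agent: a random set of model hyper-parameters."""

    def __init__(self):
        # illustrative hyper-parameter choices and ranges (not the repo's exact ones)
        self.hidden_units = random.randint(16, 512)                    # hidden-layer width
        self.epochs = random.randint(1, 10)                            # training epochs
        self.activation = random.choice(["relu", "tanh", "sigmoid"])   # activation function
        self.accuracy = 0.0                                            # filled in by fitness()

agent = Network()
```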
Moving on to the main event loop, we have the necessary functions, fitness(), selection(), crossover(), and mutate(), as discussed above. fitness() calculates the accuracy of the model built with the random hyper-parameters. Some model architectures do not work and break the code due to dimensionality issues or other errors, which is why the training and evaluation calls are placed in a try-except block. The selection() function is pretty self-explanatory and simple to understand: the top 20% of the models are allowed to move on to the next generation.
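A sketch of these two functions, with the Keras training step replaced by a stand-in scoring expression so the snippet is self-contained (the real fitness() builds and trains a model):

```python
import random
from types import SimpleNamespace

def fitness(network):
    """Train and evaluate the candidate; broken architectures score zero."""
    try:
        # stand-in for building a Keras model from network's hyper-parameters,
        # training it, and measuring accuracy on the MNIST test subset
        network.accuracy = min(1.0, network.hidden_units / 512 + random.uniform(0, 0.05))
    except Exception:
        network.accuracy = 0.0   # dimensionality/build errors: drop out via a low score
    return network.accuracy

def selection(networks):
    """Rank by accuracy and let only the top 20% move on."""
    ranked = sorted(networks, key=lambda n: n.accuracy, reverse=True)
    return ranked[:max(1, int(0.2 * len(ranked)))]

# score a population of 10 stand-in agents and keep the fittest 2
population = [SimpleNamespace(hidden_units=random.randint(16, 512), accuracy=0.0)
              for _ in range(10)]
for network in population:
    fitness(network)
parents = selection(population)
```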
The important functions to look out for are the crossover() and mutation() functions. The quality of the child agents depends on them, i.e. they create offspring that differ from their parents and add randomness and novelty to the gene pool and population.
In the crossover() function, the hyper-parameters of the parents are split between the two child agents: a proportion of the hidden units is assigned to each child. The function isn't that fancy (kind of ironic, given what I mentioned earlier), but this is just a project to explain a concept and I wanted to keep things simple (also, I couldn't think of 'interesting' crossover and mutation functions).
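Representing each agent's hyper-parameters as a plain dict for readability (the repo uses a class), a sketch of such a crossover might look like this; the proportional-split rule is my illustrative interpretation:

```python
import random

def crossover(parent_1, parent_2):
    """Split the parents' hyper-parameters between two new child agents."""
    child_1, child_2 = dict(parent_1), dict(parent_2)
    # each child receives a proportion of the parents' combined hidden units
    split = random.random()
    child_1["hidden_units"] = int(split * parent_1["hidden_units"]
                                  + (1 - split) * parent_2["hidden_units"])
    child_2["hidden_units"] = int(split * parent_2["hidden_units"]
                                  + (1 - split) * parent_1["hidden_units"])
    # swap activations so each child carries a trait from the other parent
    child_1["activation"], child_2["activation"] = (parent_2["activation"],
                                                    parent_1["activation"])
    return child_1, child_2

child_1, child_2 = crossover({"hidden_units": 100, "activation": "relu"},
                             {"hidden_units": 200, "activation": "tanh"})
```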
For the mutation function, I sum the original hidden units with a random integer between 0 and 100 (as I said, I couldn't think of an interesting function). The chance of a mutation occurring is kept below 0.1, which controls the randomness to a small but certain extent.
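That description translates roughly to the sketch below (dict representation again, assumed names):

```python
import random

def mutation(child, rate=0.1):
    """With a small probability, add a random number of hidden units (0-100)."""
    if random.random() < rate:
        child["hidden_units"] += random.randint(0, 100)
    return child
```

Setting `rate=0.0` leaves the child untouched; `rate=1.0` always mutates, which is handy for testing the function in isolation.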
Down below, the main() function spawns agents into the environment. Every generation, the above-mentioned functions are run one after the other. At the end, the model that crosses the accuracy threshold of 99.5% is considered the best.
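Putting the generational loop together, here is a compressed, self-contained sketch of such a main() function. The stand-in fitness() simply rewards larger hidden layers so the snippet runs without Keras; the structure (select, crossover, mutate, check the threshold) mirrors the loop described above:

```python
import random

def fitness(agent):
    """Stand-in for training and evaluating a Keras model on the MNIST subset."""
    return agent["hidden_units"] / 512

def main(population_size=20, generations=10, threshold=0.995):
    # spawn the initial population with random hyper-parameters
    population = [{"hidden_units": random.randint(16, 512)}
                  for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        if fitness(ranked[0]) >= threshold:                      # stopping criterion
            return ranked[0]
        parents = ranked[:max(2, int(0.2 * population_size))]    # selection
        children = []
        while len(parents) + len(children) < population_size:
            mum, dad = random.sample(parents, 2)
            child = {"hidden_units":                             # crossover (midpoint)
                     (mum["hidden_units"] + dad["hidden_units"]) // 2}
            if random.random() < 0.1:                            # mutation
                child["hidden_units"] += random.randint(0, 100)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

best = main()
```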
The code snippets above are only parts of the whole, so I urge you to visit the repo to get the full source code for the project (stars are appreciated 😁). Altogether, the different components work in unison to generate the fittest model possible for the task at hand (in this case, an accuracy of 99.5% or above, which again is really ambitious).
In a nutshell
It took me some time to wrap my head around the concepts mentioned above. Doing a coding project on the topic made things much clearer, as applying theory is certainly reinforcement for the learning process (pun intended).
Feel free to clone the project repo and experiment with different architectures, crossover functions, and mutation functions! Any interesting results? Drop them down in the comments below!
Note: If there are any errors in the code, please do highlight them here. Much appreciated!
I have decided to (probably) make this a series on Genetic Algorithms and their applications in the real world with code examples to accompany them (if time permits). If you have any suggestions on what you want to read or want to chat in general, don’t hesitate to drop a line! You can contact me on LinkedIn! I love to talk about technology and its application in the real world!
Shoutout to Sarvasv Kulpati for his feedback on the draft-work. Go check out his profile for articles on science, tech and philosophy!
Here are some of the other articles I’ve written on Machine Learning and things related to up-and-coming technology:
CatGAN: cat face generation using GANs
Detailed review of GANs and how to waste your time with them…
TensorFlow or TensorNo?
The beginner’s guide to Machine Learning with TensorFlow.
Introducing TensorFlow.js 🎉
Running ML models on the browser became so much more easier!
Making a Replier and Follow Bot for Twitter using Node.js
How to ‘wisely’ spend your time making this awesome thing.
Till then, I’ll see you in the next one!