Intelligence is in my genes: Genetic algorithms, part I

Andy Elmsley
Published in The Sound of AI
May 24, 2019 · 6 min read


Hello again, cross-validators. For the last few weeks we’ve been discussing Artificial Neural Networks (ANNs) and how to train them. People often conflate the general idea of AI with the more specific terms ‘machine learning’ or ‘deep learning’. It’s important to understand the differences between, and limitations of, these approaches, so that you know what each is capable of and where to apply it effectively. With that in mind, this week we’ll move on to a new kind of AI algorithm: Genetic Algorithms (GAs).

The internet: a sophisticated cat-sharing technology.

Searching for… something?

GAs fall into the search and optimisation categories of AI, where, generally speaking, ANNs can’t be used. Sure, ANNs can play a part in a search solution (classifying web pages based on their content, for instance). But the process of conducting a search or optimising a set of parameters is not one that lends itself well to the architecture of an ANN.

This is because ANNs are essentially statistically-driven input-output mapping machines. Put in an image and the ANN can tell you if it’s a cat or not (most likely yes, if you found the picture on the internet). But try to ask an ANN to find a cat on the internet and you get stuck. What are the inputs and outputs? How would you train it? Things get worse when the problem is so ill-defined that you’re not exactly sure what you’re looking for (finding that perfect cat gif for your team’s #random channel on Slack), or when the number of options to search through grows unreasonably large (my Google search for ‘cat’ just returned 7.5 billion results, roughly one for every person on the planet).

There’s one less cat on the internet. RIP Grumpy Cat. Image: NYT

The knapsack problem

Now the cat’s out of the bag, let’s illustrate the issue with a classic example of a search/optimisation problem: the knapsack problem.

The knapsack problem goes like this: imagine you’re a travelling salesperson who carries all of your wares in your trusty knapsack. Naturally, you want to make the most money you can each day by taking your most valuable goods on your round. But you have a problem — you can only carry so much before your bag breaks! Every morning, you need to find the combination of items that maximises the value you carry without exceeding the bag’s capacity.

The knapsack problem is a classic search/optimisation problem that can’t be solved with ANNs. Image: Wikipedia.

Let’s begin tackling this problem by revising a few AI search terms we covered way back at the start of this Code of AI series:

  • Search space — abstract parameters constraining a search.
  • States — positions or sets of conditions in a space.
  • Goal state — a desirable outcome, or solution.

The search space in the knapsack problem is every possible combination of items to put in the bag. Each item has an associated value and weight, and the bag itself has a maximum capacity. A single state is one of those combinations of items in the bag, and the goal state is the single combination that maximises value without going over the bag’s capacity.
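To make these terms concrete, here’s a minimal sketch of what one knapsack instance might look like in Python. The item values, weights and capacity here are made up purely for illustration:

```python
# A hypothetical knapsack instance: each item is a (value, weight) pair.
# (The numbers are made up for illustration.)
ITEMS = [(10, 5), (40, 4), (30, 6), (50, 3), (35, 7),
         (25, 2), (15, 9), (45, 8), (20, 1), (5, 4)]
CAPACITY = 20  # the bag breaks above this total weight

# One state: a particular subset of items in the bag.
state = [ITEMS[1], ITEMS[3], ITEMS[8]]
print(sum(v for v, w in state))  # total value: 110
print(sum(w for v, w in state))  # total weight: 8 -- it fits, but is it optimal?
```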

A brute force solution

The knapsack is quite a simple problem to solve with a brute force method. In this code snippet we iterate through all the possible combinations of the items in the bag and test each one for its value and weight.
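A minimal sketch of that brute-force search might look like this, reusing the made-up ITEMS and CAPACITY from the snippet above; itertools.product enumerates every binary inclusion vector:

```python
from itertools import product

# The made-up (value, weight) items from the earlier snippet.
ITEMS = [(10, 5), (40, 4), (30, 6), (50, 3), (35, 7),
         (25, 2), (15, 9), (45, 8), (20, 1), (5, 4)]
CAPACITY = 20

def bruteForceKnapsack(items, capacity):
    """Try every binary inclusion vector and keep the best packing that fits."""
    best_value, best_combo = 0, None
    # product([0, 1], repeat=n) enumerates all 2^n combinations.
    for combo in product([0, 1], repeat=len(items)):
        value = sum(v for (v, _), used in zip(items, combo) if used)
        weight = sum(w for (_, w), used in zip(items, combo) if used)
        if weight <= capacity and value > best_value:
            best_value, best_combo = value, combo
    return best_value, best_combo

print(bruteForceKnapsack(ITEMS, CAPACITY))
```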

Running this code shows that we do indeed find a solution, and for the 10 items in this example we find the solution quite quickly. But what happens if there are more items to search through? With 10 items we only need to search through around 1000 combinations, and it takes about 7ms on my machine. But with 20 items the number of combinations grows to around 1,000,000 and takes 2.5 seconds. Just five more (25) items makes around 33 million combinations and takes around 2 minutes to search. This trend shows that the knapsack problem has exponential (O(2^n)) complexity, which means that it quickly becomes impractical for larger values of n.
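If you want to see this growth for yourself, a quick and unscientific timing loop like the sketch below will do it, reusing bruteForceKnapsack from the snippet above on randomly generated items (the exact timings will of course differ on your machine):

```python
import random
import time

# A rough timing loop; absolute numbers will vary by machine.
for n in (10, 15, 20):
    items = [(random.randint(1, 50), random.randint(1, 10)) for _ in range(n)]
    start = time.perf_counter()
    bruteForceKnapsack(items, 3 * n)  # from the sketch above
    print(f"{n} items: {time.perf_counter() - start:.3f}s")
```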

GAs: a different kind of search

Let’s think of a better way to search the vast space of the knapsack problem. The problem we’re facing lies in the exponential rise in the number of combinations as we increase the number of items. What if, instead of searching through every possible combination of items in the bag, we kept a fixed number of states, evaluated their value, and then changed them in some way to explore the search space?

This is exactly the intuition behind a GA. Inspired by natural evolution — the process by which organisms become better adapted to their environment over time through natural selection — GAs evaluate a set of states (a population, in GA terms) for their fitness and breed the fittest solutions into the next generation. Over the generations, the overall fitness of the population rises and a best-fit solution to the problem emerges.

We’ll get more into the mechanics of the GA search process next time. For now, let’s focus on a few fundamental concepts in GAs, all of which are inspired by natural evolution: genes, chromosomes, phenotypes and fitness functions.

You can think of a phenotype as analogous to a search state in general AI terms. In biology, you are a living phenotype, and your genes define how you appear: if you have the gene for brown eyes, then your phenotype has brown eyes too. Similarly, in a GA a gene is an abstract parameter that encodes one aspect of the phenotype. A chromosome is the collection of genes that together fully represent the phenotype in an abstract way. Finally, the fitness function is an application-specific evaluation method that creates a phenotype from a chromosome and computes how good a solution it is.

The knapsack as a chromosome

In our knapsack example we’ve already said that a single search state — a phenotype — is one possible combination of items in the bag. Encoding this as a chromosome is as simple as creating a binary list whose length is the number of possible items to put in the bag. If a gene takes the value one, the corresponding item is included in the bag; if zero, it’s left out.

To illustrate this, here’s some example code that creates a random chromosome and converts it into a phenotype.
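Something like the following minimal sketch captures the idea; the helper names randomChromosome and toPhenotype are illustrative, and ITEMS is the made-up instance from earlier:

```python
import random

# The made-up (value, weight) items from the earlier snippets.
ITEMS = [(10, 5), (40, 4), (30, 6), (50, 3), (35, 7),
         (25, 2), (15, 9), (45, 8), (20, 1), (5, 4)]

def randomChromosome(length):
    """A chromosome is a binary list: one gene per possible item."""
    return [random.randint(0, 1) for _ in range(length)]

def toPhenotype(chromosome, items):
    """Decode a chromosome into its phenotype: the items actually in the bag."""
    return [item for item, gene in zip(items, chromosome) if gene == 1]

chromosome = randomChromosome(len(ITEMS))
print(chromosome)                      # e.g. [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
print(toPhenotype(chromosome, ITEMS))  # the corresponding items in the bag
```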

It’s in the bag

That’s where we’ll leave it for this week. We’ve covered some of the limitations of machine learning and ANNs, and the need for a wider perspective on some AI problems. We also introduced genetic algorithms as an alternative way to search a complex space, along with some fundamental concepts in GAs: genes, chromosomes, phenotypes and fitness functions.

I’ve provided some code for a knapsack problem chromosome. If you’re up for a challenge, can you take some code from the bruteForceKnapsack function above and create a fitness function to evaluate a single chromosome?

As always, you can find the source code for all the above examples on our GitHub.

To begin your AI-coding training at day one, go here. And give us a follow to receive updates on our latest posts.
