Genetic Programming for ANNs

Let's say we have a neural network with tens to hundreds of neurons. We give each of them a coordinate: a vector encoding its position in space. The input can be anything, such as a screenshot from a game or a test image, and the output is the desired action or answer. Now all we need is a function that, for each connection, evaluates the coordinates of its endpoints and assigns the connection an appropriate weight. The approach we decided to try for creating such functions is called genetic programming.
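To make the idea concrete, here is a minimal Python sketch of that setup. The names (`weight_fn`, `build_weights`) and the toy coordinates are illustrative assumptions, not part of the project's actual code:

```python
import math

def weight_fn(x1, y1, x2, y2):
    # One example of a weight-generating function of the kind we want to evolve.
    return math.sin(x1 + 5) / 2 * math.tanh(y2)

def build_weights(coords_in, coords_out, fn):
    """Evaluate fn on every (input neuron, output neuron) coordinate pair."""
    return [[fn(x1, y1, x2, y2) for (x1, y1) in coords_in]
            for (x2, y2) in coords_out]

coords_in = [(0.0, 0.0), (1.0, 0.0)]   # two input neurons on a line
coords_out = [(0.5, 1.0)]              # one output neuron above them
W = build_weights(coords_in, coords_out, weight_fn)
```

The point is that the network's weight matrix is never stored directly; it is regenerated from the neuron coordinates whenever the function changes.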
Searching for the best one
How do we find the right function? Of course, we could just generate random functions and hope the right one pops up. But we are in a bit of a hurry: we want to explore the space of such functions more intelligently and, ideally, move closer to the right one at each step. There is a way to do it, and nature found it a long time ago. It's called evolution.

And believe it or not, we can apply evolution even to expressions like sin(x1 + 5)/2 * tanh(y1).
Evolving a tree
Inside a computer, such a function is represented by a structure called a tree. It can be described by one simple picture:

The leaves of this tree are called terminals; these are the constants or variables, in this case the set [2.2, 7, 11, x, y]. The inner nodes hold the functions that combine those terminals, such as +, * or cos(). For genetic programming, two things are essential:
- a fitness function, which evaluates the performance of a single tree
- mutation, which modifies a tree
Of course there are other important terms, such as an individual, a population, generations, crossover methods… If you don't know anything about evolutionary algorithms, I suggest going through some introductory material first. Here I will focus only on how we used these methods in our Mathematica project.
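A tree like the one in the picture can be represented and evaluated in just a few lines. This is a hedged Python sketch (our project uses Mathematica expressions; the `Node` class and its fields are illustrative):

```python
import math

class Node:
    """An expression-tree node: inner nodes hold functions, leaves hold terminals."""
    def __init__(self, op, children=()):
        self.op = op                # a callable, a number, or a variable name
        self.children = list(children)

    def eval(self, env):
        if callable(self.op):       # inner node: apply the function to the children
            return self.op(*(c.eval(env) for c in self.children))
        if isinstance(self.op, str):
            return env[self.op]     # variable terminal, e.g. "x"
        return self.op              # constant terminal

# The tree for sin(x + 5) / 2:
tree = Node(lambda a, b: a / b, [
    Node(math.sin, [Node(lambda a, b: a + b, [Node("x"), Node(5)])]),
    Node(2),
])
value = tree.eval({"x": -5.0})      # sin(-5 + 5) / 2 == 0.0
```

Evaluating the tree bottom-up like this is exactly what happens when a candidate function is asked for a connection weight.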
The almighty fitness function
Designing a good fitness function may not be as simple as it sounds. The basic idea we are using now is simply to sum up the reward coming from the A.L.E. For example, when you destroy an enemy spaceship in Space Invaders, you get some points. These points then serve as the value of the fitness function.

This works quite well: if everything goes smoothly, we can teach our controller to shoot enemy ships by maximizing this function. The situation is a little more complicated in an Arkanoid-style game called Breakout:

Everybody knows this game. The goal is to keep a ball in the air by bouncing it back with the paddle at the bottom of the screen. The problem is that you don't get a reward for bouncing the ball back; you only get points for destroying blocks. Unfortunately, that is a rather noisy quantity to count, because which blocks get destroyed does not depend directly on the controller's behavior.
Mutation (safe and with no nuclear radiation)
But let's assume we have a well-behaved game like Space Invaders. For each function we then run one gameplay: collect the input images, feed them into our neural network (whose weights were computed by this function), and send the resulting actions back to the A.L.E. The performance, i.e. fitness, of this function is then saved and used when choosing the individuals for the next generation.
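That evaluation loop can be sketched as follows. This is not the real A.L.E. interface; `evaluate`, the `env` protocol, and `ToyEnv` are stand-ins so the sketch is self-contained and runnable:

```python
def evaluate(candidate_fn, env, build_network, max_steps=10000):
    """Return the fitness (total reward) of one weight-generating function."""
    network = build_network(candidate_fn)   # weights come from neuron coordinates
    frame = env.reset()
    fitness = 0.0
    for _ in range(max_steps):
        action = network(frame)             # choose an action from the screen
        frame, reward, done = env.step(action)
        fitness += reward                   # sum of rewards = fitness
        if done:
            break
    return fitness

class ToyEnv:
    """A stand-in for the A.L.E.: the episode lasts three steps, one point each."""
    def __init__(self):
        self.steps = 0
    def reset(self):
        self.steps = 0
        return [0.0]
    def step(self, action):
        self.steps += 1
        return [0.0], 1.0, self.steps >= 3

fitness = evaluate(lambda x, y: 0.0, ToyEnv(), lambda fn: (lambda frame: 0))
```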

We start with random functions, but of course we do not want to generate a brand-new random function every time. Instead, we only want to adjust the existing ones slightly. This is called mutation. For a tree (in its most elementary version), it means picking a branch and replacing it with a new one.
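Here is one way subtree-replacement mutation might look, assuming a simple nested-list representation `[op, child, child]` and the terminal set from the picture; the helper names and probabilities are illustrative, not the project's actual choices:

```python
import random

TERMINALS = ["x", "y", 2.2, 7, 11]
FUNCTIONS = [("+", 2), ("*", 2), ("cos", 1)]

def random_tree(depth, rng):
    """Grow a random expression tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op, arity = rng.choice(FUNCTIONS)
    return [op] + [random_tree(depth - 1, rng) for _ in range(arity)]

def mutate(tree, rng, depth=2):
    """Replace one randomly chosen branch with a freshly generated subtree."""
    if not isinstance(tree, list) or rng.random() < 0.25:
        return random_tree(depth, rng)        # replace this node entirely
    i = rng.randrange(1, len(tree))           # descend into one child branch
    new = list(tree)                          # copy, so the parent is untouched
    new[i] = mutate(tree[i], rng, depth)
    return new

rng = random.Random(0)
parent = ["+", ["*", "x", 2.2], ["cos", "y"]]
child = mutate(parent, rng)
```

Because `mutate` copies each node it walks through, the parent tree survives unchanged and can stay in the population alongside its mutated child.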
Conclusion
The method by which we are trying to evolve a controller capable of playing Atari games consists of these main steps:
- creating a neural network where each neuron has its own coordinates
- creating a weight matrix by evaluating these coordinates with a function
- generating the initial functions at random
- computing the fitness of each function by running one gameplay
- comparing each function with the others and choosing the best ones
- evolving these functions long enough to find one that works
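The steps above fit into a simple elitist evolutionary loop. In this hedged sketch, `init`, `mutate` and `fitness` stand in for the real GP operators and for running an A.L.E. gameplay; the toy integer problem at the bottom only exists to make the sketch runnable:

```python
import random

def evolve(init, mutate, fitness, pop_size=8, generations=30, rng=None):
    """Keep the best half of the population and refill it with mutated copies."""
    rng = rng or random.Random(0)
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]             # choose the best ones
        children = [mutate(p, rng) for p in parents]  # slightly adjusted copies
        population = parents + children               # next generation
    return max(population, key=fitness)

# Toy stand-in problem: nudge an integer toward 42 by +-1 mutations.
best = evolve(
    init=lambda rng: rng.randrange(0, 100),
    mutate=lambda x, rng: x + rng.choice([-1, 1]),
    fitness=lambda x: -abs(x - 42),
)
```

In the real setting, `fitness` is the expensive part: each call means playing a whole game, which is why every bit of intelligence in the search pays off.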
What do we need for it? As always in evolution: quite a large portion of luck ☺
In the next article we plan to show some specific parts of the implementation of GP in Wolfram Mathematica.