Dealing With High Dimensional Input
There are many ways to train a neural network, depending on the kind of problem we are trying to solve. Some of them are well known and tested, like the backpropagation algorithm, and are used in the networks we rely on every day (see our previous article). But our problem is a bit more complicated than that.
How to learn a game
Our goal is to develop a controller for a real game, so we have to think about the right way to train our network. If someone showed you a single screenshot from a game and asked you “Am I playing it right?”, you probably wouldn’t know the answer. While playing a game, one image is just not enough. To learn how to play, you have to see at least a sequence of images or, preferably, a whole run of the game. In other words, the reward does not come immediately. This means we cannot use the plain backpropagation algorithm, and that we are facing an instance of reinforcement learning.

Reinforcement learning
Reinforcement learning (RL) is, in other words, learning from delayed rewards. Even though the network has a weight matrix, we cannot adjust it after each action, because at that moment we simply don’t know whether the action was a good choice or not. We have to wait for the game to end and only then sum up the reward. But let’s count the cost of a single step: with a reduced input of size 32x42 as the first layer of our neural network, each input connected to all the neurons in the hidden layer (its size depends on the settings, e.g. 5x5), and that layer fully connected to an output layer of at least 3 neurons for the 3 actions (left, right and fire), one step takes 1344 × 25 + 25 × 3 = 33675 multiplications. At 60 Hz we have 60 such steps every second. So we have to look for ways to speed this up and still train our network efficiently.
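As a rough sanity check, here is a minimal sketch (in Python; the layer sizes are just the example values above) of how that per-step cost adds up:

```python
# Example layer sizes from the text; real values depend on the game and settings.
INPUT_W, INPUT_H = 32, 42       # downscaled game screen
HIDDEN_W, HIDDEN_H = 5, 5       # hidden layer laid out as a 5x5 grid
N_ACTIONS = 3                   # left, right, fire

n_input = INPUT_W * INPUT_H     # 1344 input neurons
n_hidden = HIDDEN_W * HIDDEN_H  # 25 hidden neurons

# Multiplications in one fully connected forward pass (ignoring biases).
per_step = n_input * n_hidden + n_hidden * N_ACTIONS
per_second = per_step * 60      # the game runs at 60 Hz

print(per_step)    # 33675
print(per_second)  # 2020500, roughly two million multiplications per second
```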
Compressing the weights

What we keep trying to do in the field of neural networks is to simulate the natural processes going on in our own bodies, and this time, too, we can take inspiration from our brain. It is composed of a really big number of units, yet it is still quite efficient. How is that possible? One of the “compression algorithms” our brain uses is closely connected with the geometrical (or spatial) placement of neurons. It assumes that neurons close to each other also depend on each other and encode more or less the same kind of information. This means that instead of training each weight individually, we can generate the weights by evaluating something else, such as a set of functions, over the neurons’ coordinates.
CPPN — CPPF
This approach is called a compositional pattern-producing network (CPPN). Instead of using a single sigmoid activation everywhere, this kind of network composes several different functions to produce specific types of patterns and regularities, such as symmetry or repetition. Unfortunately, these networks are again usually quite complicated and slow, so we decided to try something a little bit different: simpler, faster, but (hopefully) still functional. We are going to use only two or three functions. To do this, we need to borrow a method from another branch of artificial intelligence. It is called genetic programming, and we are going to describe it further in our next article.
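To make the idea concrete, here is a minimal sketch of what “weights as a function of neuron coordinates” could look like. The particular functions (a sine for repetition and a Gaussian for locality) and all parameter names are illustrative assumptions, not the exact functions we will end up using:

```python
import numpy as np

def weight(src_xy, dst_xy, params):
    """Compute one connection weight from the coordinates of two neurons.

    Instead of storing every weight, we store only `params` (a handful of
    numbers) and recompute any weight on demand.
    """
    amplitude, freq, sigma = params
    dx = src_xy[0] - dst_xy[0]
    dy = src_xy[1] - dst_xy[1]
    dist = np.hypot(dx, dy)
    # The sine produces a repeating pattern, the Gaussian makes nearby
    # neurons receive similar weights.
    return amplitude * np.sin(freq * dist) * np.exp(-(dist ** 2) / (2 * sigma ** 2))

# Input neurons on a 32x42 grid, hidden neurons on a 5x5 grid,
# both mapped into the unit square so the coordinates are comparable.
inputs = [(x / 32.0, y / 42.0) for x in range(32) for y in range(42)]
hidden = [(x / 5.0, y / 5.0) for x in range(5) for y in range(5)]

params = (1.0, 8.0, 0.3)  # three numbers instead of 1344 x 25 weights
W = np.array([[weight(i, h, params) for h in hidden] for i in inputs])
print(W.shape)  # (1344, 25)
```

With an encoding like this, only the few numbers in params have to be searched for (which is where genetic programming will come in), instead of tens of thousands of individual weights.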
