Instinct: neuro-evolution on steroids

Published in

Towards Data Science

10 min readAug 13, 2017

When I first implemented neuro-evolution, I implemented the algorithm as described in a paper by Kenneth O. Stanley and Risto Miikkulainen; Evolving Neural Networks through Augmenting Topologies. I managed to get it working, but as I tested the algorithm on more complex datasets, problems started to emerge.

The lack of node-connected biases forced networks to become overly large and complex for simple problems. Keeping track of so-called innovation numbers was computationally expensive. But most importantly: the algorithm could never produce any memory cells like we see in LSTM’s due to the lack of gates.

I quickly started modifying the algorithm, adding biases and gates, removing innovation numbers, but also adding the option for nodes to mutate their activation function: a very powerful tool, as genomes can now completely modify themselves for datasets/environments.

The Instinct algorithm

I like to call my algorithm the Instinct algorithm. Not just because it sounds cool, but also because neuro-evolution develops instincts of neural agents in certain environments.

The very essence of instinct is that it’s followed independently of reason. — Charles Darwin

I will merely discuss the structure of genomes and how they should be bred, mutated and constrained. Setting up the genetic algorithm is up to you. For a working example, see my Javascript neuro-evolution library.

Also note that this article will be updated when changes are made to the algorithm. For instance, I already have plans to include shared weights, which shall be added soon.

1 Genome

A genome is represented by an array of nodes and and an array of connections, as shown in figure 1.

(figure 1) Simple example of a genotype and corresponding phenotype

1.1 Nodes

Each node has a type. There are only three types: input, hiddenand output. Nodes with the input type always stick at the beginning of the list and nodes with the output type will stick at the end of the list. If new nodes get added through mutation, they will always have the hidden type.

The index parameter serves no real function, it just signifies the order in which the nodes are activated. This happens in a ascending order.

Each node has a modifiable bias and squash (or: activation function). Biases have no fixed range. The activation function is randomly chosen for every newly mutated node.

1.2 Connections

Connections are connected from the node with index from to the node with index to. Even though a connection references indices, it actually references nodes to which it is connected. So if there is a connection between nodes on index 2 and 3 for example, the connection always references the nodes that were originally on index 2 and 3.

Each connection also has its own weight and gater. Weights have no fixed range. The gater parameter represents the index of a node in the node genes list that gates the connection, it always references the node regardless of its index.

1.3 Activation

The method of activating the genome is exactly the same as other neural networks, but i’d like to put some (JS-like) pseudocode here just in case. Remember that self connections are multiplied by the node’s previous state and not the node’s previous activation.

FUNCTION ACTIVATE: 
  output = []  FOR EACH node IN genome.nodes:
    IF node.type === 'INPUT':
      node.activation = input[i]
    ELSE:
      // self-connections are handled differently!
      node.state = node.selfconnection.weight * node.state *
        genome.nodes[node.selfconnection.gater].activation +
        node.bias      FOR EACH connection IN node.incomingConnections:
        node.state += connection.weight *
          genome.nodes[connection.from].activation * 
          genome.nodes[connection.gater].activation
      
      
      node.activation = node.squash(node.state)
      
      IF node.type === 'OUTPUT':
        output.add(node.activation)

2 Crossover

Crossover is a key to the improvement of genomes. When two genomes are selected for crossover, they develop a unique offspring. It might create a very fit offspring from two less fit parents. It stimulates the creation of new network topologies in the population.

One general law, leading to the advancement of all organic beings, namely, multiply, vary, let the strongest live and the weakest die. — Charles Darwin, The Origin of Species

First, the size of the offspring gets determined. The size of a genome is equal to the amount of nodes the genome has. If one of the parents has a higher fitness than the other, the size will be the same as the size of the fittest parent. If the parents are equally fit, a random size is chosen between the sizes of the parents.

IF parent1.fitness > parent2.fitness:
  offspring.size = parent1.size
ELSE IF parent2.fitness > parent1.fitness:
  offspring.size = parent2.size
ELSE:
  offspring.size = randomBetween(parent1.size, parent2.size)

Regardless of which parents is fitter, nodes will get chosen uniformly from each parent for each index. This is done until the amount of nodes is equal to the previously determined size.

When the size of the offspring is larger than the smallest parent, the nodes from the biggest parent will always be chosen.

There are two exceptions to these rules: first of all, a node with type output may never be selected if the output nodes for the offspring are not yet being selected. Second of all, when the nth output node is being selected for the offspring, then the nth output node of one of the parents is chosen.

FOR i = 0, i < offspring.size, i++:
  IF i < offspring.size - parents.outputSize:
    // Choose non-output nodes
    IF i > parent1.size - parents.outputSize:
      node = parent2.nodes[i];
    ELSE IF i > parent2.size - parents.outputSize:
      node = parent1.nodes[i]
    ELSE:
      node = SELECT_UNIFORM(parent1.nodes[i], parent2.nodes[i])
  ELSE:
    // Choose output nodes
    node = SELECT_UNIFORM(
      parent1.nodes[parent1.size + i - offspring.size],
      parent1.nodes[parent2.size + i - offspring.size]
    )
 
  offspring.nodes[i] = node;

Once nodes have been selected, the connections are split into two distinct groups: common and extra connections. When two parents have a connection with the same from and to values, it’s a common connection. Otherwise it is an extra connection, meaning the connection only exists in óne of the parents.

The most efficient way of

Common nodes are chosen uniformly from both parents regardless of their fitness. An extra gene from one of the parents only gets added if that parent is at least equally as fit as the other parent.

3 Mutation

Unlike NEAT paper, I will describe mutation methods that should be used to get maximum efficiency from the genomes. Each description is accompanied by some psuedocode.

3.1 Add node mutation

Having a mutation method that adds a node gene to a genome is essential. Without this mutation method, the genome will not grow and thus never improve after the nodes have been saturated with connections.

When a new node is added, an existing connection is split into two new connections with the new node between those connections.

(figure 2) Example of the add node mutation, a new node between 1 and 2 is added

If the connection that is being removed has a gater, the gater is transferred to one of the two new connections. When a new node is inserted into the gene list of a genome, it is placed before the receiving node (node with index to) of the original connection and never after any node with the output type.

When a node is inserted between other nodes, the index of all nodes after the inserted node is increased by one. So existing connections always remain between the nodes they were assigned to originally.

New nodes are always assigned a uniformly chosen activation function.

connection = SELECT_UNIFORM(genome.connections)
gater = connection.gaterindex = MIN(connection.to, genome.size - genome.outputSize)
node = new Node()genome.nodes.insert(node, index)
genome.connections.remove(connection)newConnection1 = {
  from: connection.from, 
  to: index, 
  weight: randomNumber(),
  gater: -1
}newConnection2 = {
  from: index, 
  to: connection.to, 
  weight: randomNumber(),
  gater: -1
}IF gater !== -1:
  SELECT_UNIFORM(newConnection1, newConnection2).gater = gatergenome.connections.add(newConnection1)
genome.connections.add(newConnection2)

3.2 Add connection mutation

Equally as important as mutating new nodes is mutating new connections. First, a list of connections which do not exist yet is generated. Once the list has been generated, one of the generated connections is placed in the genome’s connection gene list.

(figure 3) Example of the add connection mutation, a new connection between 1 and 3 is added

Connections from an hidden or output type node are never connected to an input type node, as the activation value of input type nodes are not calculated, but supplied by the dataset. Creating these connections would invoke unnecessary memory.

pairs = []FOR EACH node1 IN genome.nodes:
  FOR EACH node2 IN genome.nodes:
    IF node1.isNotConnectedTo(node2):
      pairs.add([node1, node2])pair = SELECT_UNIFORM(pairs)connection = {
  from: pair[0].index,
  to: pair[1].index,
  weight: random(),
  gater: -1
}genome.connections.add(connection)

3.3 Add gate mutation

Having a mutation method that sets the gaters of connection genes is necessary to develop complex networks that detect patterns between multiple, sequential inputs.

A connection is selected uniformly and the gater value is set to the index of a uniformly chosen node from the genome’s node gene list.

(figure 4) Example of the add gate mutation, connection from 1 to 3 is gated

connection = SELECT_UNIFORM(genome.connections)
connection.gater = selectionUniform(genome.nodes).index

3.4 Modify weight mutation

A connection is chosen uniformly and a value is added to or subtracted from the connection’s weight. I like to fix the modification to the range of (-1, 1) .

(figure 5) Example of the modify weight mutation

connection = SELECT_UNIFORM(genome.connections)
modification = random() * (max - min) + minconnection.weight += modification

3.5 Modify bias mutation

A node is chosen uniformly and a value is added to or subtracted from the node’s bias. I like to fix the modification to the range of (-1, 1) . Make sure that you don’t mutate input nodes as this has no effect on the output of the genome.

(figure 6) Example of the modify bias mutation

node = SELECT_UNIFORM(genome.nodes) // exclude input nodes
modification = random() * (max - min) + minnode.bias += modification

3.6 Modify squash mutation

Last but certainly not least: a mutation method that modifies the activation function of a node gene. Personally, I stick to this list of activation functions:

Logistic, Hyperbolic Tangent, Identity, Binary Step, ReLU, Softsign, Gaussian, Sinusoid, Bent Identity, Bipolar, Bipolar Sigmoid, Hard Hyperbolic Tangent, Inverse, SELU.

If you seek more diverse activation functions, I recommend this list.

(figure 7) Example of the modify squash mutation

node = SELECT_UNIFORM(genome.nodes)
newSquash = SELECT_UNIFORM(listOfSquashes)node.squash = newSquash

3.7 Remove node mutation

A genome might perform significantly better as it grows, but sometimes it benefits the genome to revert mutations that have occured. This also reduces the computational cost of activating the genome.

(figure 8) Example of the remove node mutation, node on index 2 is getting removed

First of all, nodes with input or output type are never removed. Once a node with type hidden is randomly selected for removal, a list is created with the node indices which source all the incoming connections to the node (excluding self-connections). Another list is created with node indices which are targeted by all outgoing connections. At the same time, the gater values which are not -1 of incoming and outgoing connections are saved.

node = SELECT_UNIFORM(genome.nodes)gaters = [];
sourceNodes = [];FOR EACH connection IN node.incomingConnections:
  IF connection.from !== node.index:
    sourceNodes.add(connection.from)
    IF connection.gater !== -1:
      gaters.add(connection.gater)targetNodes = [];FOR EACH connection IN node.outgoingConnections:
  IF connection.to !== node.index:
    targetNodes.add(connection.to)
    IF connection.gater !== -1:
      gaters.add(connection.gater)

Now every node index in the sourceNodes array gets connected with every node index in the targetNodes array (if not already connected). A list is kept of the created connections. After all the connections have been created, every gater in gaters is assigned to a connection, until no connections are left to gate.

newConnections = [];FOR EACH source IN sourceNodes:
  FOR EACH target IN targetNodes:
    IF source.notConnectedTo(target):
      connection = {
        from: source,
        to: target,
        weight: random(),
        gater: -1
      }
      newConnections.add(connection)
      genome.connections.add(connection)FOR EACH gater IN gaters:
  IF newConnections.length === 0:
    break;
  
  randomConnection = SELECT_UNIFORM(newConnections);
  
  randomConnection.gater = gater
  newConnections.remove(randomConnection)

The gater value of every connection that is gated by the node which is being removed is set to -1. Finally, the node is removed from the genome.

FOR EACH connection IN genome.connections:
  IF connection.gater === node.index:
    connection.gater = -1genome.nodes.remove(node)

3.8 Remove connection mutation

Connections can only be removed if the source node has multiple outgoing connections and if the target node has multiple incoming connections. That way all nodes will always have at least óne incoming and óne outgoing connection.

(figure 9) Example ofthe remove connection mutation, connection between 1 and 3 is being removed

connections = [];FOR EACH connection IN genome.connections:
  IF connection.from.hasMultipleOutgoingConnections() AND
     connection.to.hasMultipleIncomingConnections():
    connections.add(connection)connection = SELECT_UNIFORM(connections)
genome.connections.remove(connection)

3.9 Remove gate mutation

Gates that have been mutated previously might create a problem for the genome’s performance. This mutation method removes a gate from a randomly chosen gated connection.

(figure 10) Example of the remove gate connection, connection from 1 to 3 is getting ungated

gatedConnections = [];FOR EACH connection IN genome.connections:
  IF connection.gater !== -1:
    gatedConnections.add(connection)connection = SELECT_UNIFORM(gatedConnections)
connection.gater = -1

3.10 Other mutation methods

Sometimes it makes sense to split the add connection mutation method into multiple other mutation methods: add feedforward connection, add recurrent connection and add self-connection. This would also require remove feedforward connection, remove recurrent connection and remove self-connection.

For example, you do not want any recurrent mutations to occur when the dataset has no spatial relations (relations between multiple samples).

4 Constraints

Just like any other training method, the Instinct algorithm has a chance of over-fitting the data set. To avoid over-fitting, a new global parameter is introduced: growth.

The growth parameter penalizes genomes that are gaining size, but a relatively low amount of performance. Typically, growth is a decimal number. The amount of nodes, connections and gates is multiplied by growth to get the fitness penalty of the genome.

penalty = (genome.nodes.length + genome.connections.length + genome.gates.length) * growth

The parameters used to calculate the penalty are free of choice. For instance, the penalty calculation could also be:

penalty = genome.nodes.length * growth

4.1 Example

A genome has a fitness of 10 on a certain data set. The genome gets mutated; it gains an extra node. Suddenly, the fitness has increased to 11 . Sadly, the final fitness decreases to 9 as the following penalty is applied:

fitness = fitness - genome.nodes.length * growth

With a growth value of 2 . So if a genome wants to reach a higher final fitness, the fitness increase should be higher than 2.