Instinct: neuro-evolution on steroids
When I first implemented neuro-evolution, I implemented the algorithm as described in a paper by Kenneth O. Stanley and Risto Miikkulainen; Evolving Neural Networks through Augmenting Topologies. I managed to get it working, but as I tested the algorithm on more complex datasets, problems started to emerge.
The lack of node-connected biases forced networks to become overly large and complex for simple problems. Keeping track of so-called innovation numbers was computationally expensive. But most importantly: the algorithm could never produce any memory cells like we see in LSTM’s due to the lack of gates.
I quickly started modifying the algorithm, adding biases and gates, removing innovation numbers, but also adding the option for nodes to mutate their activation function: a very powerful tool, as genomes can now completely modify themselves for datasets/environments.
The Instinct algorithm
I like to call my algorithm the Instinct algorithm. Not just because it sounds cool, but also because neuro-evolution develops instincts of neural agents in certain environments.
The very essence of instinct is that it’s followed independently of reason. — Charles Darwin
I will merely discuss the structure of genomes and how they should be bred, mutated and constrained. Setting up the genetic algorithm is up to you. For a working example, see my Javascript neuro-evolution library.
Also note that this article will be updated when changes are made to the algorithm. For instance, I already have plans to include shared weights, which shall be added soon.
1 Genome
A genome is represented by an array of nodes and and an array of connections, as shown in figure 1.
1.1 Nodes
Each node has a type. There are only three types: input
, hidden
and output
. Nodes with the input
type always stick at the beginning of the list and nodes with the output
type will stick at the end of the list. If new nodes get added through mutation, they will always have the hidden
type.
The index parameter serves no real function, it just signifies the order in which the nodes are activated. This happens in a ascending order.
Each node has a modifiable bias and squash (or: activation function). Biases have no fixed range. The activation function is randomly chosen for every newly mutated node.
1.2 Connections
Connections are connected from the node with index from
to the node with index to
. Even though a connection references indices, it actually references nodes to which it is connected. So if there is a connection between nodes on index 2 and 3 for example, the connection always references the nodes that were originally on index 2 and 3.
Each connection also has its own weight and gater. Weights have no fixed range. The gater
parameter represents the index of a node in the node genes list that gates the connection, it always references the node regardless of its index.
1.3 Activation
The method of activating the genome is exactly the same as other neural networks, but i’d like to put some (JS-like) pseudocode here just in case. Remember that self connections are multiplied by the node’s previous state and not the node’s previous activation.
FUNCTION ACTIVATE:
output = [] FOR EACH node IN genome.nodes:
IF node.type === 'INPUT':
node.activation = input[i]
ELSE:
// self-connections are handled differently!
node.state = node.selfconnection.weight * node.state *
genome.nodes[node.selfconnection.gater].activation +
node.bias FOR EACH connection IN node.incomingConnections:
node.state += connection.weight *
genome.nodes[connection.from].activation *
genome.nodes[connection.gater].activation
node.activation = node.squash(node.state)
IF node.type === 'OUTPUT':
output.add(node.activation)
2 Crossover
Crossover is a key to the improvement of genomes. When two genomes are selected for crossover, they develop a unique offspring. It might create a very fit offspring from two less fit parents. It stimulates the creation of new network topologies in the population.
One general law, leading to the advancement of all organic beings, namely, multiply, vary, let the strongest live and the weakest die. — Charles Darwin, The Origin of Species
First, the size of the offspring gets determined. The size of a genome is equal to the amount of nodes the genome has. If one of the parents has a higher fitness than the other, the size will be the same as the size of the fittest parent. If the parents are equally fit, a random size is chosen between the sizes of the parents.
IF parent1.fitness > parent2.fitness:
offspring.size = parent1.size
ELSE IF parent2.fitness > parent1.fitness:
offspring.size = parent2.size
ELSE:
offspring.size = randomBetween(parent1.size, parent2.size)
Regardless of which parents is fitter, nodes will get chosen uniformly from each parent for each index. This is done until the amount of nodes is equal to the previously determined size.
When the size of the offspring is larger than the smallest parent, the nodes from the biggest parent will always be chosen.
There are two exceptions to these rules: first of all, a node with type output
may never be selected if the output nodes for the offspring are not yet being selected. Second of all, when the n
th output node is being selected for the offspring, then the n
th output node of one of the parents is chosen.
FOR i = 0, i < offspring.size, i++:
IF i < offspring.size - parents.outputSize:
// Choose non-output nodes
IF i > parent1.size - parents.outputSize:
node = parent2.nodes[i];
ELSE IF i > parent2.size - parents.outputSize:
node = parent1.nodes[i]
ELSE:
node = SELECT_UNIFORM(parent1.nodes[i], parent2.nodes[i])
ELSE:
// Choose output nodes
node = SELECT_UNIFORM(
parent1.nodes[parent1.size + i - offspring.size],
parent1.nodes[parent2.size + i - offspring.size]
)
offspring.nodes[i] = node;
Once nodes have been selected, the connections are split into two distinct groups: common and extra connections. When two parents have a connection with the same from
and to
values, it’s a common connection. Otherwise it is an extra connection, meaning the connection only exists in óne of the parents.
The most efficient way of
Common nodes are chosen uniformly from both parents regardless of their fitness. An extra gene from one of the parents only gets added if that parent is at least equally as fit as the other parent.
3 Mutation
Unlike NEAT paper, I will describe mutation methods that should be used to get maximum efficiency from the genomes. Each description is accompanied by some psuedocode.
3.1 Add node mutation
Having a mutation method that adds a node gene to a genome is essential. Without this mutation method, the genome will not grow and thus never improve after the nodes have been saturated with connections.
When a new node is added, an existing connection is split into two new connections with the new node between those connections.
If the connection that is being removed has a gater, the gater is transferred to one of the two new connections. When a new node is inserted into the gene list of a genome, it is placed before the receiving node (node with index to
) of the original connection and never after any node with the output
type.
When a node is inserted between other nodes, the index of all nodes after the inserted node is increased by one. So existing connections always remain between the nodes they were assigned to originally.
New nodes are always assigned a uniformly chosen activation function.
connection = SELECT_UNIFORM(genome.connections)
gater = connection.gaterindex = MIN(connection.to, genome.size - genome.outputSize)
node = new Node()genome.nodes.insert(node, index)
genome.connections.remove(connection)newConnection1 = {
from: connection.from,
to: index,
weight: randomNumber(),
gater: -1
}newConnection2 = {
from: index,
to: connection.to,
weight: randomNumber(),
gater: -1
}IF gater !== -1:
SELECT_UNIFORM(newConnection1, newConnection2).gater = gatergenome.connections.add(newConnection1)
genome.connections.add(newConnection2)
3.2 Add connection mutation
Equally as important as mutating new nodes is mutating new connections. First, a list of connections which do not exist yet is generated. Once the list has been generated, one of the generated connections is placed in the genome’s connection gene list.
Connections from an hidden
or output
type node are never connected to an input
type node, as the activation value of input
type nodes are not calculated, but supplied by the dataset. Creating these connections would invoke unnecessary memory.
pairs = []FOR EACH node1 IN genome.nodes:
FOR EACH node2 IN genome.nodes:
IF node1.isNotConnectedTo(node2):
pairs.add([node1, node2])pair = SELECT_UNIFORM(pairs)connection = {
from: pair[0].index,
to: pair[1].index,
weight: random(),
gater: -1
}genome.connections.add(connection)
3.3 Add gate mutation
Having a mutation method that sets the gaters of connection genes is necessary to develop complex networks that detect patterns between multiple, sequential inputs.
A connection is selected uniformly and the gater
value is set to the index of a uniformly chosen node from the genome’s node gene list.
connection = SELECT_UNIFORM(genome.connections)
connection.gater = selectionUniform(genome.nodes).index
3.4 Modify weight mutation
A connection is chosen uniformly and a value is added to or subtracted from the connection’s weight. I like to fix the modification to the range of (-1, 1)
.
connection = SELECT_UNIFORM(genome.connections)
modification = random() * (max - min) + minconnection.weight += modification
3.5 Modify bias mutation
A node is chosen uniformly and a value is added to or subtracted from the node’s bias. I like to fix the modification to the range of (-1, 1)
. Make sure that you don’t mutate input nodes as this has no effect on the output of the genome.
node = SELECT_UNIFORM(genome.nodes) // exclude input nodes
modification = random() * (max - min) + minnode.bias += modification
3.6 Modify squash mutation
Last but certainly not least: a mutation method that modifies the activation function of a node gene. Personally, I stick to this list of activation functions:
Logistic, Hyperbolic Tangent, Identity, Binary Step, ReLU, Softsign, Gaussian, Sinusoid, Bent Identity, Bipolar, Bipolar Sigmoid, Hard Hyperbolic Tangent, Inverse, SELU.
If you seek more diverse activation functions, I recommend this list.
node = SELECT_UNIFORM(genome.nodes)
newSquash = SELECT_UNIFORM(listOfSquashes)node.squash = newSquash
3.7 Remove node mutation
A genome might perform significantly better as it grows, but sometimes it benefits the genome to revert mutations that have occured. This also reduces the computational cost of activating the genome.
First of all, nodes with input
or output
type are never removed. Once a node with type hidden is randomly selected for removal, a list is created with the node indices which source all the incoming connections to the node (excluding self-connections). Another list is created with node indices which are targeted by all outgoing connections. At the same time, the gater
values which are not -1
of incoming and outgoing connections are saved.
node = SELECT_UNIFORM(genome.nodes)gaters = [];
sourceNodes = [];FOR EACH connection IN node.incomingConnections:
IF connection.from !== node.index:
sourceNodes.add(connection.from)
IF connection.gater !== -1:
gaters.add(connection.gater)targetNodes = [];FOR EACH connection IN node.outgoingConnections:
IF connection.to !== node.index:
targetNodes.add(connection.to)
IF connection.gater !== -1:
gaters.add(connection.gater)
Now every node index in the sourceNodes
array gets connected with every node index in the targetNodes
array (if not already connected). A list is kept of the created connections. After all the connections have been created, every gater in gaters
is assigned to a connection, until no connections are left to gate.
newConnections = [];FOR EACH source IN sourceNodes:
FOR EACH target IN targetNodes:
IF source.notConnectedTo(target):
connection = {
from: source,
to: target,
weight: random(),
gater: -1
}
newConnections.add(connection)
genome.connections.add(connection)FOR EACH gater IN gaters:
IF newConnections.length === 0:
break;
randomConnection = SELECT_UNIFORM(newConnections);
randomConnection.gater = gater
newConnections.remove(randomConnection)
The gater
value of every connection that is gated by the node which is being removed is set to -1
. Finally, the node is removed from the genome.
FOR EACH connection IN genome.connections:
IF connection.gater === node.index:
connection.gater = -1genome.nodes.remove(node)
3.8 Remove connection mutation
Connections can only be removed if the source node has multiple outgoing connections and if the target node has multiple incoming connections. That way all nodes will always have at least óne incoming and óne outgoing connection.
connections = [];FOR EACH connection IN genome.connections:
IF connection.from.hasMultipleOutgoingConnections() AND
connection.to.hasMultipleIncomingConnections():
connections.add(connection)connection = SELECT_UNIFORM(connections)
genome.connections.remove(connection)
3.9 Remove gate mutation
Gates that have been mutated previously might create a problem for the genome’s performance. This mutation method removes a gate from a randomly chosen gated connection.
gatedConnections = [];FOR EACH connection IN genome.connections:
IF connection.gater !== -1:
gatedConnections.add(connection)connection = SELECT_UNIFORM(gatedConnections)
connection.gater = -1
3.10 Other mutation methods
Sometimes it makes sense to split the add connection mutation method into multiple other mutation methods: add feedforward connection, add recurrent connection and add self-connection. This would also require remove feedforward connection, remove recurrent connection and remove self-connection.
For example, you do not want any recurrent mutations to occur when the dataset has no spatial relations (relations between multiple samples).
4 Constraints
Just like any other training method, the Instinct algorithm has a chance of over-fitting the data set. To avoid over-fitting, a new global parameter is introduced: growth.
The growth
parameter penalizes genomes that are gaining size, but a relatively low amount of performance. Typically, growth
is a decimal number. The amount of nodes, connections and gates is multiplied by growth
to get the fitness penalty of the genome.
penalty = (genome.nodes.length + genome.connections.length + genome.gates.length) * growth
The parameters used to calculate the penalty are free of choice. For instance, the penalty calculation could also be:
penalty = genome.nodes.length * growth
4.1 Example
A genome has a fitness of 10
on a certain data set. The genome gets mutated; it gains an extra node. Suddenly, the fitness has increased to 11
. Sadly, the final fitness decreases to 9
as the following penalty is applied:
fitness = fitness - genome.nodes.length * growth
With a growth
value of 2
. So if a genome wants to reach a higher final fitness, the fitness increase should be higher than 2
.