Automatic Design of Artificial Neural Networks with Evolutionary Optimization Methods

Andrea Bertolini
14 min read · Dec 10, 2023


Every time you hear about the power and the potential risks of Artificial Intelligence, you should know that behind AI and Data Science models there are two well-established branches of Mathematics: Statistics and Optimization. In this article, I show how Mathematical Optimization can be used not only to train some of these AI models, but also to build their architectures.

On Building Artificial Neural Networks

Inspired by the functioning of the brain, Artificial Neural Networks are among the most widely used tools for data-related business purposes. In general, there are two ways of automatically designing artificial neural networks: parametric learning and structural learning. At the risk of over-simplifying, we may say that in parametric learning, gradient-descent-based optimization techniques are used to find the optimal network parameters (e.g., weights). Concerning structural learning, we can consider three approaches:

  • constructive algorithms. Start with a small network and train it until it is unable to continue learning, then add new components. This process is repeated until a satisfactory solution is found.
  • destructive algorithms (a.k.a. pruning algorithms). Start with a big network, which is able to learn but usually over-fits, and try to remove connections and nodes that are not useful.
  • evolutionary computation. A family of global optimization algorithms inspired by biological evolution, used to evolve both neural network architectures and weights.

The literature shows that there is a correspondence between evolution and gradient descent. The main advantage of evolutionary computation is that it performs a global exploration of the search space, avoiding the local minima in which gradient-based approaches often get trapped, in addition to enabling massive parallelization.

Evolutionary Computation

In the 1980s, the notion of an artificial neural network (ANN), a rough abstraction of the brain, was well established (Yao 1999, Shi 2008). Since then, there has been a great interest in exploiting evolutionary optimization methods to build artificial neural networks, mainly inspired by the evolution of the brain in nature.

ANN design using evolutionary techniques can be roughly classified into three levels:

  • evolving connection weights,
  • evolving architecture,
  • evolving learning rules.

Both standard evolutionary techniques (single-population evolution) and coevolutionary techniques (two or more species reciprocally affecting each other’s evolution) have been introduced in ANN design. The resulting techniques are called neuroevolutionary and neurocoevolutionary, respectively.

Neuroevolutionary algorithms evolve ANNs using standard evolutionary techniques. These techniques maintain a population of individuals and repeat an evolution cycle (Figure 1) until some stopping criterion is met (e.g., when the individuals are sufficiently close to the global optimum of a function). The cycle is composed of:

  • fitness assessment (each individual’s fitness is computed, e.g., a function value given the individual as input);
  • selection (the individuals with the best fitness are selected, while the others are discarded);
  • crossover (the selected individuals, the parents, are coupled to generate new individuals, the offspring, which share features with their parents);
  • mutation (each individual’s features may be slightly perturbed);
  • new population generation (the generated individuals either completely replace their parents or are added to them).
Figure 1: General Evolution Cycle.

Depending on the algorithm, there might be a few modifications, such as dropping the crossover or mutation step for one or more iterations. With this iterated cycle, the population gradually evolves toward higher levels of fitness. Once some termination criterion is met, e.g., the maximum number of iterations, the iterative process stops and the best solution is extracted.
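The cycle above can be sketched in a few lines of Python. This is a minimal illustration with a toy fitness function; the population size, mutation scale, and selection scheme are illustrative choices, not taken from any specific algorithm.

```python
import random

def fitness(individual):
    # Toy objective: the closer all genes are to zero, the fitter.
    return -sum(x * x for x in individual)

def evolve(pop_size=20, genome_len=5, generations=50):
    population = [[random.uniform(-1, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness assessment and selection: keep the best half.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        offspring = []
        while len(offspring) < pop_size - len(parents):
            # Crossover: one-point recombination of two random parents.
            p1, p2 = random.sample(parents, 2)
            cut = random.randrange(1, genome_len)
            child = p1[:cut] + p2[cut:]
            # Mutation: slightly perturb each gene.
            offspring.append([g + random.gauss(0, 0.1) for g in child])
        # New population generation: here offspring are added to the parents.
        population = parents + offspring
    return max(population, key=fitness)

best = evolve()
```

Because the parents are kept alongside the offspring, the best fitness never decreases from one generation to the next.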

In ANN design, conventionally, each individual in the population represents a complete network.

Neurocoevolutionary algorithms, instead, evolve solutions by decomposing complete networks into subpopulations. By coevolution, we mean the process of evolving two or more individuals that are so highly connected that they influence each other. Even though these algorithms differ little from the standard evolutionary ones in terms of the evolution cycle (still the one shown in Figure 1), the power of neurocoevolutionary models lies in the cooperation or competition between subpopulations. Each individual of a population represents a partial network; a complete network is then composed by building relationships among individuals.

In this article, we review methods from both the neuroevolutionary and neurocoevolutionary classes, referring mainly to Shi 2008.

Standard Neuro-Evolution (NE)

Standard Neuro-Evolution (NE) is the earliest approach that uses standard evolutionary algorithms to generate ANNs. This method applies to small, fixed-topology, fully connected neural networks, such as the one shown in Figure 2.

Figure 2: Example of a simple fully connected feedforward neural network, an individual in the Standard Neuro-Evolution algorithm. The network contains 5 nodes (2 input nodes, 2 hidden nodes, 1 output node) and 6 edges.

Only the connection weights are evolved. Each individual in the population is a vector with all connection weights of the network, and each gene of an individual specifies a weight value between two neurons. In Figures 3 and 4 we show an example of encoding and the corresponding neural network, respectively. This is not the only possible encoding; graphs can be represented in many ways.

Figure 3: Example of encoding for a simple fully connected feedforward neural network in Standard Neuro-Evolution. The network contains 5 nodes and 6 edges. The first tile indicates a weight of value 4 connecting nodes 1 and 3; the other tiles are read analogously.
Figure 4: Resulting neural network encoded by the string shown in Figure 3.
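The tile encoding of Figures 3 and 4 can be sketched as follows. The node ids, the weight values, the tanh activation, and the forward-pass layout are illustrative assumptions, not part of the original encoding.

```python
import math

# Genome for the 5-node network of Figure 2: inputs 1-2, hidden 3-4, output 5.
# Each gene is a tile (from_node, to_node, weight); the weights are made up.
genome = [(1, 3, 4.0), (2, 3, -1.5), (1, 4, 0.7),
          (2, 4, 2.1), (3, 5, -0.3), (4, 5, 1.2)]

def decode(genome):
    """Turn the gene list into an adjacency map: node -> [(source, weight)]."""
    incoming = {}
    for src, dst, w in genome:
        incoming.setdefault(dst, []).append((src, w))
    return incoming

def forward(genome, inputs, hidden=(3, 4), output=5):
    """Feed-forward pass; tanh is an illustrative choice of activation."""
    incoming = decode(genome)
    values = dict(inputs)  # e.g., {1: 0.5, 2: -0.2}
    for node in list(hidden) + [output]:
        total = sum(values[src] * w for src, w in incoming[node])
        values[node] = math.tanh(total)
    return values[output]

y = forward(genome, {1: 0.5, 2: -0.2})
```

Because only the weight entries of the genes change during evolution, crossover and mutation can operate on the flat vector while the decoding stays fixed.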

Figure 5 shows the resulting representation of the neural networks population.

Figure 5: Example of population of simple fully connected feedforward neural networks in Standard Neuro-Evolution.

In the selection phase of the evolution cycle, NE selects the best-performing individuals in the population and breeds them in the next generation (Figure 6).

Figure 6: Standard Neuro-Evolution cycle.

With this algorithm, the population tends to lose diversity during the evolution process and eventually converges around a single “type” of individual, which is undesirable for two reasons:

  • populations often converge on suboptimal peaks;
  • converged populations cannot adapt well to changes in the task environment.

Slight modifications of the algorithm’s hyperparameters cannot overcome this limitation; improvements have to be made at a higher level.

Neuro-Evolution of Augmenting Topologies (NEAT)

Neuro-Evolution of Augmenting Topologies (NEAT) is a more recent neuroevolutionary technique that evolves both the connection weights and the topologies of ANNs simultaneously. Unlike NE, this method is not restricted to fixed-topology neural networks.

The length of an individual (encoding) is flexible and expandable so that evolution is able to both complexify and simplify network topologies.

A NEAT individual consists of:

  • neuron genes (Figure 7): each neuron is given an ID and a type (e.g., input, output, hidden, …);
  • link genes (Figure 8): each link gene contains the two endpoints of the edge (in-node and out-node neurons), the connection weight, the connection status (i.e., whether the link is enabled), and an innovation number used for crossover. The innovation number tracks when the link was introduced and is essential to limit the Competing Conventions Problem, a well-known issue in neuroevolution: different network encodings may represent the same network, and differing representations of the same structure are highly likely to produce damaged offspring during crossover. The innovation number establishes an ordering of the genes in the encoding vector.
Figure 7: Example of neuron genes in NEAT.
Figure 8: Example of link genes in NEAT.

The union of neuron genes and link genes shown in Figures 7 and 8 encodes the network shown in Figure 9.

Figure 9: Resulting neural network encoded by the strings shown in Figures 7 and 8.
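A NEAT genome of this kind can be sketched with simple data classes; the field names and example values are illustrative, not taken from any particular NEAT implementation.

```python
from dataclasses import dataclass

@dataclass
class NeuronGene:
    node_id: int
    node_type: str  # "input", "hidden", or "output"

@dataclass
class LinkGene:
    in_node: int
    out_node: int
    weight: float
    enabled: bool       # disabled links stay in the genome
    innovation: int     # historical marker used to align genes in crossover

neurons = [NeuronGene(1, "input"), NeuronGene(2, "input"),
           NeuronGene(3, "hidden"), NeuronGene(4, "output")]
links = [LinkGene(1, 3, 0.5, True, 1),
         LinkGene(2, 3, -1.2, True, 2),
         LinkGene(3, 4, 0.8, True, 3),
         LinkGene(1, 4, 0.3, False, 4)]

def matching_innovations(links_a, links_b):
    """Innovation numbers present in both genomes: these genes can be
    recombined safely, which limits the Competing Conventions Problem."""
    return sorted({g.innovation for g in links_a} & {g.innovation for g in links_b})
```

During crossover, genes with matching innovation numbers are aligned and recombined, while the remaining (disjoint or excess) genes are inherited from the fitter parent.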

To allow a flexible length of the individuals (unlike Standard NE), NEAT supports several mutations on nodes, edges, and weights:

  • addition of a new node (by splitting an existing edge into two consecutive ones);
  • addition of a new edge with a random weight;
  • connection weight modification;
  • enabling or disabling of an edge.
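The first of these mutations, adding a node by splitting an edge, can be sketched as follows. Genes are plain dictionaries here for brevity; the weight assignment (1 on the incoming link, the old weight on the outgoing one) follows NEAT's convention of preserving the network's initial behavior.

```python
import random

def add_node_mutation(links, new_node_id, next_innovation):
    """Split a random enabled link into two consecutive links through
    a new hidden node; the old link is disabled, not deleted."""
    old = random.choice([g for g in links if g["enabled"]])
    old["enabled"] = False
    # The incoming link gets weight 1 and the outgoing link keeps the
    # old weight, so the network initially computes the same function.
    links.append({"in": old["in"], "out": new_node_id, "weight": 1.0,
                  "enabled": True, "innovation": next_innovation})
    links.append({"in": new_node_id, "out": old["out"], "weight": old["weight"],
                  "enabled": True, "innovation": next_innovation + 1})
    return links

links = [{"in": 1, "out": 2, "weight": 0.7, "enabled": True, "innovation": 1}]
links = add_node_mutation(links, new_node_id=3, next_innovation=2)
```

The two new links receive fresh innovation numbers, so the structural change is tracked for later crossover.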

Although NEAT evolves both the connection weights and the topologies of ANNs simultaneously, they are not represented by separate populations. Instead, the simultaneous evolution of weights and topology is obtained by classifying networks into species. A species is a group of individuals with similar characteristics, reproductively isolated from other species: crossover can only happen between individuals of the same species. This speciation (species detection) prevents a single species from taking over the entire population. Figure 10 shows the complete NEAT cycle.

Figure 10: NEAT cycle.

Symbiotic Adaptive Neuro-Evolution (SANE)

Symbiotic Adaptive Neuro-Evolution (SANE) is a neurocoevolutionary model that evolves three-layer networks, where the number of hidden nodes is predefined and fixed, but the connection topology is not. Figures 11 and 12 show two examples of networks that can be evolved by SANE; note that their topologies differ.

Figure 11: Example of a network that can be evolved by SANE model. There are 1 input, 1 hidden, and 1 output layer, with a fixed number of hidden nodes (in this case 1).
Figure 12: Example of a network that can be evolved by SANE model. There are 1 input, 1 hidden, and 1 output layer, with a fixed number of hidden nodes (in this case 1).

There are two populations evolved in SANE, a population of neurons and a population of network blueprints (note that here the term specialization is used rather than species, since each neuron does not represent a full solution to the problem). The neuron population searches for effective partial networks, while the blueprint population searches for effective combinations of partial networks.

  • Each individual in the neuron population represents a hidden neuron in a two-layer feed-forward network. More precisely, it represents the set of connections (labels and weights) that a hidden neuron makes from the input layer or to the output layer. It is represented by a fixed-length gene (a vector of alpha-numeric entries), in which each pair of entries represents the connection path (i.e., which input/output neuron to connect to the hidden one) and the weight of that connection (Figure 13). All hidden neurons have the same number of connections (so the dimension of the gene is always the same) but may have different connections to the input and output layers.
  • Given the neurons with their connections and weights, we need to combine them to construct the network. This is achieved by the blueprint population, which contains the information for the neurons combination. Each individual of the blueprint population specifies a series of pointers to specific neurons from the first population that are used to build a complete neural network (Figure 14).
Figure 13: Example of encoding for the individual of the neuron population (hidden neuron) in SANE shown in Figure 11. The hidden neuron is connected to 3 input nodes and two output nodes. Each labeled edge is featured with a weight.
Figure 14: Example of connection between network blueprints population and neurons population in SANE. Each individual of the blueprint population contains pointers to neurons.

In the initial phase of the evolution process, the series of pointers of the blueprint population are randomly assigned to individuals in the neuron population. By evolving the blueprint population, effective neuron combinations can be maintained and new combination forms can be explored. Note that, through maintaining a combination, the neurons (well-contributing neurons) in the combination are maintained too. This means that well-contributing neurons are protected from being removed during the neuron population evolution process.
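The two SANE populations and their pointer relationship can be sketched as follows; the gene length, population sizes, and label format are illustrative assumptions.

```python
import random

NUM_CONNECTIONS = 5   # fixed gene length per hidden neuron (assumption)
NEURONS_PER_NET = 3   # neurons referenced by each blueprint (assumption)

def random_neuron():
    # Labels such as "I2" or "O1" indicate which input/output node the
    # hidden neuron connects to; each label is paired with a weight.
    labels = [f"I{i}" for i in range(1, 4)] + [f"O{i}" for i in range(1, 3)]
    return [(random.choice(labels), random.uniform(-1, 1))
            for _ in range(NUM_CONNECTIONS)]

# Two populations: neurons (partial networks) and blueprints (pointers).
neuron_pop = [random_neuron() for _ in range(20)]
blueprint_pop = [[random.randrange(len(neuron_pop))
                  for _ in range(NEURONS_PER_NET)] for _ in range(10)]

def build_network(blueprint, neuron_pop):
    """A complete hidden layer: the neurons a blueprint points to."""
    return [neuron_pop[i] for i in blueprint]

net = build_network(blueprint_pop[0], neuron_pop)
```

A blueprint that keeps pointing at the same neurons across generations effectively protects those neurons from being discarded, which is the mechanism described above.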

As we saw previously (Figure 6), Stage 1 in standard neuroevolution models corresponds to evaluating a single neural network, with individuals being neural networks. In SANE, instead, the first stage is subdivided into three substeps:

  • Substep 1. Neurons are combined in resulting networks according to blueprint individuals;
  • Substep 2. Evaluation of networks;
  • Substep 3. Normalization of the fitness. After evaluating each individual of the blueprint population, each neuron receives a normalized fitness based on the performance of the networks in which it participates.
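Substep 3 can be sketched as follows. Averaging the fitness of the networks each neuron participated in is one reasonable normalization; the original SANE paper instead sums the scores of the best networks each neuron joins.

```python
def normalize_neuron_fitness(blueprints, blueprint_fitness, n_neurons):
    """Give each neuron the average fitness of the networks it joined."""
    totals = [0.0] * n_neurons
    counts = [0] * n_neurons
    for blueprint, f in zip(blueprints, blueprint_fitness):
        for neuron_idx in blueprint:
            totals[neuron_idx] += f
            counts[neuron_idx] += 1
    # Neurons used by no blueprint receive fitness 0.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]

blueprints = [[0, 1], [1, 2], [0, 2]]
network_fitness = [0.9, 0.3, 0.6]
neuron_fitness = normalize_neuron_fitness(blueprints, network_fitness, 3)
# Neuron 1 joined the networks scoring 0.9 and 0.3, so its fitness is 0.6.
```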

We define symbiotic evolution as a type of coevolution where individuals explicitly cooperate and rely on the presence of other individuals to survive. It differs from most coevolutionary methods, where individuals compete rather than cooperate to survive.

Cooperation only happens during the evaluation phase, while the two populations perform recombination and mutation processes independently. For more details on crossover and mutation in SANE, we suggest reading paper (Moriarty and Miikkulainen, 1997). The process is summarized in Figure 15.

Figure 15: SANE cycle.

Enforced Sub-Populations (ESP)

Just like SANE, Enforced Sub-Populations (ESP) is a neurocoevolutionary algorithm where the individuals are neurons that are put together to form a complete network. Similarly to SANE, ESP evolves three-layer fully connected networks, but, unlike SANE, it creates a separate subpopulation for each hidden neuron.

While SANE is an intra-population neurocoevolutionary algorithm (that is, all the cooperative neurons come from the same population), ESP is an inter-population neurocoevolutionary algorithm (cooperative individuals come from different populations), so that the neurons in each subpopulation can evolve independently, and rapidly specialize into good network sub-functions. This specialization allows the algorithm to search more efficiently than SANE, and also to evolve recurrent networks.

Each subpopulation individual represents the weight vector of a hidden neuron fully connected with the input and output neurons.
A complete network is built by selecting an individual from each subpopulation. Figure 16 shows the graph representation of three individuals from the neuron population (the vector encoding is analogous to that of SANE). They are put together in the final neural network shown in Figure 17.

Figure 16: Graph representation of three ESP individuals from the neuron population, each of them belonging to a distinct subpopulation.
Figure 17: Example of neural network generated with the ESP model by combining the three hidden nodes shown in Figure 16.

The ESP evolution steps (Figure 18) are described below.

  • Initialization. The number of hidden units h is specified and h subpopulations of n neurons are created. Every individual in each subpopulation encodes a specifically positioned hidden node in the hidden layer of the final network, with its connections and weights.
  • Combination into a network. A set of h neurons is randomly selected, one from each subpopulation, to form the hidden layer of a graph.
  • Evaluation of the network. The built graph is evaluated on the task and receives a fitness score.
  • Combination and Evaluation are repeated for a fixed number of times t (i.e. a fixed number of networks).
  • Fitness normalization. Each individual of the neuron population receives a normalized fitness (again similarly to SANE).
  • Recombination. For each subpopulation, neurons are ranked by fitness. Those with the highest score are used as starting points for generating new subpopulations, via crossover and mutation. In this way, both optimal weights and network topologies can be explored simultaneously.
  • Check Stagnation. If the fitness of the best-performing network has not improved in b generations, mutation is applied. If this still does not improve fitness, the network size is adapted (by adding or removing hidden neurons).
Figure 18: ESP cycle.

Neurons are evolved in their own subpopulations but cooperatively adapt to the problem domain. The cycle is repeated until a network that performs sufficiently well in the task is found or the maximum number of generations is reached.
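The ESP loop can be compressed into a short sketch. The toy fitness function, subpopulation sizes, and trial counts are illustrative assumptions; the stagnation check and size adaptation are omitted for brevity.

```python
import random

H, SUBPOP, TRIALS, GENERATIONS = 3, 10, 40, 30
GENOME = 4  # weights per hidden neuron (an illustrative size)

def net_fitness(network):
    # Toy objective standing in for task fitness: prefer weights close to 1.
    return -sum((w - 1.0) ** 2 for neuron in network for w in neuron)

# Initialization: one subpopulation per hidden-unit position.
subpops = [[[random.uniform(-1, 1) for _ in range(GENOME)]
            for _ in range(SUBPOP)] for _ in range(H)]

for _ in range(GENERATIONS):
    scores = [[0.0] * SUBPOP for _ in range(H)]
    counts = [[0] * SUBPOP for _ in range(H)]
    # Combination and evaluation: build TRIALS networks by sampling one
    # neuron from each subpopulation, then share the network's fitness.
    for _ in range(TRIALS):
        picks = [random.randrange(SUBPOP) for _ in range(H)]
        f = net_fitness([subpops[h][picks[h]] for h in range(H)])
        for h in range(H):
            scores[h][picks[h]] += f
            counts[h][picks[h]] += 1
    # Recombination within each subpopulation: keep the best half (by
    # normalized fitness) and refill with mutated copies of elites.
    for h in range(H):
        avg = [s / c if c else float("-inf")
               for s, c in zip(scores[h], counts[h])]
        ranked = sorted(range(SUBPOP), key=lambda i: avg[i], reverse=True)
        elites = [subpops[h][i] for i in ranked[: SUBPOP // 2]]
        children = [[w + random.gauss(0, 0.1) for w in random.choice(elites)]
                    for _ in range(SUBPOP - len(elites))]
        subpops[h] = elites + children

# Assemble a final network from the best neuron of each subpopulation.
best = [max(sp, key=lambda n: -sum((w - 1.0) ** 2 for w in n)) for sp in subpops]
```

Note how each neuron's fitness is shared: a neuron is rewarded for how well the networks it joined performed, which is what drives the subpopulations to specialize cooperatively.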

Although they both evolve neurons, one important difference between SANE and ESP is that in the latter the neuron population is partitioned into subpopulations, so that a neuron evolves inside its own group, but with a focus on how well it contributes to the overall network performance. In contrast, SANE may form networks that contain multiple nodes with the same specialization while omitting nodes with other specializations. More details can be found in the work by Gomez and Miikkulainen, 2003.

Evolving Efficient Connections (EEC)

Borrowing many ideas from SANE and ESP, Evolving Efficient Connections (EEC) is a more recent neurocoevolutionary algorithm that can also be applied to other types of networks. We consider here the same networks as ESP, i.e., those with one hidden layer. While ESP decomposes a network’s search space into several neuron subspaces, EEC separates it into two subpopulations: a connection-weights space and a connection-paths space. The population of connection weights evolves edge-weight vectors for fully connected networks, while the population of connection paths evolves the network topology (as a sequence of edges).
Figures 19 and 20 show two individuals from the two subpopulations. They are put together to form the graph in Figure 21.

Figure 19: Example of a connection weights individual for the EEC model.
Figure 20: Example of a connection paths individual for the EEC model.
Figure 21: Example of neural network generated with the EEC model by combining the two individuals shown in Figures 19 and 20.

The steps (Figure 22) of the algorithm are:

  • Initialization. As in ESP, the number of hidden neurons h is specified and the two subpopulations are created.
  • Combination and Evaluation. A network is built by sampling an individual from each of the two populations and combining them. This is done a fixed number of times, so that every individual of the two populations is used at least once. The networks are evaluated, and each individual receives a fitness score averaged over the number of networks in which it participated. Then, instead of moving to recombination, new networks are built and evaluated starting from the previous evaluation: in short, the best-performing individuals so far are selected to build the networks for the next evaluation. This is repeated a fixed number of times.
  • Check Stagnation. As in ESP, if the fitness of the best-performing network has not improved in b generations, mutation and, if needed, network size adaptation are applied.
  • Recombination. Similarly to the ESP case, for each subpopulation, individuals are ranked according to their average fitness. Those with the highest scores are used as starting points for generating the new populations at the next generation, via crossover and mutation. In this way, both optimal weights and network topologies are explored simultaneously.
Figure 22: EEC cycle.

The cycle is repeated until a network that performs sufficiently well in the task is found or the maximum number of generations is reached.
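The combination step, where one individual from each of the two EEC populations forms a network, can be sketched as follows under illustrative encodings: the connection-paths individual lists the enabled edges, and the connection-weights individual holds one weight per possible edge of the fully connected network.

```python
# All possible edges of a small 2-2-1 fully connected network (assumption).
all_edges = [(1, 3), (2, 3), (1, 4), (2, 4), (3, 5), (4, 5)]

weights_individual = [0.4, -1.1, 0.9, 0.2, -0.5, 1.3]  # one weight per edge
paths_individual = [(1, 3), (2, 4), (3, 5), (4, 5)]    # edges kept enabled

def combine(all_edges, weights, paths):
    """Keep only the enabled edges, each paired with its evolved weight."""
    weight_of = dict(zip(all_edges, weights))
    return {edge: weight_of[edge] for edge in paths}

network = combine(all_edges, weights_individual, paths_individual)
```

Because the two individuals evolve in separate populations, a good topology can be recombined with many different weight vectors, and vice versa.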

DeepNEAT

The DeepNEAT model was proposed to extend NEAT to deep neural networks.
It follows the same process as NEAT, with the fundamental difference that, while in NEAT individuals are vectors of neurons and connections between neurons, in DeepNEAT each gene in the vector represents an entire layer of a deep neural network. This makes it possible to discover deeper models.

Each layer is characterized by a set of features related to the type of layer (such as convolutional, fully connected, or recurrent) and its properties (e.g., number of neurons, kernel size, and activation function). Link genes no longer represent connection weights, but simple connections between layers. Figure 23 shows an example of such an individual. Each vector also contains a set of network hyperparameters, such as learning rate, data preprocessing, and training algorithm.

Figure 23: Example of an individual in DeepNEAT. Image Credits to: Costa et al.
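A DeepNEAT individual of this kind can be sketched with data classes; the field names and hyperparameter values are illustrative assumptions, not the paper's exact encoding.

```python
from dataclasses import dataclass, field

@dataclass
class LayerGene:
    layer_type: str   # e.g., "conv", "dense", "recurrent"
    properties: dict  # e.g., number of units, kernel size, activation

@dataclass
class DeepNEATIndividual:
    layers: list      # one LayerGene per layer
    links: list       # (from_layer_index, to_layer_index) pairs
    hyperparams: dict = field(default_factory=dict)  # network-level settings

individual = DeepNEATIndividual(
    layers=[LayerGene("conv", {"filters": 32, "kernel": 3, "activation": "relu"}),
            LayerGene("dense", {"units": 64, "activation": "relu"}),
            LayerGene("dense", {"units": 10, "activation": "softmax"})],
    links=[(0, 1), (1, 2)],
    hyperparams={"learning_rate": 1e-3, "epochs": 10},
)
```

Moving the granularity from single connections to whole layers is what keeps the genome short enough for evolution to explore deep architectures.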

During the fitness evaluation, each individual is converted into a deep neural network, which is then trained for a fixed number of iterations (epochs), and evaluated on the specific task. The fitness value is then assigned to the corresponding vector.

The evolution cycle is the same as the one for NEAT. For each cycle, specific mutations and crossover are applied to individuals.

Conclusions

This article shows some of the best-known evolutionary computation models used to evolve artificial neural network architectures and weights. Starting from a common evolution cycle, neuroevolution and neurocoevolution models differ in the way neural networks are built: generally, neuroevolution individuals are entire neural networks, while in the neurocoevolutionary case they are partial graphs that have to be combined properly.

One of their hottest applications is the automatic design of Generative Adversarial Networks, architectures that are useful for synthetic data generation but are also often exploited to generate fake content. For more on this use case, take a look at my blog publications! :)
