Building Generative Adversarial Networks with Evolutionary Optimization Methods: COEGAN Model

Andrea Bertolini
Data Reply IT | DataTech
Dec 15, 2023

Privacy is an important concern for our society where sharing data is a frequent occurrence. Synthetic data are a promising concept toward solving the conflict between data privacy and data sharing. The goal of synthetic data generation is to create an as-realistic-as-possible dataset, not only maintaining the characteristics of the original data but doing so without the risk of exposing sensitive information (Park et al. 2018, Boedihardjo et al. 2022). The degree to which synthetic data are similar to the original ones is a measure of the quality of these generation models (Synthetic Data).

The generation process, also called synthesis, can be performed using different techniques.

Regarding the sensitive information problem, some of the techniques used to achieve privacy are to remove identifiers, modify quasi-identifiers, and perturb values. Unfortunately, these approaches are not free from limitations. Private information can still be extracted if attackers possess some form of background knowledge. Moreover, these techniques don’t take into account the impact they have on the released data. You can find more details about synthetic data and synthesis models in the nice work Pills of AI 101: The Synthetic Data | by Jeremy Sapienza | Data Reply IT | DataTech | Medium.

Deep Learning has emerged as a powerful tool for privacy-preserving data generation. Generative Adversarial Networks (GANs, first introduced in 2014) present state-of-the-art results in the generation of samples following the distribution of the input dataset. However, GANs are difficult to train, and several aspects of the architecture have to be designed by hand. As we will see in the following, there are still open problems regarding the training of GANs: the vanishing gradient and mode collapse.

In this article, we show how neuroevolution optimization methods can be combined with GANs to provide a more stable training method and the automatic design of neural network architecture. The derived model is called COEGAN, first presented in Costa et al. 2019.

Generative Adversarial Networks

Introduction to GANs

Generative Adversarial Networks (GANs) (Goodfellow et al. 2014) gained relevance for presenting impressive results, mainly for image synthesis in the field of computer vision (A Gentle Introduction to GANs, Synthetic data generation using Generative Adversarial Networks (GANs): Part 2 | by Mahmood Mohammadi | Data Science at Microsoft | Medium).

GANs belong to the bigger family of generative models, unsupervised learning models that summarize the distribution of input variables and use it to generate new data that plausibly could have been drawn from the original distribution (i.e. from the original dataset).
But don't be confused: although generative modeling is typically unsupervised, GANs are formulated as a supervised learning process because of the presence of a classification submodel. In other words, GANs translate the unsupervised problem into a supervised one with two sub-models:

  • Generator. It is a generative model G that we train to capture the data distribution and generate new samples. In general, a generative model focuses on learning the distribution of individual classes in an input dataset; in this way, it is capable of creating new realistic samples.
  • Discriminator. The discriminator D is nothing but a trainable evaluation metric that tells how good or bad the quality of the generated data is. It estimates the probability that a sample came from the training data rather than from G, so it tries to classify examples as either real (i.e. from the domain) or fake (i.e. produced by the generator).

The two models are trained together, as adversaries in a zero-sum game (where one agent’s gain is another agent’s loss). This justifies the name adversarial model.

Figure 1: Example of a zero-sum game. Image credits to: Youtube.

The training terminates (i.e. the game converges) when both G and D reach a point where changing their actions (updating the weights of neural networks) does not bring more benefits (or the loss functions for G and D cannot be further minimized).
This point is the Nash equilibrium of the following minimax objective:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

This equation shows that G tries to minimize the value function V while D tries to maximize it. But what are their individual loss functions? To answer this, we need a glimpse of the model architecture.

Model Architecture

Let’s explore the GAN architecture considering the real-world-like image generation problem.

Figure 2: GAN Structure. Image Credits to: Overview of GAN Structure | Machine Learning | Google for Developers

The input for the generator model is a fixed-length vector z, also called the latent random vector, randomly sampled from a uniform or Gaussian distribution (the entries of z may be referred to as noise). It represents a compressed version of the image features, and the output G(z) is a (synthetic) image. The generator's goal is to generate fake images, belonging to the real-world image distribution, starting from this noise.

The discriminator's inputs are both the real dataset and the generated images. Its goal is to classify real samples as real, D(x)=1, and generated samples as fake, D(G(z))=0. Broadly speaking, the generator learns how to fool the discriminator, which, in turn, learns not to get fooled.
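To make this concrete, here is a minimal PyTorch sketch of such a generator/discriminator pair for 28×28 grayscale images (e.g. MNIST). The layer sizes and latent dimension are arbitrary illustrative choices, not the architecture of any specific paper.

```python
import torch
import torch.nn as nn

LATENT_DIM = 100  # length of the latent vector z (arbitrary choice)

# Generator G: maps a latent vector z to a 28x28 grayscale image.
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, 28 * 28),
            nn.Tanh(),  # outputs in [-1, 1], matching normalized real images
        )

    def forward(self, z):
        return self.net(z).view(-1, 1, 28, 28)

# Discriminator D: maps an image to the probability that it is real.
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # probability of "real"
        )

    def forward(self, x):
        return self.net(x)

G, D = Generator(), Discriminator()
z = torch.randn(16, LATENT_DIM)   # batch of 16 latent vectors
fake = G(z)                       # G(z): 16 synthetic images
p_real = D(fake)                  # D(G(z)): probabilities in (0, 1)
```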

Loss Optimization & Training

Figure 3: Visualization of a generic loss function. Image credits to Medium.

Intuitively, the discriminator D wants D(G(z)) to be 0, so it tries to maximize (1-D(G(z))), whereas the generator G wants D(G(z)) to be 1, so it tries to minimize (1-D(G(z))).

Formally, the generator G minimizes a one-term loss:

$$J^{(G)} = \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

On the other hand, the discriminator D maximizes a two-term loss, whose second term corresponds to the generator's loss:

$$J^{(D)} = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

The training algorithm is an alternating training process based on the backpropagation technique.

Alternating training means the generator and the discriminator are trained in turn, one after the other, within the same loop. We train the discriminator for one step (using a batch of input data) while keeping the generator frozen, then train the generator for one step (using a batch of latent vectors) while keeping the discriminator frozen, and repeat these steps until the model converges.

In order to train a neural network model using the backpropagation method, we need to calculate the error value of the model’s output (prediction error) using a loss function and then update the model weights by propagating the gradients across the neural layers, aiming to minimize the error value.

The steps used to train the discriminator are the following (a code sketch follows the list).

  • Take a batch of fake samples, generated by the generator.
  • Take a batch of real samples.
  • For each sample, fake or real, the discriminator tries to predict its label.
  • Evaluate the loss function. The error is then backpropagated through the discriminator only, stopping before it reaches the generator, which is effectively frozen at this stage.
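Putting these steps into code, here is a minimal sketch of one discriminator update, reusing `G`, `D`, and `LATENT_DIM` from the sketch above; `real_batch` is assumed to be a batch of real images normalized to [-1, 1].

```python
import torch
import torch.nn.functional as F

opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

def discriminator_step(real_batch):
    opt_D.zero_grad()
    # Fake samples: detach() cuts the graph, so no gradient reaches G.
    z = torch.randn(real_batch.size(0), LATENT_DIM)
    fake_batch = G(z).detach()
    # Two-term loss: minimizing -[log D(x) + log(1 - D(G(z)))] is
    # equivalent to D maximizing log D(x) + log(1 - D(G(z))).
    ones = torch.ones(real_batch.size(0), 1)
    zeros = torch.zeros(real_batch.size(0), 1)
    loss = (F.binary_cross_entropy(D(real_batch), ones)
            + F.binary_cross_entropy(D(fake_batch), zeros))
    loss.backward()   # gradients flow through D only
    opt_D.step()
    return loss.item()
```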

On the other hand, training the generator is more complicated, since it is not directly connected to the output (again, a sketch follows the list).

  • Draw a batch of input latent vectors from the starting distribution.
  • Generate a batch of fake samples from these vectors.
  • The discriminator classifies these fake samples as fake or real.
  • Evaluate the loss function of the complete generator + discriminator model. The error is then backpropagated to the generator layers while the discriminator is frozen (it does not update). In this way, the loss function penalizes the generator for samples the discriminator classifies as fake. This is the critical point: the generator weights are updated so that it produces samples able to fool the discriminator into classifying them as real.
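And the corresponding generator update, continuing the same sketch. Note that most practical implementations train G with the non-saturating loss (maximize log D(G(z))) rather than the theoretical term (minimize log(1 - D(G(z)))); the sketch below uses the non-saturating form.

```python
import torch
import torch.nn.functional as F

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)

def generator_step(batch_size):
    opt_G.zero_grad()
    z = torch.randn(batch_size, LATENT_DIM)
    fake_batch = G(z)             # no detach: gradients must reach G
    # Non-saturating form: G is rewarded when the (frozen) discriminator
    # labels its samples as real.
    loss = F.binary_cross_entropy(D(fake_batch),
                                  torch.ones(batch_size, 1))
    loss.backward()               # gradients flow through D into G...
    opt_G.step()                  # ...but only G's weights are updated
    return loss.item()
```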

As we previously said, the main problem is twofold:

  • vanishing gradient,
  • mode collapse.

The vanishing gradient occurs when the discriminator becomes powerful enough to not be fooled by the generator anymore, preventing the gradient from flowing through the generator. This causes the whole training progress to stagnate. On the other hand, the mode collapse problem makes the generator capture only a portion of the input distribution.
Several approaches have tried to mitigate these problems, but they remain unsolved. More precisely, some variations of the original GAN were proposed in order to improve the stability and the performance of the model. However, a study found no empirical evidence that those proposals are superior to the original GAN.

Recently, some models were proposed to use evolutionary algorithms in GANs and overcome these problems.

Automatic Design of Artificial Neural Networks with Evolutionary Optimization Methods

This part partially refers to a more detailed work: Automatic Design of Artificial Neural Networks with Evolutionary Optimization Methods | by Andrea Bertolini | Dec, 2023 | Medium.

Among the different ways of automatically designing artificial neural networks, evolutionary computation can evolve both network architectures and weights. Its main advantage is a global exploration of the search space, which avoids becoming trapped in local minima, as often happens with gradient-based approaches; it also enables massive parallelization.

Evolutionary Computation

Since the 1980s, there has been a great interest in exploiting evolutionary optimization methods to build artificial neural networks, mainly inspired by the evolution of the brain in nature. ANN design using evolutionary techniques can be roughly classified into three levels:

  • evolving connection weights,
  • evolving architectures,
  • evolving learning rules.

Neuroevolutionary algorithms evolve ANNs using standard evolutionary techniques. These standard techniques consist of a population of individuals and an evolution cycle (Figure 4) repeated until some stopping criterion is met (e.g. when the individuals are sufficiently close to the global optimum of a function). The cycle is composed of:

  • fitness assessment (each individual’s fitness is computed, e.g., a function value given the individual as input);
  • selection (those with the best fitness are selected, while the others are discarded);
  • crossover (the previously selected individuals are coupled as parents to generate new individuals, the offspring, which share similar features with their parents);
  • mutation (for each individual, its features may be slightly transformed);
  • new population generation (the generated individuals either completely replace their parents or are added to the population alongside them).

Figure 4: General Evolution Cycle.

Different algorithms may apply slight modifications, such as omitting crossover or mutation in some or all iterations. With this iterated cycle, the population gradually evolves to increasing levels of fitness. Once some termination criterion is met, e.g. the maximum number of iterations, the iterative process is stopped and the best solution is extracted.
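A minimal, library-free sketch of this cycle follows; the `fitness`, `crossover`, and `mutate` callables are placeholders for problem-specific operators, and all constants are arbitrary.

```python
import random

def evolve(init_population, fitness, mutate, crossover,
           n_generations=100, elite_frac=0.5, seed=0):
    """Generic evolution cycle: fitness -> selection -> crossover -> mutation."""
    rng = random.Random(seed)
    population = list(init_population)
    for _ in range(n_generations):
        # 1. Fitness assessment: rank individuals from best to worst.
        scored = sorted(population, key=fitness, reverse=True)
        # 2. Selection: keep the fittest fraction as parents.
        parents = scored[: max(2, int(elite_frac * len(scored)))]
        # 3-4. Crossover + mutation to produce offspring.
        offspring = []
        while len(offspring) < len(population) - len(parents):
            p1, p2 = rng.sample(parents, 2)
            offspring.append(mutate(crossover(p1, p2)))
        # 5. New population: parents survive alongside their offspring.
        population = parents + offspring
    return max(population, key=fitness)

# Toy usage: maximize f(x) = -(x - 3)^2, whose global optimum is x = 3.
best = evolve(
    init_population=[random.uniform(-10, 10) for _ in range(20)],
    fitness=lambda x: -(x - 3) ** 2,
    crossover=lambda a, b: (a + b) / 2,
    mutate=lambda x: x + random.gauss(0, 0.1),
)
print(best)  # close to 3
```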

In ANN design, conventionally, each individual in the population represents a network.

Neuro-Evolution of Augmenting Topologies

Neuro-Evolution of Augmenting Topologies (NEAT) is a neuroevolutionary technique that evolves both the connection weights and connection topologies of ANNs simultaneously. The length of an individual (encoding) is flexible and expandable so that evolution is able to both complexify and simplify network topologies.

A NEAT individual consists of:

  • neuron genes (Figure 5): each neuron is given an ID and a type (e.g., input, output, hidden, …);
  • link genes (Figure 6): each link gene contains the two endpoints of the edge (the in-node and out-node neurons), the connection weight, the connection status (i.e., whether the link is enabled), and an innovation number used for crossover.

Figure 5: Example of neuron genes in NEAT.
Figure 6: Example of link genes in NEAT.

The union of neuron genes and link genes shown in Figures 5 and 6 encodes the network shown in Figure 7.

Figure 7: Resulting neural network encoded by the strings shown in Figures 5 and 6.

To guarantee a flexible length of the individuals, NEAT supports many possible mutations on nodes, edges, and weights (the first of these is sketched in code after the list):

  • addition of a new node (by splitting an existing edge into two consecutive ones);
  • addition of a new edge with a random weight;
  • connection weight modification;
  • enabling or disabling of an edge.
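As an illustration, here is a simplified toy encoding of link genes and the add-node mutation, not the reference NEAT implementation. The weight convention (incoming link gets weight 1.0, outgoing link inherits the old weight) follows the original NEAT paper, so the split initially preserves the network's behavior.

```python
from dataclasses import dataclass, field
from itertools import count

innovation = count(1)  # global innovation counter (simplified)

@dataclass
class LinkGene:
    in_node: int
    out_node: int
    weight: float
    enabled: bool = True
    innovation_no: int = field(default_factory=lambda: next(innovation))

@dataclass
class Genome:
    nodes: list          # node IDs
    links: list          # LinkGene objects
    next_node_id: int

def add_node_mutation(genome, link):
    """Split an existing edge into two consecutive edges via a new node."""
    link.enabled = False                 # disable the old edge
    new_id = genome.next_node_id
    genome.next_node_id += 1
    genome.nodes.append(new_id)
    # Incoming link gets weight 1.0, outgoing link inherits the old weight.
    genome.links.append(LinkGene(link.in_node, new_id, 1.0))
    genome.links.append(LinkGene(new_id, link.out_node, link.weight))

# Toy usage: split the single 1 -> 2 edge into 1 -> 3 -> 2.
g = Genome(nodes=[1, 2], links=[LinkGene(1, 2, 0.7)], next_node_id=3)
add_node_mutation(g, g.links[0])
```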

Although NEAT evolves the connection weights and connection topologies of ANNs simultaneously, weights and topologies are not represented by separate populations. Instead, their simultaneous evolution is obtained by classifying networks into species. A species is a group of population individuals with similar characteristics, reproductively isolated from other species, meaning crossover can only happen between individuals of the same species. This speciation (species detection) prevents a single species from taking over the entire population. Figure 8 shows the complete NEAT cycle.

Figure 8: NEAT cycle.

DeepNEAT

The DeepNEAT model was proposed to extend NEAT to deep neural networks.
It follows the same process as NEAT, with the fundamental difference that while in NEAT individuals are vectors of neurons and connections between neurons, in DeepNEAT each gene in the vector represents a layer in a deep neural network. This makes it possible to discover deeper models.

Each layer is characterized by a set of features related to the type of layer (such as convolutional, fully connected, or recurrent) and its properties (e.g. number of neurons, kernel size, and activation function). Link genes no longer represent connection weights, but simply connections between layers. Figure 9 shows an example of such an individual. Each vector also contains a set of network hyperparameters, such as the learning rate, data preprocessing, and training algorithm.
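One possible (hypothetical) encoding of such a layer gene, simplified from the description above; the field names and the example genome are illustrative assumptions, not the paper's data structures.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LayerGene:
    layer_type: str            # 'linear', 'conv2d' or 'conv2d_transpose'
    activation: str            # 'relu', 'leaky_relu', 'elu', 'sigmoid', 'tanh'
    out_features: Optional[int] = None   # for linear layers
    out_channels: Optional[int] = None   # for (transpose) convolutions

# A small generator-like genome: the individual is a vector of layer genes,
# not of individual neurons and connection weights as in NEAT.
genome = [
    LayerGene('linear', 'relu', out_features=1024),
    LayerGene('conv2d_transpose', 'relu', out_channels=32),
    LayerGene('conv2d_transpose', 'tanh', out_channels=1),
]
```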

During the fitness evaluation, each individual is converted into a deep neural network, which is then trained for a fixed number of iterations (epochs), and evaluated on the specific task. The fitness value is then assigned to the corresponding vector.

The evolution cycle is the same as the one for NEAT. For each cycle, specific mutations and crossover are applied to individuals.

Evolutionary Algorithms & GANs

So far we considered evolution and coevolution for designing general artificial neural networks. We are now ready to focus on combining evolutionary algorithms with generative adversarial networks.

The first models attempting this were presented starting in 2018. An interesting one is Coevolutionary Generative Adversarial Networks (COEGAN), introduced in 2019, which exploits neuroevolution and neurocoevolution in the GAN training phase. This method can discover efficient architectures on the Fashion-MNIST and MNIST datasets. Moreover, experiments show that COEGAN can be used as a training algorithm for GANs that avoids common issues, such as the mode collapse problem.

Coevolutionary Generative Adversarial Networks (COEGAN)

Coevolutionary Generative Adversarial Networks (COEGAN) is the model obtained by extending and adapting the DeepNEAT algorithm to the GANs training phase in a coevolution environment (Costa et al. 2019).

As for DeepNEAT, individuals are vectors of layers from deep neural networks. COEGAN considers three types of layers: linear, convolution, and transpose convolution. For convolution and transpose convolution layers, only the output size is an evolvable (random) parameter; the input size, stride, and kernel size are automatically derived, since they depend on the previous layer's size. Analogously, the random parameter for the linear layer is simply the number of output features. Each layer has one of the following activation functions: ReLU, LeakyReLU, ELU, Sigmoid, and Tanh. Therefore, only the activation function, the number of output features, and the number of output channels of each component are subject to the mutation operation.
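A sketch of how such genes could be converted into PyTorch layers, with the input size derived from the previous layer rather than evolved. It reuses the `LayerGene` class sketched earlier; the kernel size and stride are fixed here for illustration, and linear layers are omitted for brevity.

```python
import torch.nn as nn

ACTIVATIONS = {'relu': nn.ReLU, 'leaky_relu': nn.LeakyReLU,
               'elu': nn.ELU, 'sigmoid': nn.Sigmoid, 'tanh': nn.Tanh}

def build_conv_stack(genes, in_channels):
    """Illustrative gene -> network conversion: each gene stores only its
    output size; the input size is derived from the previous layer."""
    layers = []
    for gene in genes:
        if gene.layer_type == 'conv2d':
            layers.append(nn.Conv2d(in_channels, gene.out_channels,
                                    kernel_size=4, stride=2, padding=1))
        elif gene.layer_type == 'conv2d_transpose':
            layers.append(nn.ConvTranspose2d(in_channels, gene.out_channels,
                                             kernel_size=4, stride=2,
                                             padding=1))
        in_channels = gene.out_channels      # derived, not evolved
        layers.append(ACTIVATIONS[gene.activation]())
    return nn.Sequential(*layers)
```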

Since GAN architectures are composed of discriminators and generators subnetworks, a COEGAN population is made of two subpopulations: generators and discriminators. Figures 9 and 10 represent examples of a generator and a discriminator, respectively. The output of a generator subnetwork is a fake sample, with the same characteristics (i.e. dimension and channels) as a real sample.
The output of discriminator subnetworks is, instead, the probability of the input sample being a real sample.

Figure 9: Example of an individual in DeepNEAT and COEGAN. In the latter, this is an example of a generator individual. Image Credits to: Costa et al.
Figure 10: Example of an individual in DeepNEAT and COEGAN. In the latter, this is an example of a discriminator individual. Image Credits to: Costa et al.

Due to the high number of network parameters (weights and biases), evolving them directly would be computationally too expensive. For this reason, COEGAN was designed to evolve only the neural network architecture; the parameters of the resulting networks are trained by gradient descent instead of by evolutionary algorithms.

In a competitive coevolution environment, discriminators and generators must be paired to compute their fitness based on the task performance. Several approaches can be used to pair individuals, such as all vs all, random, and all vs best.
Experimental analyses showed the all vs all strategy is the most stable for COEGAN, helping to avoid common problems in the training of GANs.
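A simplified sketch of the all vs all pairing follows. Here both subpopulations are scored by losses averaged over all opponents, and `train_fn` is a placeholder for a few GAN training steps on one generator/discriminator pair; COEGAN itself scores generators with FID, as described next.

```python
def evaluate_all_vs_all(generators, discriminators, train_fn):
    """All vs all pairing: every generator meets every discriminator,
    and per-pair losses are accumulated into fitness values."""
    g_scores = {id(g): [] for g in generators}
    d_scores = {id(d): [] for d in discriminators}
    for g in generators:
        for d in discriminators:
            g_loss, d_loss = train_fn(g, d)   # a few GAN steps for this pair
            g_scores[id(g)].append(g_loss)
            d_scores[id(d)].append(d_loss)
    # Averaging over all opponents gives a stable estimate of skill.
    g_fitness = {k: sum(v) / len(v) for k, v in g_scores.items()}
    d_fitness = {k: sum(v) / len(v) for k, v in d_scores.items()}
    return g_fitness, d_fitness
```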

Different fitness evaluations are used for generators and discriminators. For discriminators, the fitness corresponds to the value of the respective two-term loss function from the theory of GANs (above). For generators, on the other hand, the theoretical one-term loss function does not represent a good measure of quality. Therefore, the Fréchet Inception Distance (FID), an alternative measure used to compare the performance of the generative component of GANs, is used as the generator fitness.
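In practice, FID can be computed with an off-the-shelf library; the sketch below uses torchmetrics, which is an assumption for illustration, not necessarily what the COEGAN repository uses. The metric's Inception network expects 3-channel uint8 images, so grayscale samples in [-1, 1] (as produced by the Tanh generator above) are rescaled and repeated across channels.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

def generator_fitness(G, real_images, n_samples=256):
    """Lower FID means a better generator, so fitness minimizes FID."""
    fid = FrechetInceptionDistance(feature=64)   # small Inception feature set
    with torch.no_grad():
        fakes = G(torch.randn(n_samples, LATENT_DIM))

    def to_uint8_rgb(x):
        # Rescale [-1, 1] -> [0, 255] and repeat the grayscale channel.
        return ((x + 1) * 127.5).clamp(0, 255).to(torch.uint8).repeat(1, 3, 1, 1)

    fid.update(to_uint8_rgb(real_images), real=True)
    fid.update(to_uint8_rgb(fakes), real=False)
    return fid.compute().item()
```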

Inside each of the two subpopulations, just as in NEAT, individuals are partitioned into species containing individuals with similar network structures. The similarity criterion between individuals is based on the parameters that are subject to evolution; hence, as explained above, the weights and biases of each layer are not considered in this calculation. This speciation mechanism is applied in order to promote innovation in each subpopulation and prevent a single species from taking over the entire population.

Since crossover brings too much instability into the system, the only variation operator in COEGAN is mutation (sketched in code below). It consists of the following possible operations: add a new random layer (linear or convolution for discriminators, linear or transpose convolution for generators), remove a random layer, and change an existing layer. Furthermore, specific attributes of layers can also be mutated.
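A toy sketch of these mutation operations over the `LayerGene` genome introduced earlier; the layer sizes and the uniform choice between operations are arbitrary illustrative assumptions.

```python
import copy
import random

def mutate_genome(genome, is_generator, rng=random):
    """Sketch of COEGAN-style mutation: add, remove, or change a layer."""
    genome = copy.deepcopy(genome)
    op = rng.choice(['add', 'remove', 'change'])
    if op == 'add':
        # Generators add transpose convolutions, discriminators convolutions.
        conv = 'conv2d_transpose' if is_generator else 'conv2d'
        new = LayerGene(rng.choice(['linear', conv]),
                        rng.choice(list(ACTIVATIONS)),
                        out_features=rng.choice([64, 128, 256]),
                        out_channels=rng.choice([16, 32, 64]))
        genome.insert(rng.randrange(len(genome) + 1), new)
    elif op == 'remove' and len(genome) > 1:
        genome.pop(rng.randrange(len(genome)))
    else:
        # Change an existing layer's evolvable attributes only.
        gene = rng.choice(genome)
        gene.activation = rng.choice(list(ACTIVATIONS))
    return genome
```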

The selection phase of COEGAN is based on that of NEAT. Recall that weights and biases are trained via gradient descent, not evolved, so they must be managed appropriately during selection: they are carried over to the next generation when a layer is not mutated, or when the layers involved in a mutation are compatible, so that the new individual keeps the training information from the previous generation. In all other cases, these parameters are not copied, and the layer has to be trained again from scratch.

You can access the COEGAN code at the link https://github.com/vfcosta/coegan to replicate the results (Figure 11) presented in Costa et al. 2019.

Figure 11: Samples generated by COEGAN trained on the MNIST dataset. Image Credits to: Costa et al.

Conclusions

Synthetic data are a promising concept toward solving the conflict between data privacy and data sharing. This article showed how population-based optimization methods can be successfully used to build generative adversarial networks, which present state-of-the-art results in the generation of realistic data samples. After introducing GANs and evolutionary computation, we followed the path to defining COEGAN, a recent neurocoevolutionary model for GANs whose results on image generation are comparable to those of the best-known GAN architectures in the literature. The authors have also made the code available to replicate these results.

For more content, take a look at our blog publications! :)
