Lipizzaner: a framework for co-evolutionary distributed GAN training

Lipizzaner applies lessons learned from (co-)evolutionary algorithms to define a resilient method for training generative adversarial networks (GANs). In this post, I will introduce its algorithm and discuss some published results.

I am starting a series of blog posts that will cover our research on evolutionary computing and co-evolution applied to generative adversarial network (GAN) training, the main line of research during my postdoc at MIT CSAIL (ALFA team). The Lipizzaner framework has been extensively used and extended during this research. This framework was introduced in a previous post.

Basic GAN training

There is plenty of information elsewhere about using GANs to train generative models, so here I will only introduce the basic concepts needed to understand our approach.

The general GAN setup comprises two models, represented by neural networks named generator (G) and discriminator (D), that are involved in a competitive game. These two players have different roles:

  • The generator tries to produce data samples (x’) that come from the probability distribution that represents the training data set (consisting of samples x).
  • The discriminator judges whether its input comes from the generator (a synthesized or fake sample) or from the training data set (a real sample), i.e., it tries to distinguish between x’ and x.
Figure 1. Representation of GAN training process.

Thus, during gameplay, the generator tries to fool the discriminator (i.e., to maximize the probability that the discriminator believes the samples produced by the generator are real), while the discriminator tries to distinguish the real images from the training data set from the synthetic ones produced by the generator. The game is defined as a minimax optimization problem in which the networks optimize their weights.
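To make the minimax objective concrete, here is a minimal sketch that estimates the classic GAN value function V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))] by Monte-Carlo sampling. The 1-D "networks" (a sigmoid discriminator and a shifting generator) are purely illustrative stand-ins, not Lipizzaner's actual models:

```python
import math
import random

random.seed(0)

def discriminator(x, w=2.0, b=0.0):
    # Toy 1-D discriminator: a sigmoid over a linear score.
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def generator(z, shift=-1.0):
    # Toy 1-D generator: just shifts the latent noise.
    return z + shift

# Monte-Carlo estimate of the minimax value
# V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))],
# which the discriminator maximizes and the generator minimizes.
real = [random.gauss(1.0, 0.5) for _ in range(1000)]
noise = [random.gauss(0.0, 0.5) for _ in range(1000)]
value = (sum(math.log(discriminator(x)) for x in real) / len(real)
         + sum(math.log(1.0 - discriminator(generator(z))) for z in noise) / len(noise))
print(round(value, 3))
```

In real GAN training, both players update their network weights with gradients of this objective rather than evaluating it in closed form.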

Lipizzaner GAN training

Evolutionary algorithms (EA), widely used methods for dealing with optimization problems, evolve a population of solutions (individuals) in order to converge to the best (or good enough) solutions according to a fitness (quality) function. Co-evolutionary algorithms (COEA) present a similar framework. However, there is more than one population, and the individuals’ quality is evaluated according to their interactions with other individuals of the same or other populations. Lipizzaner applies both EA and COEA during GAN training.
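As a minimal illustration of the EA loop described above (not Lipizzaner's actual implementation), the following sketch evolves a population of real numbers toward the maximum of a toy fitness function using Gaussian mutation and (μ + λ) survivor selection:

```python
import random

random.seed(1)

def evolve(fitness, pop_size=20, gens=100, sigma=0.3):
    """Minimal (mu + lambda) EA sketch: mutate every individual,
    then keep the fittest pop_size among parents and children."""
    pop = [random.uniform(-5.0, 5.0) for _ in range(pop_size)]
    for _ in range(gens):
        children = [x + random.gauss(0.0, sigma) for x in pop]
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

# Maximize a simple quality function whose optimum is at x = 2.
best = evolve(lambda x: -(x - 2.0) ** 2)
```

In a COEA, the only structural change is that `fitness` would score an individual by its interactions with members of the opposing population (e.g., a generator against a sample of discriminators).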

Distributed Coevolutionary GAN Training

The main feature of Lipizzaner is that it defines a game between two populations, one of generators and one of discriminators, instead of a single generator and a single discriminator. This allows the use of tools and strategies already applied in COEAs that have shown great success in addressing minimax optimization.

Figure 2. Single-GAN vs Spatially Distributed COEA GAN training.

Another important feature that differentiates Lipizzaner from other COEA GAN training approaches is the distribution of the individuals of the populations over a spatial grid (see Figure 2). Each cell contains two competitive coevolving sub-populations, which are defined according to overlapping Moore neighborhoods (i.e., collecting individuals from the adjacent cells to the North, East, South, and West). Each cell asynchronously executes the training method, which updates the networks’ weights by applying stochastic gradient descent (SGD), as in most machine learning methods. At the end of each training epoch, the sub-populations are updated with copies of the best neural networks (generator and discriminator) of the neighbors. Figure 3 summarizes the creation of the sub-populations and the training method executed in each cell.
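The sub-population gathering step can be sketched as follows. This is a simplified, hypothetical version of the neighborhood collection on a wrap-around (toroidal) grid; the grid cells here hold plain strings where Lipizzaner would hold generator or discriminator networks:

```python
def neighborhood(grid, row, col):
    """Collect sub-population members from a cell and its North, East,
    South, and West neighbors on a toroidal (wrap-around) grid."""
    rows, cols = len(grid), len(grid[0])
    offsets = [(0, 0), (-1, 0), (0, 1), (1, 0), (0, -1)]  # centre, N, E, S, W
    return [grid[(row + dr) % rows][(col + dc) % cols] for dr, dc in offsets]

# A 3x3 grid of placeholder individuals named by their coordinates.
grid = [[f"g{r}{c}" for c in range(3)] for r in range(3)]
print(neighborhood(grid, 0, 0))  # corner cell: neighbors wrap around the torus
```

Because the neighborhoods overlap, the copy of the best generator and discriminator exchanged at the end of each epoch gradually propagates good individuals across the whole grid.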

Figure 3. Lipizzaner GAN training method on a given cell (neighborhood).

Hyper-parameters evolutionary-based optimization

The learning rate hyper-parameter is updated by applying an evolutionary approach, to better adapt it to the current state of the training process.
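A common way to realize such self-adaptation is Gaussian perturbation of the hyper-parameter; the sketch below illustrates that general idea, with parameter names and bounds chosen for illustration rather than taken from Lipizzaner's code:

```python
import random

random.seed(0)

def mutate_learning_rate(lr, sigma=0.001, lo=1e-5, hi=1e-1):
    """Gaussian perturbation of the learning rate, clipped to a safe range.
    (Sketch of a self-adaptive scheme; sigma, lo, and hi are illustrative.)"""
    return min(hi, max(lo, random.gauss(lr, sigma)))

lr = 0.01
for _ in range(5):
    lr = mutate_learning_rate(lr)  # drifts a little each training epoch
```

Individuals whose mutated learning rate yields better-performing networks survive selection, so good rates for the current training phase spread through the population.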

Evolutionary generation of ensembles

In machine learning, ensembles or mixtures of predictors achieve better results than a single predictor for many tasks. Lipizzaner takes advantage of training sub-populations of generators by returning a generative model that consists of a mixture of the generators of the best sub-population (see the video for a better understanding). This mixture is defined by a set of mixture weights. Each generator’s mixture weight is treated as the probability of using that generator, rather than the others, to create the fake samples. Thus, the weights determine the performance of the whole generative model when evaluating its quality.
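Sampling from such a mixture can be sketched as below. The two lambda "generators" are hypothetical stand-ins for the sub-population's neural networks, and the weights are interpreted as selection probabilities, as described above:

```python
import random

random.seed(0)

# Hypothetical stand-in generators; in Lipizzaner these are neural networks.
generators = [lambda z: z - 1.0, lambda z: z + 1.0]
weights = [0.8, 0.2]  # mixture weights, treated as selection probabilities

def sample_from_mixture(z):
    # Pick one generator with probability proportional to its mixture
    # weight, then use it to produce the fake sample.
    g = random.choices(generators, weights=weights, k=1)[0]
    return g(z)

samples = [sample_from_mixture(0.0) for _ in range(1000)]
```

Over many samples, each generator contributes in proportion to its weight, which is why the weight vector directly shapes the quality and diversity of the overall generative model.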

An evolutionary algorithm, Evolution Strategies (ES) in this case, is applied to evolve the mixture weights to optimize the performance of the generative model. Thus, at the end of the run, Lipizzaner returns for each cell a generative model consisting of the sub-population of generators with the best mixture weights, plus the sub-population of discriminators (see Figure 4).
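A minimal (1+1)-ES over the mixture weights might look like the sketch below. This is a simplified illustration, assuming only that the weights live on the probability simplex and that some fitness function scores a weight vector; Lipizzaner's actual ES variant and fitness (e.g., FID of the mixture's samples) may differ:

```python
import random

random.seed(3)

def evolve_mixture_weights(weights, fitness, iters=50, sigma=0.05):
    """(1+1)-ES sketch: mutate the mixture weights with Gaussian noise,
    renormalize them onto the probability simplex, and keep the child
    if it is at least as fit. `fitness` scores a weight vector
    (higher is better)."""
    best = weights[:]
    best_fit = fitness(best)
    for _ in range(iters):
        child = [max(1e-6, w + random.gauss(0.0, sigma)) for w in best]
        total = sum(child)
        child = [w / total for w in child]  # keep weights summing to 1
        child_fit = fitness(child)
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
    return best

# Illustrative fitness: pull the weights toward a fixed target vector.
target = [0.7, 0.2, 0.1]
fitness = lambda w: -sum((wi - ti) ** 2 for wi, ti in zip(w, target))
best = evolve_mixture_weights([1 / 3, 1 / 3, 1 / 3], fitness)
```

Because only the small weight vector is evolved, this outer optimization is cheap compared with the gradient-based training of the networks themselves.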

Figure 4. Outline of the main output of Lipizzaner for each cell after training the networks and the mixture weights.

The following video summarizes the training algorithm applied by Lipizzaner.

Main advantages

Lipizzaner provides a robust and resilient GAN training method, mainly because it fosters diversity in the populations of generators and discriminators throughout the whole process. The main sources of diversity are:

  • The continuous exchange of individuals with neighbors slows down the convergence of the algorithm and increases diversity inside the sub-populations.
  • The pairwise training inside each cell can involve different individuals (generator and discriminator) in each training epoch.
  • In each training epoch, the networks are randomly assigned a loss function to use during GAN training, which promotes diversity. This feature was introduced in Mustangs.
  • Each neighborhood can evolve in semi-isolation, so more diverse points in the search space are explored.
  • The asynchronous parallelism allows the sub-populations in a cell to co-evolve without waiting for the other cells to finish their generation (training epoch). Thus, there may be individuals from different generations in a given neighborhood.

Lipizzaner also improves computational efficiency over other methods, because the toroidal grid is a useful means of controlling the mixing of adversarial populations in COEAs, limiting the number of interactions between individuals. If the neighborhood size is s and each population has N individuals, this reduces the overall interaction cost from O(N²) to O(Ns).
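The savings are easy to quantify. Taking an illustrative 3×3 grid (N = 9 individuals per population, one per cell) with the five-cell neighborhoods described earlier (s = 5):

```python
# Interaction counts for all-vs-all coevolution versus Lipizzaner's grid.
# Illustrative sizes: N individuals per population, neighborhood size s.
N, s = 9, 5
all_vs_all = N * N  # every generator meets every discriminator: O(N^2)
grid_based = N * s  # each individual only meets its s-cell neighborhood: O(Ns)
print(all_vs_all, grid_based)  # 81 vs 45
```

The gap widens as the grid grows, since s stays fixed while N increases with the number of cells.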

Ensembles of generators have been shown to be an effective way to address mode collapse. Following this idea, Lipizzaner returns a mixture (ensemble) of generators with evolved weights that optimize the quality and diversity of the generated samples.

Its object-oriented design facilitates extending the framework with custom models (generators and discriminators), datasets, loss functions, fitness functions (to evaluate the quality of generators), etc.

Some results

To show the performance of Lipizzaner, we trained a generative model based on a basic multilayer perceptron (MLP) over MNIST. We report the results in terms of Fréchet Inception Distance (FID). We ran Lipizzaner with different grid sizes (population sizes) to assess the impact of increasing the population size, and therefore, the diversity in the populations.

Figure 5 shows the evolution of the median FID score of the population. We observe that as the grid size increases, the generative model converges faster and converges to solutions that represent generative models with higher accuracy (better-quality samples).

Figure 5. Evolution of the median FID score through the training epochs when using Lipizzaner with different grid sizes.

Figure 6 illustrates the final FID score obtained by the best generative model over 30 independent runs. The importance of increasing the grid size to obtain better results is clear.

Figure 6. Final FID results of 30 independent runs.

More results can be found in further blog posts, the project website, and in the following publications.

Publications

  1. T. Schmiedlechner, I. Ng Zhi Yong, A. Al-Dujaili, E. Hemberg, U. O’Reilly. Lipizzaner: A System That Scales Robust Generative Adversarial Network Training. In NeurIPS 2018 Workshop on System for Machine Learning, 2018. https://arxiv.org/abs/1811.12843
  2. A. Al-Dujaili, T. Schmiedlechner, E. Hemberg, U. O’Reilly. Towards distributed coevolutionary GANs. In AAAI 2018 Fall Symposium, 2018. https://arxiv.org/abs/1807.08194
  3. J. Toutouh, E. Hemberg, and U. O’Reilly. Spatial Evolutionary Generative Adversarial Networks. In Genetic and Evolutionary Computation Conference (GECCO ’19), July 13–17, 2019, Prague, Czech Republic. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3321707.3321860
  4. J. Toutouh, E. Hemberg, U. O’Reilly. Data Dieting in GAN Training. In H. Iba, N. Noman (Eds.), Deep Neural Evolution: Deep Learning with Evolutionary Computation, Springer, 2020. https://arxiv.org/abs/2004.04642
  5. J. Toutouh, E. Hemberg, U. O’Reilly. Re-purposing Heterogeneous Generative Ensembles with Evolutionary Computation. In Genetic and Evolutionary Computation Conference (GECCO ’20), 10 pages, 2020. https://doi.org/10.1145/3377930.3390229
  6. J. Toutouh, E. Hemberg, U. O’Reilly. Analyzing the Components of Distributed Coevolutionary GAN Training. In The Sixteenth International Conference on Parallel Problem Solving from Nature (PPSN XVI), 10 pages, 2020.

MIT postdoctoral researcher. Passionate about deep learning, GANs, and evolutionary computing, and about applying them to address real-world problems and climate change.