Data Dieting or how to train GANs using less data

Jamal Toutouh
4 min read · Aug 29, 2020


Recent co-evolutionary Generative Adversarial Network (GAN) training methods have shown success in mitigating training pathologies. In this post, I introduce Data Dieting, or how spatial co-evolutionary GAN training allows the different networks of a population (generators and discriminators) to be trained on different portions of the training dataset. The main idea is to reduce the computational cost (memory and time) by distributing different portions of the training dataset (a number of mini-batches) across the cells of the spatial grid defined by Lipizzaner.

In a previous post, I introduced Lipizzaner, a spatially distributed co-evolutionary GAN training framework. Co-evolutionary GAN training, based on co-evolutionary algorithms (CoEA), sets up an arms race between two populations of neural networks, one of generators and one of discriminators. The main difference between Lipizzaner and other co-evolutionary approaches is that it distributes the individuals of both populations over a toroidal grid (see Figure 1). In each cell of this grid, it defines sub-populations of generators and discriminators by copying the individuals located in the overlapping neighborhood (i.e., the networks from the adjacent cells to the North, East, South, and West). Figure 1 illustrates the sub-populations (G1,1 and D1,1) of the cell labeled 1,1.

Figure 1. Illustration of overlapping neighborhoods on a toroidal grid like the one defined by Lipizzaner. Note how a cell update at neighborhood N1,2 can be communicated to N1,1 and N1,3 when they gather the networks of their neighbors.
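To make the grid topology concrete, here is a minimal Python sketch (not code from Lipizzaner itself) of how a cell can gather its overlapping neighborhood on a toroidal grid; the function name and grid size are illustrative assumptions.

```python
# A minimal sketch (not the Lipizzaner code base) of how overlapping
# neighborhoods can be derived on a toroidal grid. Cell (i, j) collects
# itself plus its North, East, South, and West neighbors, with indices
# wrapping around the grid edges.

def neighborhood(i, j, rows, cols):
    """Return the grid coordinates of cell (i, j) and its four adjacent cells."""
    return [
        (i, j),                  # the cell itself
        ((i - 1) % rows, j),     # North
        (i, (j + 1) % cols),     # East
        ((i + 1) % rows, j),     # South
        (i, (j - 1) % cols),     # West
    ]

# Example: on a 3x3 grid, cell (0, 0) also gathers networks from (2, 0),
# (0, 1), (1, 0), and (0, 2), because the grid wraps around (toroid).
print(neighborhood(0, 0, rows=3, cols=3))
```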

The training process is carried out by each cell in parallel: each cell applies a competitive co-evolutionary training process to its sub-populations. An iteration consists of training two individuals (networks) on all the mini-batches of the training dataset by applying stochastic gradient descent. Between training epochs, the sub-populations are reinitialized by requesting copies of the best neural network models from the cell’s neighborhood (i.e., each neighbor sends the best generator and the best discriminator of its sub-populations). Figure 1 illustrates how this communication is carried out. When the training process finishes, the algorithm creates a mixture of generators by combining the generators of the best sub-population. This process is explained in depth in my previous post.
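The following is a hedged PyTorch sketch of the adversarial update a single cell could apply to one generator-discriminator pair during an epoch. The architectures, hyper-parameters, and random stand-in data are assumptions made for illustration, not the settings used by Lipizzaner.

```python
# A minimal sketch of what one cell does during a training epoch: a standard
# adversarial update of one generator/discriminator pair over the cell's
# mini-batches with SGD. Model sizes, data, and hyper-parameters are placeholders.

import torch
import torch.nn as nn

latent_dim, data_dim, batch_size = 16, 784, 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.SGD(G.parameters(), lr=0.01)
opt_d = torch.optim.SGD(D.parameters(), lr=0.01)
bce = nn.BCELoss()

# Stand-in for the mini-batches assigned to this cell (here: random tensors).
cell_batches = [torch.rand(batch_size, data_dim) * 2 - 1 for _ in range(10)]

for real in cell_batches:
    ones, zeros = torch.ones(batch_size, 1), torch.zeros(batch_size, 1)

    # Discriminator step: push real data towards 1 and generated data towards 0.
    fake = G(torch.randn(batch_size, latent_dim)).detach()
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: try to fool the discriminator.
    fake = G(torch.randn(batch_size, latent_dim))
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# After the epoch, the cell would refresh its sub-populations by requesting the
# best generator and discriminator from each neighbor (see Figure 1), and the
# final ensemble mixes the generators of the best sub-population.
```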

Successful co-evolutionary GAN training, which relies on multiple generators and discriminators, can be resource-intensive. A simple approach to reduce resource use during training is to use less data (fewer mini-batches). Following this idea, we defined Data Dieting in Lipizzaner, also known as Redux-Lipizzaner. Thus, the different sub-populations of GANs (cells) are trained on different subsamples of the training dataset (see Figure 2). Using less training data reduces the computational time and the storage requirements, while the ensemble of generators is relied upon to limit any loss in performance caused by the reduction of training data.

Figure 2. Illustration of how the training dataset is sampled to generate training data subsets to train the different neighborhoods on the grid (N1,1 and N1,3).
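Below is a small illustrative sketch of the data-dieting assignment: each cell draws its own subsample of mini-batch indices, sized as a fraction of the training set. The `assign_batches` helper and the numbers are hypothetical, not the Redux-Lipizzaner implementation.

```python
# A sketch of the data-dieting idea: every cell of the grid receives its own
# random subsample of the mini-batches (drawn with replacement, as in the
# experiments), sized as a fraction of the full training set.

import random

def assign_batches(num_batches, grid_cells, fraction, seed=0):
    """Map every grid cell to a random subset of mini-batch indices."""
    rng = random.Random(seed)
    per_cell = max(1, int(fraction * num_batches))
    return {
        cell: [rng.randrange(num_batches) for _ in range(per_cell)]  # with replacement
        for cell in grid_cells
    }

grid = [(i, j) for i in range(4) for j in range(4)]   # a 4x4 toroidal grid
subsets = assign_batches(num_batches=600, grid_cells=grid, fraction=0.25)
print(len(subsets[(0, 0)]))   # each cell trains on ~25% of the mini-batches
```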

Redux-Lipizzaner takes advantage of the implicit communication that comes from training on overlapping neighborhoods and updating each cell with the best generator after a training epoch. Since the networks travel across the cells of the toroidal grid, they exchange the knowledge gained from being trained on different subsets of the data, which, aggregated, potentially represent the whole training dataset.
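To illustrate this mixing effect, here is a toy simulation (my own assumption, not an experiment from the paper) in which every cell starts with its own 25% subsample of batch indices and, at each epoch, merges in the indices already seen by its four neighbors, mimicking the migration of the best networks across the grid.

```python
# Toy simulation of how migration across overlapping neighborhoods can spread
# data coverage: cell (0, 0)'s indirectly "seen" share of the mini-batches
# grows towards 100% over a few epochs.

import random

rows = cols = 4
num_batches = 600
rng = random.Random(1)
seen = {(i, j): {rng.randrange(num_batches) for _ in range(num_batches // 4)}
        for i in range(rows) for j in range(cols)}

for epoch in range(1, 4):
    snapshot = {cell: set(indices) for cell, indices in seen.items()}
    for (i, j) in seen:
        for ni, nj in [((i - 1) % rows, j), ((i + 1) % rows, j),
                       (i, (j - 1) % cols), (i, (j + 1) % cols)]:
            seen[(i, j)] |= snapshot[(ni, nj)]
    print(f"epoch {epoch}: cell (0, 0) has indirectly 'seen' "
          f"{len(seen[(0, 0)]) / num_batches:.0%} of the mini-batches")
```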

To evaluate our approach, we carried out experiments training GANs based on multilayer perceptrons on the MNIST dataset (more information about the experimental analysis can be found in our paper). We tested Redux-Lipizzaner on 4x4 and 5x5 grids and used a single GAN as a baseline. We evaluated these approaches by training the sub-populations on 25%, 50%, 75%, and 100% of the training dataset, i.e., each cell of the grid uses that percentage of the training dataset. We performed 30 independent runs of each experiment, and all methods were trained with the same computational time budget.

Figure 3. Mean FID score when training GANs with different portions of the MNIST training dataset. The mean FID of the single GAN trained with 25% of the data is higher than 500.

When training a single GAN on 25% of the data, the mean FID score is very high: 574.6. When the data subset is doubled to 50%, the mean FID score drops to 71.2, and it falls to 39.8 when using the whole training dataset. When Redux-Lipizzaner is used, the FID scores are clearly better (lower) than those of the single GAN given the same training budget. On a 4x4 grid, the mean FID score is 47.0 when training with just 25% of the data; on a 5x5 grid, it is 39.9. This shows the benefit of increasing the size of the populations, which fosters diversity. For both grid sizes, the FID score decreases as the size of the training data subset increases (50% and 75%).

As can be seen, Redux-Lipizzaner benefits from the communication that indirectly mixes the data subsamples (which are drawn independently and with replacement) and effectively improves the coverage of the data.

Thus, there is a clear benefit in the implicit communication that comes from training on overlapping neighborhoods and updating each cell with the best generator after a training epoch: the exchange of information between neighborhoods produces high-performing generators.

We can conclude that spatially distributed grids allow training with less of the dataset: signal propagation across the grid leads to the exchange of information and improves performance, compared to ordinary GAN training, when the training data is reduced.

This post summarizes the research published in the following book chapter:
J. Toutouh, E. Hemberg, U.-M. O’Reilly. Data Dieting in GAN Training. In H. Iba, N. Noman (Eds.), Deep Neural Evolution: Deep Learning with Evolutionary Computation, Springer, 2020. https://arxiv.org/abs/2004.04642


Jamal Toutouh

MIT postdoc researcher. Passionate about deep learning, GANs, and evolutionary computing, and about how to apply them to address real-world problems and climate change.