DATA STORIES | GENERATIVE ADVERSARIAL NETWORKS | KNIME ANALYTICS PLATFORM

How to create GANs with KNIME Analytics Platform

The race between the generator and discriminator for synthetic image data

Emilio Silvestri
Low Code for Data Science


Co-author: Ivan Prigarin

This is the first of a two-part article about GANs in KNIME. Part II, “GAN Data Apps”, was published on December 22, 2021.

The use of synthetic data is gaining popularity in many fields. It can be a great strategy when data collection is difficult or expensive, when class imbalance in a dataset is hurting the performance of your models, or when you simply want to add a bit of spice to your data. One technique for generating artificial data is the GAN, or Generative Adversarial Network. In today’s example we will show how to train a GAN that generates (more or less) realistic face images, with the help of KNIME Analytics Platform. Figure 1 shows the workflow that we will examine in this blog post.

Figure 1. The final workflow contains all the steps necessary to implement a GAN in KNIME.

1. What is a GAN and what can it be used for?

GANs are a particular family of neural networks invented in 2014 by Ian Goodfellow et al. Their primary, but not the only, purpose is to produce synthetic data that closely mimics a given dataset of real data.

Unlike most other neural networks, GANs consist of two models, which makes them a bit more complex:

  • A generator model that, given input vectors of random noise, aims to produce artificial samples that are as similar to the real samples as possible.
  • A discriminator model that tries to distinguish the generated samples from the ones in the original dataset.

During training, the two models play a zero-sum game. The generator produces a batch of fake samples that is fed to the discriminator together with the real data. The discriminator estimates the probability of each input image being real. The weights of both models are then updated according to how successful they were at their respective tasks. Training converges when the discriminator can no longer distinguish the synthetic samples from the real ones, at which point its predictions are only fifty percent accurate, no better than a coin flip.
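Formally, this zero-sum game corresponds to the minimax objective introduced by Goodfellow et al. (2014): the discriminator D tries to maximize the value function below, while the generator G tries to minimize it.

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$$

At the equilibrium of this game, the discriminator outputs 1/2 everywhere, matching the fifty percent accuracy mentioned above.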

Figure 2. The GAN structure.

Figure 2 shows how GANs work. In this example the training dataset contains real face images that the generator will try to reproduce in order to fool the discriminator. If this is achieved, we are going to have a fully working face-generator model that can later be used to produce your next profile picture.

Due to their complexity, successfully training GANs is usually a non-trivial task. They can suffer from mode collapse or vanishing gradients, fail to converge, or settle on a suboptimal result. Moreover, the training dataset must be large and heterogeneous. Together with the architecture- and parameter-related decisions, this makes achieving tangible results with GANs rather difficult. On the bright side, though, once those results are achieved, the whole process becomes worth it.

For instance, in one of our experiments we ran into mode collapse. We had been asked to generate additional images for a dataset containing regular and defective mechanical pieces. The process of enlarging an existing dataset is called data augmentation, and it is a perfect application for GANs. However, the images in the training dataset were too similar to each other, which caused the generator to produce the same image over and over, regardless of the random input vector it was given (Figure 3). This was probably because our model was too simple to capture the tiny differences among the pictures, so we had to turn to more complex approaches.

Figure 3. An example of mode collapse, where the generator is only able to produce a single image.

2. Before we start: prerequisites

In the deep learning community, it has become the standard to define models using Python libraries such as Keras and TensorFlow. KNIME Analytics Platform can be enhanced by the dedicated KNIME Deep Learning — Keras Integration, which provides native nodes for building and training your models.

However, these nodes are designed to work with a single model at a time. Since GANs involve the concurrent training of two models, they require us to think outside the box. Thanks to the Python integration in KNIME Analytics Platform, we were able to add the required functionality to our workflow using Python code inside the DL Python Network Learner node.

Having said that, we have abstracted away the Python source code into a component, providing a no-code way of training new GAN models on your own data right within KNIME Analytics Platform.

Whether you are implementing a GAN from scratch or using our example workflow, a few KNIME extensions are required to make everything work: the KNIME Deep Learning — Keras Integration, the KNIME Python Integration, and the KNIME Image Processing Community Extension.

For more details on how to correctly set up the Deep Learning extensions, please refer to the guide on Setting up Python for KNIME Deep Learning.

Note 1. Upon the first run of our example workflow, a new Conda environment will be created with all the packages you need to train a GAN model using either CPU or GPU. However, please keep in mind that training GANs is a really resource-intensive process. It is highly recommended that your machine have a GPU compatible with TensorFlow. More details can be found in the KNIME Deep Learning Integration Installation Guide.

Note 2. Since this experiment involves processing a large number of images, it is recommended to change the serialization library in order to avoid buffer overflow. In Preferences → KNIME → Python Deep Learning, scroll to the bottom and select CSV (Experimental) as the serialization library for the Python Deep Learning extension.

3. A look at the GAN architecture

As mentioned previously, GANs are made up of two primary elements, the generator and the discriminator, which are fully fledged neural networks that need to be separately defined (see Figure 2 for an overview of the GAN structure).

In the context of working with images, the generator is a network that takes in a vector of random values and produces an output image of the same dimensions as the training samples. The discriminator, on the other hand, takes in an image (either from the training dataset or from the generator) and outputs the probability of it being real. During preprocessing, the training images need to be resized in order to fit the input dimensions of the discriminator.

There are many ways of defining and tuning a GAN, aimed at different use cases, hardware, and data, and, unsurprisingly, there is no simple recipe for building the best possible model. Instead, we can rely on some well-known best practices when implementing our network, with the overall architecture adapted from this article by Jason Brownlee.

Let’s have a closer look at our generator and discriminator.

Figure 4. Generator architecture.

3.1 Generator

The generator consists of the following layers (a minimal Keras sketch follows the list):

  • Input Layer (1, 1, 100). The input is a vector of 100 random values, called a latent space vector. Since this vector determines the output of the model, which is an image, the input vector and the latent space itself can be seen as alternative representations of the output image. The generator, then, is a kind of translator from the latent space to the image-dimension space.
  • Dense Layer.
  • LeakyReLU activation function.
  • Conv2DTranspose. This layer performs the opposite operation of a convolution, upsampling the data. By stacking several transpose convolutional layers one after the other, we are able to upsample the random noise of the latent space to the image dimensions, which, in our case, are 128×128×3.
  • Conv2D and Tanh activation function as the output layer. This choice is relevant, since our goal is to produce realistic images that fool the discriminator. As you will see in the preprocessing section that follows, we normalize the training images into the [-1, 1] range. With the help of the Tanh activation function, the images produced by the generator will be in the same range. This ensures comparability between the outputs of the generator and the real images.
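For reference, here is a minimal Keras sketch of such a generator. It illustrates the layer stack described above rather than reproducing the exact code in the workflow; the starting 8×8 feature map, filter counts, and kernel sizes are assumptions adapted from common DCGAN practice.

```python
from tensorflow.keras import layers, models

def build_generator(latent_dim=100):
    model = models.Sequential()
    # project the latent vector onto a small 8x8 spatial feature map
    model.add(layers.Dense(128 * 8 * 8, input_dim=latent_dim))
    model.add(layers.LeakyReLU(alpha=0.2))
    model.add(layers.Reshape((8, 8, 128)))
    # four stride-2 transpose convolutions upsample 8 -> 16 -> 32 -> 64 -> 128
    for _ in range(4):
        model.add(layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding="same"))
        model.add(layers.LeakyReLU(alpha=0.2))
    # tanh output keeps pixel values in [-1, 1], matching the normalized training images
    model.add(layers.Conv2D(3, (5, 5), activation="tanh", padding="same"))
    return model
```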
Figure 5. Discriminator architecture.

3.2 Discriminator

The discriminator model makes use of the following layers (see the sketch after this list):

  • Input Layer (128,128,3) that matches the dimensions of the images, both real and fake.
  • Conv2D Layers and LeakyReLU activation function. The goal here is to propagate the images forward while downsampling their dimensions, so a sequence of convolutions with a stride greater than 1 will do the job.
  • Flatten Layer.
  • Dropout Layer that randomly sets some activations to 0 to prevent overfitting.
  • Dense Layer leading to a single output, representing the predicted probability of the image being real.
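A matching discriminator sketch might look as follows; again, the filter counts and the dropout rate are assumptions rather than the exact workflow code.

```python
from tensorflow.keras import layers, models

def build_discriminator(in_shape=(128, 128, 3)):
    model = models.Sequential()
    # strided convolutions downsample the image while extracting features
    model.add(layers.Conv2D(64, (3, 3), strides=(2, 2), padding="same", input_shape=in_shape))
    model.add(layers.LeakyReLU(alpha=0.2))
    for filters in (128, 128, 256):
        model.add(layers.Conv2D(filters, (3, 3), strides=(2, 2), padding="same"))
        model.add(layers.LeakyReLU(alpha=0.2))
    model.add(layers.Flatten())
    model.add(layers.Dropout(0.4))  # zero out some activations to prevent overfitting
    # single sigmoid unit: predicted probability that the input image is real
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model
```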

3.3 Implementation in KNIME Analytics Platform

Figure 6. Generator and discriminator models defined with DL Python Network Creator node.

In the example workflow provided for this article, we defined the generator and discriminator using Python code inside the DL Python Network Creator nodes (see Figure 6). In addition to the pair of models for images of size 128×128, we also defined a pair for images of size 64×64. The smaller architecture reads in and produces smaller images, but is significantly faster to train due to the lower number of weights. This can be particularly useful when performing initial experiments with new datasets.

It is also possible to define new model pairs by slightly modifying the code in the DL Python Network Creator nodes.

4. One more step: Image Preprocessing

Figure 7. Preprocessing steps within a Parallel Chunk Loop for concurrent execution.

The images we are using for this example are available in this GitHub repository. The dataset contains 70K images of faces, already aligned and cropped. We would like to perform a number of preprocessing steps on the data in order to get it ready to be fed to our model. Specifically, we need to read the images into memory, resize them to the appropriate dimensions, and normalize their pixel values to the [-1, 1] scale.

This can be carried out by the nodes of the KNIME Image Processing Community Extension. After reading the files into KNIME using the Image Reader (Table) node, the Image Resizer node can be configured to scale them to the desired dimensions, defined according to the discriminator input and generator output configuration. Lastly, image values are scaled to the [-1, 1] range by the Image Normalizer node. These steps ensure that all the images have the same size, compatible with the dimensions specified in the definition of the two models, and are scaled to a common range of values.
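For readers who prefer code, the same preprocessing can be sketched in a few lines of Python. This is an equivalent illustration, not what the KNIME nodes execute internally; Pillow and NumPy are assumed to be available.

```python
import numpy as np
from PIL import Image

def preprocess(path, size=(128, 128)):
    # resize the image to match the discriminator input / generator output
    img = Image.open(path).convert("RGB").resize(size)
    # rescale pixel values from [0, 255] to [-1, 1], matching the generator's tanh output
    return (np.asarray(img, dtype=np.float32) - 127.5) / 127.5
```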

We can make this process more efficient by using the Parallel Chunk Loop nodes. In Figure 7, the part of the workflow contained between the Parallel Chunk Start and Parallel Chunk End nodes is executed concurrently several times, each instance handling a subset of the data.

5. Of training and waiting…

As mentioned before, the generator and discriminator play a zero-sum game during training, where the former tries to fool the latter by constantly improving the quality of the generated images. At the same time, the discriminator competes by trying to become better at spotting the generated images. Since the performance of one network depends on the output of the other, training is an iterative process. Namely, for every epoch, the generator’s weights have to be frozen while the discriminator’s weights are updated, and vice versa (see Figure 8).

If, for instance, we didn’t freeze the weights of the discriminator, then the generator would have to rely on constantly shifting feedback, preventing it from learning how to improve its performance.

Figure 8. Steps of the GAN training process.

In our case, the implementation involves defining the generator, the discriminator, and the GAN, which is the combination of the two (g_model, d_model, and model respectively in Figure 9).

Figure 9. The generator and discriminator are combined in a single GAN model. Notice that the discriminator weights are frozen (i.e. not trainable).
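In code, the combination can be sketched as follows, using the same variable names as Figure 9 (g_model, d_model, model); the choice of loss and optimizer here is an assumption based on common practice.

```python
from tensorflow.keras import models

def build_gan(g_model, d_model):
    # freeze the discriminator: its weights must not change
    # while the generator is being updated through the combined model
    d_model.trainable = False
    model = models.Sequential()
    model.add(g_model)
    model.add(d_model)
    model.compile(loss="binary_crossentropy", optimizer="adam")
    return model
```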

The training process is once again defined in Python. Each training epoch involves three steps (Figure 10):

  1. Update of the discriminator weights using half a batch of real images.
  2. Update of the discriminator weights using half a batch of fake images.
  3. Update of the generator weights according to the performance of the discriminator classifying the fake images. Notice that for this particular step we use the combined GAN model defined above, where the discriminator weights are frozen.

Additionally, in step 3 we want to assess the generator’s ability to fool the discriminator. Therefore, we assign the “real” label to the images produced by the generator before feeding them to the discriminator.
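A simplified sketch of one such training iteration is shown below. It mirrors the three steps above but is not the exact code of Figure 10; the batch handling and helper names are assumptions.

```python
import numpy as np

def train_step(g_model, d_model, gan_model, real_half_batch, latent_dim=100):
    n = real_half_batch.shape[0]
    # step 1: update the discriminator on real images, labeled 1
    d_model.train_on_batch(real_half_batch, np.ones((n, 1)))
    # step 2: update the discriminator on generated images, labeled 0
    noise = np.random.randn(n, latent_dim)
    fake_images = g_model.predict(noise, verbose=0)
    d_model.train_on_batch(fake_images, np.zeros((n, 1)))
    # step 3: update the generator through the combined model, whose
    # discriminator part is frozen; the fakes are labeled "real" (1) so the
    # generator is rewarded whenever it fools the discriminator
    noise = np.random.randn(2 * n, latent_dim)
    gan_model.train_on_batch(noise, np.ones((2 * n, 1)))
```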

Figure 10. Implementation of the training process.

Everything involved in the training process is embedded in the GAN Learner component. In its configuration window (Figure 11), you can select the column containing the preprocessed images, together with other training options such as number of epochs and batch size. The last option controls whether the training will be carried out using CPU or GPU. On the first run, this will install and use a new Conda environment containing all the needed packages for the selected option.

It is possible to save intermediate results during training by selecting the corresponding check box and indicating a custom folder. If selected, the component will save the generator model and a sample of randomly generated images every 10 completed epochs. This comes in handy for monitoring the results during long training sessions (an example can be seen in Figure 12).

Figure 11. Configuration window of the GAN Learner component.
Figure 12. Intermediate results saved during a 300-epoch training session.

6. Results

While training GANs is a very time-consuming process, your patience will be rewarded. At the end of the training, the GAN Learner component will output the generator model, which can be used to produce artificial images right away.

Figure 13. The section of the workflow that uses the trained generator model to produce new images.

Recall that the generator takes as input a vector of random values and “transforms” it into an image. Thus, by feeding our generator a set of random inputs, we are able to evaluate its performance by viewing the resulting images (see Figure 13).

The trained generator model is fed into the Keras Network Executor node, where it is applied to the latent space vectors produced by the Create Latent Space Vectors component. The output of the model is then postprocessed and rendered to be displayed in a Tile View (Figure 14).
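Outside of KNIME, the equivalent evaluation step takes only a few lines of Keras; the sample count of 25 is an arbitrary assumption.

```python
import numpy as np

noise = np.random.randn(25, 100)      # 25 latent space vectors of length 100
generated = g_model.predict(noise)    # generator output, pixel values in [-1, 1]
images = (generated + 1.0) / 2.0      # rescale to [0, 1] for display
```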

Figure 14. The GAN-generated faces.

Even if not perfect, the generated faces look convincingly real. Notably, the generator model that produced these images learned to do so without ever seeing a single real image; it improved only by observing the feedback of the discriminator.

While this example used a dataset of human faces, there are countless other options. You can download the final workflow from the KNIME Hub, and train your own GAN on other images. Animals? Cars? What will you generate?

In the making of this blog post, we experimented with a variety of different datasets, collecting several generator models. We will showcase those in a future article, alongside some more advanced visualization techniques. Stay tuned for the GAN Data App! 😎
