Implementing the BigGAN model architecture with TensorFlow

Sieun Park · Published in Analytics Vidhya · Mar 25, 2021 · 3 min read

Refer to this post for the concepts and methods proposed by the paper “LARGE SCALE GAN TRAINING FOR HIGH FIDELITY NATURAL IMAGE SYNTHESIS”.

Limitations of Machine Specs

The original BigGAN model was trained in an environment with enormous compute and memory. Running the 256x256 BigGAN in Colab crashes no matter what, at least with any batch size larger than 4. Although the model is unlikely to be trainable in our local environment, we will review how to implement the techniques and model architecture proposed in the paper.

[Image captured from Reddit]

Implementation

The complete code for this post is available here, although it raises an OOM (Out of Memory) error or a ResourceExhaustedError when executed.

We first implement the custom layers used in the paper. In conditional batch normalization, the learned scale and shift parameters (gamma and beta) of batch normalization are replaced by the outputs of a neural network. In this case, they are conditioned on the concatenated vector of the latent slice z and the class embedding. The layer learns a linear mapping that outputs two vectors of length equal to the number of channels, which replace the gamma and beta of the original batch normalization.
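A minimal sketch of such a layer is shown below. The class name ConditionalBatchNormalization and the exact parameterization are assumptions made here for illustration, and the spectral normalization that BigGAN applies to its linear layers is omitted for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ConditionalBatchNormalization(layers.Layer):
    """Batch norm whose per-channel gamma and beta are predicted from a
    conditioning vector (here: latent slice z concatenated with the class embedding)."""

    def build(self, input_shape):
        x_shape, _ = input_shape                     # [feature map, conditioning vector]
        channels = x_shape[-1]
        # Plain batch norm with its own affine parameters disabled.
        self.bn = layers.BatchNormalization(center=False, scale=False)
        # Linear mappings from the conditioning vector to per-channel gamma/beta.
        self.gamma_dense = layers.Dense(channels, bias_initializer='ones')
        self.beta_dense = layers.Dense(channels, bias_initializer='zeros')

    def call(self, inputs, training=None):
        x, cond = inputs
        gamma = self.gamma_dense(cond)[:, None, None, :]   # (bs, 1, 1, C)
        beta = self.beta_dense(cond)[:, None, None, :]
        x = self.bn(x, training=training)
        return gamma * x + beta                             # broadcast over H and W
```

The layer is called as `ConditionalBatchNormalization()([feature_map, conditioning_vector])`, with the predicted gamma and beta broadcast across the spatial dimensions.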

We then implement the non-local (self-attention) block. This block computes two mappings, f and g, applies a softmax to f*g to produce an attention map, and applies this map to a third mapping of the image, h. A final 1x1 convolution o is then applied to produce the output. The s tensor in self-attention consumes a large amount of memory because the matrix allocated for s has shape (bs, h*w, h*w), which makes it a core cause of the ResourceExhaustedError.
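A sketch of the block is shown below, mostly following the SAGAN formulation. The channel-reduction factors and the learnable gamma scale are assumptions, and spectral normalization is again omitted.

```python
class SelfAttention(layers.Layer):
    """Non-local (self-attention) block as used in SAGAN/BigGAN."""

    def build(self, input_shape):
        c = input_shape[-1]
        self.f = layers.Conv2D(c // 8, 1, use_bias=False)   # query mapping
        self.g = layers.Conv2D(c // 8, 1, use_bias=False)   # key mapping
        self.h = layers.Conv2D(c // 2, 1, use_bias=False)   # value mapping
        self.o = layers.Conv2D(c, 1, use_bias=False)        # output projection
        self.gamma = self.add_weight(name='gamma', shape=(), initializer='zeros')

    def call(self, x):
        bs = tf.shape(x)[0]
        hgt, wid = tf.shape(x)[1], tf.shape(x)[2]
        f = tf.reshape(self.f(x), [bs, hgt * wid, -1])       # (bs, hw, c/8)
        g = tf.reshape(self.g(x), [bs, hgt * wid, -1])       # (bs, hw, c/8)
        h = tf.reshape(self.h(x), [bs, hgt * wid, -1])       # (bs, hw, c/2)
        # s has shape (bs, hw, hw) -- this is the memory-hungry tensor.
        s = tf.matmul(f, g, transpose_b=True)
        attn = tf.nn.softmax(s, axis=-1)
        out = tf.matmul(attn, h)                             # (bs, hw, c/2)
        out = tf.reshape(out, [bs, hgt, wid, -1])
        return x + self.gamma * self.o(out)
```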

We then define residual blocks based on the custom layers above. Both residual blocks implement a skip connection and consist of two convolutions. The previously defined ConditionalBatchNormalization layer is used in the generator blocks to inject class and noise information.
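Here is one way the generator block could look in the functional style, reusing the conditional batch norm sketched above. The function name and argument names are assumptions, and spectral normalization is left out.

```python
def generator_res_block(x, cond, channels, upsample=True):
    """Generator residual block: two 3x3 convolutions, each preceded by
    conditional batch norm and ReLU, with an upsampled 1x1-projected skip path."""
    skip = x
    x = ConditionalBatchNormalization()([x, cond])
    x = layers.ReLU()(x)
    if upsample:
        x = layers.UpSampling2D()(x)
        skip = layers.UpSampling2D()(skip)
    x = layers.Conv2D(channels, 3, padding='same')(x)
    x = ConditionalBatchNormalization()([x, cond])
    x = layers.ReLU()(x)
    x = layers.Conv2D(channels, 3, padding='same')(x)
    skip = layers.Conv2D(channels, 1)(skip)     # match channel count on the skip path
    return layers.Add()([x, skip])
```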

Again, based on the residual block and custom layers defined above, we define a 128x128 BigGAN generator model. We connect 5 residual blocks and feed each a concatenated vector of the shared class embedding and a slice of the noise latent vector. The non-local block is placed at the 64x64 resolution. The 256x256 model is also implemented in the Colab notebook (it only requires tweaking the hyperparameters slightly). The resulting plot of the generator architecture is chaotic because of the multiple noise projections.
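A condensed sketch of how the 128x128 generator could be wired together is shown below. The 120-dimensional z split into six chunks, the 128-dimensional shared embedding, and the channel multipliers follow the paper; the function name and remaining details are assumptions, and the Colab version may differ.

```python
def build_generator(latent_dim=120, num_classes=1000, ch=64):
    """128x128 generator sketch: z is split into chunks, and each chunk is
    concatenated with the shared class embedding to condition one residual block."""
    z = layers.Input((latent_dim,))
    label = layers.Input((), dtype='int32')
    embed = layers.Embedding(num_classes, 128)(label)     # shared class embedding

    chunks = tf.split(z, 6, axis=-1)                      # one 20-dim chunk per stage
    x = layers.Dense(4 * 4 * 16 * ch)(chunks[0])
    x = layers.Reshape((4, 4, 16 * ch))(x)

    for i, mult in enumerate([16, 8, 4, 2, 1]):           # 8x8 -> ... -> 128x128
        cond = layers.Concatenate()([chunks[i + 1], embed])
        x = generator_res_block(x, cond, mult * ch)
        if x.shape[1] == 64:                              # non-local block at 64x64
            x = SelfAttention()(x)

    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(3, 3, padding='same', activation='tanh')(x)
    return tf.keras.Model([z, label], x)
```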

Finally, we define the 128x128 BigGAN discriminator model.
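As a rough counterpart to the generator, here is one way the discriminator could be assembled. The block structure, channel multipliers, and projection-based class conditioning are assumptions based on the paper rather than the exact notebook code, and spectral normalization is again left out.

```python
def disc_res_block(x, channels, downsample=True):
    """Discriminator residual block: two 3x3 convolutions with an average-pooled skip."""
    skip = layers.Conv2D(channels, 1)(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(channels, 3, padding='same')(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(channels, 3, padding='same')(x)
    if downsample:
        x = layers.AveragePooling2D()(x)
        skip = layers.AveragePooling2D()(skip)
    return layers.Add()([x, skip])


def build_discriminator(num_classes=1000, ch=64):
    """128x128 discriminator sketch with projection-based class conditioning."""
    img = layers.Input((128, 128, 3))
    label = layers.Input((), dtype='int32')

    x = img
    for i, mult in enumerate([1, 2, 4, 8, 16, 16]):
        x = disc_res_block(x, mult * ch, downsample=(i < 5))
        if x.shape[1] == 64:
            x = SelfAttention()(x)                        # non-local block at 64x64

    x = layers.ReLU()(x)
    h = tf.reduce_sum(x, axis=[1, 2])                     # global sum pooling
    out = layers.Dense(1)(h)
    # Projection: add the inner product of the class embedding and pooled features.
    embed = layers.Embedding(num_classes, h.shape[-1])(label)
    out = out + tf.reduce_sum(embed * h, axis=-1, keepdims=True)
    return tf.keras.Model([img, label], out)
```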

We also implement the hinge loss for training.
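A straightforward way to write the hinge losses (the function names are placeholders):

```python
def d_hinge_loss(real_logits, fake_logits):
    """Hinge loss for the discriminator."""
    real_loss = tf.reduce_mean(tf.nn.relu(1.0 - real_logits))
    fake_loss = tf.reduce_mean(tf.nn.relu(1.0 + fake_logits))
    return real_loss + fake_loss


def g_hinge_loss(fake_logits):
    """Hinge loss for the generator."""
    return -tf.reduce_mean(fake_logits)
```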

We plot some example images of the BigGAN.

We define the training loop based on this Keras tutorial, overriding the compile and train_step functions of tf.keras.Model. I highly recommend checking it out.
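Below is a sketch of what the overridden model could look like, reusing the hinge losses above. The class name BigGAN, the single alternating update per step, and the optimizer handling are assumptions; the notebook version may differ.

```python
class BigGAN(tf.keras.Model):
    """Wraps the generator and discriminator and overrides train_step,
    following the Keras 'GAN overriding Model.train_step' tutorial."""

    def __init__(self, generator, discriminator, latent_dim):
        super().__init__()
        self.generator = generator
        self.discriminator = discriminator
        self.latent_dim = latent_dim

    def compile(self, g_optimizer, d_optimizer):
        super().compile()
        self.g_optimizer = g_optimizer
        self.d_optimizer = d_optimizer

    def train_step(self, data):
        real_images, labels = data
        batch_size = tf.shape(real_images)[0]

        # Discriminator step.
        z = tf.random.normal((batch_size, self.latent_dim))
        fake_images = self.generator([z, labels], training=False)
        with tf.GradientTape() as tape:
            real_logits = self.discriminator([real_images, labels], training=True)
            fake_logits = self.discriminator([fake_images, labels], training=True)
            d_loss = d_hinge_loss(real_logits, fake_logits)
        grads = tape.gradient(d_loss, self.discriminator.trainable_weights)
        self.d_optimizer.apply_gradients(zip(grads, self.discriminator.trainable_weights))

        # Generator step.
        z = tf.random.normal((batch_size, self.latent_dim))
        with tf.GradientTape() as tape:
            fakes = self.generator([z, labels], training=True)
            g_loss = g_hinge_loss(self.discriminator([fakes, labels], training=True))
        grads = tape.gradient(g_loss, self.generator.trainable_weights)
        self.g_optimizer.apply_gradients(zip(grads, self.generator.trainable_weights))

        return {"d_loss": d_loss, "g_loss": g_loss}
```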

We try training the model, but it fails with a ResourceExhaustedError or the kernel crashes.

Result->

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — -
ResourceExhaustedError Traceback (most recent call last)
<ipython-input-42-db7aa313c680> in <module>()
9 )
10
— -> 11 gan.fit(train_dataset.batch(128), epochs=epochs, callbacks=[GANMonitor(num_img=num_img, latent_dim=latent_dim, noise=random_latent_vectors, anno=fake_anno)])

In this post, we reviewed

  • How to implement conditional batch normalization and self-attention for generative networks in TensorFlow.
  • How to implement the BigGAN model architecture in TensorFlow.

The BigGAN-deep architecture can also be implemented easily with a small modification of the residual block in this code. I will try training this code on Google Cloud Platform once I receive quota for 8 GPUs, and I will share the results.
