CycleGAN: The Artificial in Artificial Intelligence

Slavvy Coelho
SFU Professional Computer Science
8 min read · Feb 3, 2020

This blog is written and maintained by students in the Professional Master’s Program in the School of Computing Science at Simon Fraser University as part of their course credit. To learn more about this unique program, please visit sfu.ca/computing/pmp.

Authors: Ravi Kiran Dubey, Ruchita Rozario, Slavvy Coelho, Ziyue Xia

We’ve all tried the FaceApp challenge that gave us the vogue face we always wanted, along with the wrinkles we were not ready for. We’ve definitely thanked our gods for the two seconds saved every time our iPhone 11 recognized our face while unlocking. Yet this is only the surface of the advancements AI has made in the field of digital image processing. The real MVP here is a technology that can learn like humans and make decisions by itself without being explicitly programmed. We’re talking about neural networks.

Feed a neural network a picture of toppings on bread and it will very easily classify whether it’s a pizza or a burger. Ask it to generate a completely new image of a pizza, however, and it will be as lost as Hogan’s goat.

Thus, arose a need for having something that can generate new images in addition to classifying the existing ones. Enter ‘Generative Adversarial Networks’ a.k.a. GANs! GANs are a combination of two neural networks contending against each other to generate new instances of data that can pass off as real data.

Why they are called generative is pretty obvious: the network creates unseen data. For the adversarial part, consider teaching the digits 0–9 to a child. You show the child multiple images of the digits at random. Once the child starts guessing a particular number right, you reduce how often you show that number and instead focus on the images the child hasn’t been able to classify correctly. Basically, you focus on the weaknesses. This is exactly what happens in adversarial learning: two networks are pitted against each other to bring out the best in each other.

Introduction

GANs are a combination of two neural networks: a generator and a discriminator. The generator is fed random noise to start with and produces new data instances, eventually getting better at generating images from the desired domain. The discriminator, on the other hand, has the job of authenticating the images fed to it, for instance, outputting whether a given image is a real cat image or not (GAN modelling is weirdly obsessed with generating new cat images for unknown causes :P). If the discriminator can tell that an image is fake, the generator gets a negative reward. The discriminator is alternately fed authentic images from the ground-truth dataset and synthetic images produced by the generator; this helps it learn the features that let it correctly separate fake images from real ones. The generator’s goal, in turn, is to generate images that pass the discriminator as real. The two networks are competing over the same number, the discriminator’s error rate: the discriminator wants that error to be low, and the generator wants it to be high.

This approach of making two competing neural networks learn is analogous to a forger and an expert investigator. Initially the forger is a novice in the domain, and so is the investigator. As the learning proceeds, the investigator gets better at judging whether an image is forged. The forger is in turn forced to get better at forging, and as the fakes improve, the investigator needs to up his game and get better at catching them. This trade-off between the discriminator and generator is theoretically supposed to stop when the generator gets so good at generating new images that the discriminator can no longer tell real from fake and assigns both outcomes an equal probability of 50%.

Architecture of GAN

  • The generator is initially fed a random noise vector, from which it generates an image.
  • This generated image is fed into the discriminator, alternated with a stream of images taken from the actual, ground-truth dataset.
  • The discriminator returns a probability between 0 and 1 representing whether the input image is authentic or was produced by the generator.
  • The double feedback rule works in the following manner: the discriminator is in a feedback loop with the ground-truth images, while the generator is in a feedback loop with the discriminator.
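The double feedback loop above can be sketched end to end on toy 1-D data. The following is a minimal, illustrative NumPy example, not code from any GAN library: a linear “generator” maps noise to samples, a logistic “discriminator” scores them, and the two alternate gradient updates exactly as in the bullets. All names and hyperparameter values here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" data: 1-D samples from a normal distribution centred at 4.
real = rng.normal(4.0, 1.0, size=64)

# Toy generator: fake = g_w * z + g_b; toy discriminator: sigmoid(d_w * x + d_b).
g_w, g_b = 1.0, 0.0
d_w, d_b = 0.0, 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.01
for step in range(2000):
    z = rng.normal(0.0, 1.0, size=64)
    fake = g_w * z + g_b

    # Discriminator feedback loop: push D(real) toward 1 and D(fake) toward 0.
    p_real = sigmoid(d_w * real + d_b)
    p_fake = sigmoid(d_w * fake + d_b)
    d_w += lr * ((1 - p_real) * real - p_fake * fake).mean()
    d_b += lr * ((1 - p_real) - p_fake).mean()

    # Generator feedback loop: push D(fake) toward 1 (fool the discriminator).
    p_fake = sigmoid(d_w * fake + d_b)
    g_w += lr * ((1 - p_fake) * d_w * z).mean()
    g_b += lr * ((1 - p_fake) * d_w).mean()

print(fake.mean())  # should drift from 0 toward the real mean of ~4
```

Even in this toy setting you can see the competition over the discriminator’s error rate: the generator’s update moves its samples in whichever direction raises the discriminator’s score for fakes.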

Why CycleGAN?

A major loophole in this double feedback system is the problem of mode collapse. After a certain number of iterations, the generator stops generating interesting, new outputs because it has already discovered certain samples, or modes, that pass the discriminator’s test. The discriminator technically cannot complain, since the generator is producing valid outcomes. An example is a GAN that generates the digits 0 to 9, i.e. mode 0 to mode 9.

We can observe that after 100k steps, the model returns only mode 6 as an output. This requires us to come up with an architecture that has a way of validating the output generated by the model.

Secondly, such GANs cannot function with an unpaired collection of data, and paired data is hard to find (unless we hire several painters to paint thousands of the images provided to them).

Architecture of CycleGAN

CycleGAN contains a network of two GANs, each with its own generator and discriminator model. Each generator takes an input image from one domain and produces an output image in the other domain. The corresponding discriminator then judges whether that output can pass as a real image from its target domain.

Let us consider two classes of inputs: Input A, a set of images of zebras, and Input B, a set of images of horses.

GAN A consists of:

  • Generator A: takes an input image of zebras and generates an image of horses.
  • Discriminator A: compares the output image from Generator A with the set of real horse images in Input B and decides whether the generated image passes as a real horse.

Similarly, GAN B consists of:

  • Generator B: takes the output from Generator A as input, i.e. an image of horses, and generates an image of zebras.
  • Discriminator B: compares the output image from Generator B with the set of real zebra images in Input A and decides whether the generated image passes as a real zebra.

Until now, we have seen that the above model generates an image, but the generated image is not necessarily a translation of the input image. This is where cycle consistency loss comes into the picture. Cycle consistency loss calculates the difference between the input image to GAN A and the output image from GAN B using the L1 norm, i.e. the summed absolute difference in pixel values.

In simple terms, Generator A takes an image of zebras as input and produces an image of horses, which acts as the input for Generator B, which in turn generates an image of zebras as output. The cycle consistency loss then measures the difference between the input image to GAN A and the output image from GAN B, and the generator models are updated to reduce this difference.
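The round-trip loss described above fits in a few lines. Here is a minimal NumPy sketch; the arrays below are hypothetical stand-ins for real images and generator outputs, the loss is averaged over pixels rather than summed, and the weight of 10 follows the value commonly used in the CycleGAN paper:

```python
import numpy as np

def cycle_consistency_loss(real_a, reconstructed_a, weight=10.0):
    """Mean absolute (L1) difference between the image fed to GAN A
    and its reconstruction after the A -> B -> A round trip."""
    return weight * np.abs(real_a - reconstructed_a).mean()

# Hypothetical stand-ins: a "zebra" image and its round-trip reconstruction.
zebra = np.random.default_rng(0).random((3, 64, 64))  # input image to GAN A
fake_horse = 0.9 * zebra                 # Generator A's output (stand-in)
reconstructed_zebra = fake_horse / 0.9   # Generator B's output (stand-in)

loss = cycle_consistency_loss(zebra, reconstructed_zebra)
print(loss)  # near zero: the round trip reconstructed the input almost exactly
```

A reconstruction that drifts away from the original zebra image would drive this loss up, which is exactly the signal that forces each generated horse to remain a translation of its particular input zebra.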

Applications of CycleGAN

Object transformation: Converting images with objects belonging to one domain into another domain is a classic application of CycleGAN. Examples cover a wide range: apples to oranges, horses to zebras, bears to pandas, and so on.

Converting paintings to photographs: Generating photographs from paintings, and vice versa, has also lately become one of the most interesting and widely used applications of CycleGAN.

Picture quality enhancement: CycleGAN aids photo enhancement by improving the depth of field, colourizing black-and-white pictures, etc.

Image style alteration: Converting images from one style to another such as Monet to Van Gogh.

Drawbacks

Object transformation works accurately with CycleGAN only if the objects to be transformed have similar geometric structures, e.g. oranges and apples.

The above image shows a poor transformation of a cat into a dog, as the two have different structures.

The quality of the generated images is still low as of now, and efforts are being made to smooth the transformations.

CycleGAN in the future

CycleGAN is still a new technology that has limitations while also holding huge potential. With CycleGAN, we can generate new, unseen images from an input object with a similar geometric structure. Projects have been built to visualize the influence of disasters on houses to raise people’s awareness of climate change, or to make face-swapping videos that generate a completely different face using CycleGAN. Note that CycleGAN doesn’t need paired images in its training dataset, which means that we can now not only transfer a real object into an artificial work, but also make an artificial work look more realistic. It would be quite exciting to see CycleGAN being used in different industries: maybe generating a street view of Miami covered in snow for a lower-budget movie, or making the characters in the next live-action remake look almost the same as in the classic cartoon!
