Introducing Generative Adversarial Network (GAN)

QARA · Dec 10, 2018 · 7 min read

One unique benefit to working at a deep learning fintech startup is that you get to explore the vast richness of artificial intelligence and turn your textbook knowledge into practical skills.

At QARA, newcomers and seasoned professionals alike have the chance to learn about the different types of neural networks available today. One special type is called the Generative Adversarial Network (GAN), an unsupervised machine learning framework that pits two neural networks against each other. But to understand GAN in more detail, we first need to understand the basics of neural networks and how they are formed.

Neural Network

In simple terms, a neural network, sometimes referred to as an Artificial Neural Network (ANN), is a collection of interconnected information-processing units that can digest complicated data. It is essentially a brain-inspired model that loosely replicates how humans think and learn. A single unit of computation in a neural network is often called a node. Nodes are linked by connections called edges, along which signals travel, and groups of nodes are organized into a series of layers that process information from the outside world. These layers then connect with one another to form the bulk of what makes a neural network functional.

One other thing to note is that every connection carries a weight. The higher the weight, the more strongly a node’s signal passes through to the next layer. In other words, the weight decreases or increases the signal’s strength at a connection. An ANN processes information layer by layer, like a chained mathematical formula: as data flows through each connection, the network learns incrementally until it knows how to respond to the data it sees. Much like a human brain, an ANN extracts different levels of insight as information is passed from one layer to the next. The small sketch below shows what a single node actually computes.
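To make the idea of weighted signals concrete, here is a minimal sketch of one node’s computation, written in Python with NumPy. It illustrates the general mechanism only; it is not code from any particular library.

```python
import numpy as np

# One node: multiply each incoming signal by its connection weight,
# sum the results, and squash the total with an activation function.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

inputs = np.array([0.5, -1.2, 3.0])    # signals arriving from the previous layer
weights = np.array([0.8, 0.1, -0.4])   # a larger weight passes more of a signal through
bias = 0.2

output = sigmoid(np.dot(weights, inputs) + bias)
print(output)  # the signal this node sends on to the next layer
```

Stacking many such nodes side by side gives a layer, and chaining layers gives the network.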

Interestingly enough, ever since its inception in 1943 by Warren McCulloch and Walter Pitts, both of whom worked in computational neuroscience, several different models of neural networks have been developed. Notable ones include Variational Autoencoders, Convolutional Neural Networks, and Recurrent Neural Networks.

But among the many types of neural networks, the Generative Adversarial Network (GAN) has been at the forefront of some of today’s leading innovations.

Generative Adversarial Network (GAN) — What Is It?

An ANN learns a model for solving whatever problem it has been designed to solve. So the question is: is a “Generative” AN a network that learns a model that can create something? Yep, you guessed it. As a creator of things, a GAN can generate countless fakes that appear authentic and true to the originals.

Just as human artists create their works from scratch, the model learned by a GAN is capable of creating countless works such as images, music, or novels. “Adversarial,” on the other hand, means that a GAN is made up of two separate parts, a generative one and a discriminative one. For a GAN to become a good artist, both components need to work against each other in a competitive manner.

GAN was first introduced in 2014 by deep learning guru Ian Goodfellow, a scientist at Google who co-authored the seminal textbook “Deep Learning” (MIT Press). In his proposal, he compared the relationship between the generative and discriminative networks to that of a cop and a counterfeiter.

But before we dive into this analogy, we first need to understand why GAN is such an important concept in today’s AI-driven world. Normally, for an image recognition task, a neural network has to go through thousands of pictures to learn to determine an image’s identity. For example, a neural network can learn to identify images of cats by going through pictures labeled “cat” or “not cat.” This process requires people to manually label each picture, which is time-consuming and tedious.

Here’s where GAN comes in. Instead of training a single neural network to recognize images, a GAN reduces the amount of labeled data needed to train deep learning algorithms by having two networks compete. A generator network creates images of fake objects and tries to make them look real, while a discriminator network judges their validity. In the cop-and-counterfeiter analogy, the counterfeiter produces fake money that looks similar to the real thing, while the cop judges how close to or far from the original it looks. The idea is that, as time passes, the counterfeiter learns to create fakes that are almost indistinguishable from the originals, precisely because the cop keeps getting better at spotting them. So instead of someone manually labeling each image for a neural network, the two networks generate each other’s training signal. A minimal sketch of this training loop follows.
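To make the cop-and-counterfeiter loop concrete, here is a minimal, self-contained training sketch in PyTorch (our choice of framework here; nothing about GANs requires it). The “real data” is just samples from a one-dimensional Gaussian so the example stays tiny; a real GAN would use images and far larger networks.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                # counterfeiter
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())  # cop

opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0   # "real money": samples from a Gaussian
    fake = G(torch.randn(64, 8))            # "counterfeit money" made from random noise

    # Cop's turn: label real samples 1 and fakes 0, and get better at telling them apart.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Counterfeiter's turn: adjust G so the cop is fooled into calling the fakes real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```

Each side’s improvement forces the other to improve, which is exactly the adversarial dynamic the analogy describes.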

The faces in this picture look like real people, but they were generated by a GAN.


Different Models of GAN

Although GAN was first introduced in 2014, it wasn’t until two years later, in 2016, that many scholars and researchers took an interest in developing the idea. Among the many models that have gained attention since, several implement a technique called Image Translation, which transforms an original image into the user’s desired outcome. For example, you can change black-and-white images to color, transform photos of people into cartoon characters, or turn young people’s photos into old ones.

Here are a few GAN models that utilize Image Translation:

1. Pix2Pix

Pix2Pix literally means “pixel-to-pixel,” and it covers a wide variety of systems that analyze and reinterpret original content. Using a technique called a conditional adversarial network (conditional GAN), Pix2Pix can instantly convert drawings and illustrations into photo-realistic images. When you open a Pix2Pix demo, you’ll see two boxes: input and output. In the input box, you can submit an image or create your own quick drawing, then click convert to let Pix2Pix form a picture of its own. The sketch below shows the gist of its training objective.
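As a rough illustration of the paired setup, here is a hedged sketch of the Pix2Pix generator objective from the original paper: a conditional adversarial term plus an L1 term that keeps the output close to the paired target. `G`, `D`, `sketch`, and `photo` are placeholder names of our own, not identifiers from the official implementation.

```python
import torch
import torch.nn as nn

bce, l1, lam = nn.BCEWithLogitsLoss(), nn.L1Loss(), 100.0  # lambda = 100, as in the paper

def generator_loss(G, D, sketch, photo):
    fake = G(sketch)                             # translate the input drawing
    pred = D(torch.cat([sketch, fake], dim=1))   # D judges (input, output) pairs
    adv = bce(pred, torch.ones_like(pred))       # try to fool the discriminator...
    return adv + lam * l1(fake, photo)           # ...while staying close to the real photo
```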

The picture below comes from a demo called edges2cats, which is famous for producing images of real-life-looking cats from just an outline drawing.

2. CycleGAN

As great a model as Pix2Pix is, it is almost impossible to use without a paired, one-to-one dataset of input and output images. CycleGAN is a model that fixes this problem: its distinct advantage is that it can transfer styles between images without any one-to-one mapping between them.

Through CycleGAN, you can apply image translation in several different directions. You can change Monet’s paintings into photographs, change zebras into horses, or turn summer pictures into winter pictures. And each change goes both ways; the round-trip idea behind this is sketched below.
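The trick that removes the need for paired data is a cycle-consistency loss: translating an image to the other domain and back again should reproduce the original. Here is a minimal sketch, with `G` and `F` as placeholder names for the two direction-specific generators:

```python
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_loss(G, F, real_a, real_b):
    # horse -> zebra -> horse should give back the horse, and vice versa
    return l1(F(G(real_a)), real_a) + l1(G(F(real_b)), real_b)
```

This loss is added to the usual adversarial losses for each direction, which is what lets CycleGAN learn from two unpaired piles of images.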

3. StarGAN

Although Pix2Pix and CycleGAN have found remarkable success in translating images, they plateau when handling more than two domains. With StarGAN, however, you can perform image-to-image translation across numerous domains with only one model. Developed by South Korean researchers, StarGAN conditions its generator on a target domain to create a new output each time. For example, a single generator G can learn to translate an input image x into an output image y, given the target-domain condition c. As a formula: G(x, c) = y. A sketch of this conditioning follows.
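In practice, one way a single generator can take the target domain as input (and the approach the StarGAN paper uses) is to spatially replicate the label c and stack it onto the image’s channels, so the network sees the picture and the requested domain together. A small sketch, with names of our own choosing:

```python
import torch

def with_condition(x, c):
    # x: (batch, 3, H, W) image; c: (batch, num_domains) one-hot target label
    c_map = c.view(c.size(0), c.size(1), 1, 1).expand(-1, -1, x.size(2), x.size(3))
    return torch.cat([x, c_map], dim=1)  # fed to the generator as one tensor

x = torch.randn(4, 3, 128, 128)    # a batch of input images
c = torch.eye(5)[[0, 2, 1, 4]]     # 5 example domains: hair color, age, ...
print(with_condition(x, c).shape)  # torch.Size([4, 8, 128, 128])
```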

This image shows that StarGAN can take one person’s photograph and translate it across various domains, such as hair color, gender, age, skin tone, and facial expression, all at the same time.

Virtual Makeup

QARA has used GAN in various projects in the past. One of our AI scientist interns, Hogun Ki, worked on a project that applied virtual makeup to celebrity pictures. He chose the Pix2Pix model to translate the images. First, he reduced each image to 128x128 pixels, adding zero-padding to any picture that didn’t scale cleanly; a rough reconstruction of that step follows.
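Since the exact preprocessing code isn’t published, here is a hedged reconstruction of that step using Pillow; the function and variable names are our own:

```python
from PIL import Image

def to_128_square(path):
    img = Image.open(path).convert("RGB")
    img.thumbnail((128, 128))                     # shrink, preserving aspect ratio
    canvas = Image.new("RGB", (128, 128))         # all-zero (black) 128x128 canvas
    canvas.paste(img, ((128 - img.width) // 2,    # center the image, leaving
                       (128 - img.height) // 2))  # zero-padding around it
    return canvas
```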

Not only was he able to add virtual makeup to celebrities who weren’t wearing any, he was also able to erase the makeup from celebrities who were! Through this, he learned that the model could go both ways.

1. Bare Face → Makeup

2. Makeup → Bare Face

The picture on the left is the original photo, and the picture on the right is the image created through GAN. Looks pretty good, right?

Hogun was able to take his technical knowledge of GANs and apply it to his own virtual makeup project. But the GAN model isn’t limited to creating images; it can be applied to Natural Language Processing and other deep learning problems as well.

The field of artificial intelligence, especially deep learning, is growing fast. Scientists and researchers who specialize in it are actively looking for opportunities to put their skills to practical use. As QARA looks to globalize its services, we too welcome every chance to grow in this field.
