A Dozen Times Artificial Intelligence Startled The World

The best uses of Generative Models and how they work.

Sumeet Agrawal
Jul 31, 2017 · 10 min read

Generative Adversarial Networks (GANs) are some of the most fascinating ways to “teach” computers to do human tasks.

We’ve always heard that competition can boost performance, and GANs take “learning from competition” to an industrial scale.

A Generative Adversarial Network is built from two neural networks that compete with each other, each getting better at its respective task in the process.

Imagine a malware bot competing against a security bot, each relentlessly pursuing its own objective (invade vs. protect), and in the process becoming better and better at its respective task.

First proposed by Ian Goodfellow at the University of Montreal in 2014, GANs have since shown us the power of “unsupervised learning” through their widespread success.

So how do GANs work?

GANs Framework

A GAN has two competing neural network models. One network, the generator, takes noise as input and generates samples. The other network, the discriminator, receives samples from both the generator and the training data and must learn to distinguish between the two sources.

These two networks play a continuous game in which the generator learns to produce samples that fool the discriminator, which in turn gets better at distinguishing generated data from real data.

These two networks are trained simultaneously, and after millions of rounds of “play” the generated samples become indistinguishable from the real data.

In simple terms, the generator is like a forger trying to produce counterfeit material, and the discriminator is like the police trying to detect the forged items.
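This two-player game can be sketched with a deliberately tiny example: a one-dimensional GAN in plain NumPy, where the generator is a single affine unit and the discriminator is logistic regression on a scalar. Real GANs use deep networks; the sizes, learning rate, and data distribution below are illustrative choices, not anything from a particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Real data: samples from N(4, 0.5). The generator maps noise z ~ N(0, 1)
# through a single affine unit G(z) = gw*z + gb and must learn to land
# its samples on the real distribution.
def sample_real(n):
    return rng.normal(4.0, 0.5, n)

def d_prob(x, dw, db):
    """Discriminator: logistic regression on a scalar sample."""
    return 1.0 / (1.0 + np.exp(-(dw * x + db)))

gw, gb = 1.0, 0.0   # generator parameters
dw, db = 0.1, 0.0   # discriminator parameters
lr = 0.05

for step in range(2000):
    z = rng.normal(0.0, 1.0, 64)
    fake = gw * z + gb
    real = sample_real(64)

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    # (gradients of the binary cross-entropy, worked out by hand).
    p_real, p_fake = d_prob(real, dw, db), d_prob(fake, dw, db)
    dw -= lr * (np.mean((p_real - 1) * real) + np.mean(p_fake * fake))
    db -= lr * (np.mean(p_real - 1) + np.mean(p_fake))

    # Generator step: push D(fake) toward 1 (the "non-saturating" loss),
    # back-propagating through the discriminator into gw and gb.
    p_fake = d_prob(gw * z + gb, dw, db)
    gw -= lr * np.mean((p_fake - 1) * dw * z)
    gb -= lr * np.mean((p_fake - 1) * dw)

# After training, generated samples should cluster near the real mean (4.0).
```

Note that neither network ever sees an explicit target: the generator improves only because the discriminator keeps catching it.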

Since the entire process is automated and limited only by available computation power, GANs can accomplish extraordinary feats.

Below are some of the coolest GAN applications in action.

1. A look into a machine’s imagination.

Google’s Deep Dream creates psychedelic images.

Researchers at Google have developed a way to visually represent what their neural network, GoogLeNet, “thinks of” as the essence of objects.

Using this method, the network produced images that can only be described as psychedelic.

These dream-like, hallucinogenic images are the byproduct of deliberately over-processing images through an image-classification network. The system that creates them has been dubbed ‘Deep Dream’.

An example where the Deep Dream model accentuates objects such as towers, buildings, and birds within the image.

You give Deep Dream an image, and it starts to look for ‘everything’ it was trained to recognize. The neural network might find some resemblance of a dog, a house, or a jellyfish in an image of something totally unrelated, much like humans see objects in clouds. Deep Dream then amplifies these found objects.

For example, instead of the recognition network reporting ‘I am 40% certain this is a dog’, Deep Dream modifies the input image so that the next pass reports ‘I am 60% certain this is a dog’. This process is repeated until the input image is significantly transformed to look like a dog, or some other object. By gradually shifting the input from what the network would classify as one image toward another, it creates otherworldly ‘in-between’ images.
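That amplification step is gradient ascent on the input image rather than on the network’s weights. A minimal sketch of the idea, using a stand-in linear “classifier” over an 8-pixel “image” (the weights and sizes here are made up for illustration; Deep Dream itself applies this to intermediate layers of GoogLeNet):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "classifier": softmax over two classes on an 8-pixel "image".
# W plays the role of pre-trained weights (random here, purely illustrative).
W = rng.normal(0.0, 1.0, (2, 8))

def class_probs(img):
    logits = W @ img
    e = np.exp(logits - logits.max())
    return e / e.sum()

img = rng.normal(0.0, 0.1, 8)   # start from a noisy "photo"
target = 0                      # the object we want the network to "see"

before = class_probs(img)[target]
for _ in range(100):
    p = class_probs(img)
    # Gradient of log p(target) with respect to the INPUT pixels:
    # logits are linear in img, and d(log-softmax)/d(logits) = one_hot - p.
    grad = W.T @ (np.eye(2)[target] - p)
    img += 0.1 * grad           # gradient ascent on the image, not the weights
after = class_probs(img)[target]

# 'after' ends up much larger than 'before': the image now "looks like"
# the target class to this network.
```

Running the same loop on a deep network, with some smoothing tricks, is what turns clouds into dogs.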

Deep Dream reverses the conventional pipeline: instead of keeping the input fixed and reading off an output, it repeatedly changes the input itself until the network produces the desired output.

Sources & More Info:

GitHub, Blog

2. Making a machine imitate Humans

Imitation Learning using GANs

A group of AI researchers wanted a different approach to training a self-learning artificial agent than the traditional reward-based one.

They gave real demonstration data as input to the agent, which then learned to mimic the demonstrated actions.

A bot learning to run by imitating how an actual person runs.

In this work, Jonathan Ho and Stefano Ermon present a new approach to imitation learning. The standard reinforcement learning setting usually requires designing a reward function that describes the desired behavior of the agent; in practice, getting the details right can involve an expensive trial-and-error process. In imitation learning, by contrast, the agent learns from example demonstrations (for example, provided by teleoperation in robotics or by recorded human actions), eliminating the need to design a reward function.

Sources & More Info:

Blog, GitHub

3. Turning horses into zebras and winters into summers.

Image-to-image translation

Generating an image from another image is an interesting application of generative networks. In one experiment, researchers were able to change the animal within a video (or the season within a picture, among other analogous tasks).

The goal is to learn a mapping between an input image and an output image, normally using a training set of aligned image pairs. In many situations, however, paired training data is not available. To get around this, two inverse mappings are trained together, and the output of each is constrained to reconstruct the input of the other (a “cycle-consistency” constraint). This lets the model learn the relationship between the two domains from unpaired data (unsupervised learning).
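The cycle-consistency idea can be sketched in miniature with two scalar “translators” G and F trained only on the reconstruction constraints F(G(x)) ≈ x and G(F(y)) ≈ y. The real CycleGAN adds adversarial losses and uses deep convolutional networks; the toy distributions and learning rate below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two scalar "translators": G(x) = g*x maps domain X to Y, F(y) = f*y maps
# back. Only the cycle-consistency loss is used here.
g, f = 0.3, 0.3
lr = 0.01

for _ in range(5000):
    x = rng.normal(0.0, 1.0, 32)   # unpaired samples from domain X
    y = rng.normal(0.0, 2.0, 32)   # unpaired samples from domain Y

    # Cycle errors: F(G(x)) should reconstruct x, G(F(y)) should reconstruct y.
    e_x = f * g * x - x
    e_y = g * f * y - y
    grad_g = np.mean(2 * e_x * f * x) + np.mean(2 * e_y * f * y)
    grad_f = np.mean(2 * e_x * g * x) + np.mean(2 * e_y * g * y)
    g -= lr * grad_g
    f -= lr * grad_f

# At convergence the two maps are (approximate) inverses: f * g ≈ 1.
```

The point is that neither translator ever sees a matched pair; the requirement that the round trip reproduces the input is enough to tie the two domains together.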

Some of the examples of this process are:

Animal transfiguration — converting an image of a horse into an image of a zebra, by detecting the moving horse in the video and re-rendering it with zebra stripes.

Sources & More Info:

Page, GitHub

4. Paintings drawn from sketches

Generating images from outlines

Realistic image manipulation is challenging because it requires modifying the image in a user-controlled way while preserving the realism of the result. This takes considerable skill; an artist may have to train for years to do it consistently.

GANs are able to generate realistic images from outlines.

Researchers created a model which, given an outline of an object, can both identify the object and generate a realistic image of it.

In this paper, the authors propose a method to learn the natural image manifold directly from data using a generative adversarial network. The model automatically adjusts the output, keeping every edit as realistic as possible; all manipulations are expressed as constrained optimization and applied in near-real time. The method can also be used to transform one image to look like another, or to generate novel imagery from scratch based on a user’s scribbles.

Sources & More Info:

Paper, GitHub, Page

5. Images Created from Only a Text Description

Generative Adversarial Text to Image Synthesis

Automatic synthesis of realistic images from text is both interesting and useful. Recently, deep convolutional generative adversarial networks (DCGANs) have begun to generate highly compelling images of specific categories, such as faces, album covers, and room interiors.

The model is trained on sample data consisting of text paired with corresponding images; when given a description of an object, it tries to generate an image matching that description.

The model can generate plausible images of birds and flowers from detailed text descriptions.

In this work, text-to-image synthesis proceeds in two steps: first, learning text features that capture the important visual details; second, using these features to synthesize a realistic image that can fool a human.
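The two-step structure can be sketched as follows: a text embedding is concatenated with a noise vector and fed through a generator, so the same description with different noise yields different but related images. Everything below (the embedding function, the linear “generator” G, and all the sizes) is a made-up stand-in; the paper uses a learned char-CNN-RNN text encoder and a deep convolutional generator.

```python
import numpy as np

rng = np.random.default_rng(6)

def embed_text(desc):
    """Stand-in text encoder: a deterministic pseudo-random embedding.
    (The paper learns a char-CNN-RNN encoder instead.)"""
    seed = sum(ord(c) for c in desc) % (2**32)
    return np.random.default_rng(seed).normal(0.0, 1.0, 16)

# "Generator": a fixed random linear map from [text ; noise] to a tiny image.
G = rng.normal(0.0, 0.1, (64, 16 + 8))

def generate(desc):
    # Conditioning: concatenate the text embedding with fresh noise.
    cond = np.concatenate([embed_text(desc), rng.normal(0.0, 1.0, 8)])
    return (G @ cond).reshape(8, 8)

# Same description, different noise: different but related "images".
img1 = generate("a small yellow bird with black wings")
img2 = generate("a small yellow bird with black wings")
```

The noise component is what lets one caption map to many plausible images instead of a single deterministic output.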

Sources & More Info:

Paper, GitHub

6. Making Computers learn through “Curiosity”

Curiosity-driven Exploration of Deep Neural Networks

In many real-world scenarios, external rewards for an artificial agent are extremely sparse or absent altogether. An agent that passively waits for rewards therefore never gets a signal to learn from.

In such cases, “curiosity” can serve as an intrinsic reward signal that enables the agent to explore its environment and learn skills that might be useful later in its life; in these settings, active learners do much better than passive ones.

In such a model, “curiosity” is formulated as the error in the agent’s ability to predict the consequences of its own actions. The agent can, of course, also learn from any extrinsic rewards the programmer provides.

Think of it as analogous to a small child. Without supervision, the child has no idea what will happen if he touches a hot stove, but once he has done so, the pain teaches him the causal link between touching the stove and getting hurt. Curiosity drove him to explore, and the reward system labeled the action as good or bad.

A retro game of Snake in which, through curiosity-driven learning, the snake learns to collect green balls (which increase its reward) and avoid red balls (which decrease it).

Curiosity driven learning is based on the following:

1) Sparse external reward, where curiosity allows for far fewer interactions with the environment to reach the goal

2) Exploration with no external reward, where curiosity pushes the agent to explore more efficiently

3) Generalization to unseen scenarios (e.g. new levels of the same game) where the knowledge gained from earlier experience helps the agent explore new places much faster than starting from scratch.

The proposed approach was evaluated in two game environments: ViZDoom and Super Mario Bros.
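The “surprise as reward” formulation can be sketched with a toy forward model: the agent predicts the next state from the current state and action, and the intrinsic reward is the squared prediction error. The environment, the linear model, and the nonlinear action effect below are all invented for illustration (the paper’s ICM module uses learned feature encoders and neural networks):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy environment with a nonlinear action effect: s' = 0.9*s + a**2.
# The agent's linear forward model can only predict well in the regime
# it has practised.
def step_env(s, a):
    return 0.9 * s + a ** 2

w_s, w_a = 0.0, 0.0   # forward-model parameters: pred = w_s*s + w_a*a

def curiosity_reward(s, a, s_next):
    pred = w_s * s + w_a * a
    return (pred - s_next) ** 2   # intrinsic reward = prediction error

# Practise one familiar action (a = 1.0) until the model predicts it well.
lr = 0.05
for _ in range(500):
    s = rng.normal()
    a = 1.0
    err = (w_s * s + w_a * a) - step_env(s, a)
    w_s -= lr * 2 * err * s
    w_a -= lr * 2 * err * a

familiar = curiosity_reward(0.5, 1.0, step_env(0.5, 1.0))   # small: boring
novel = curiosity_reward(0.5, -3.0, step_env(0.5, -3.0))    # large: surprising
```

Well-practised transitions earn almost no intrinsic reward, so the agent is pushed toward actions whose consequences it cannot yet predict.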

Sources & More Info:

GitHub, Paper, Website

7. AI Designing Games

Game UI design using GANs

The idea is that if we can generate convincing screenshots of imaginary video games, we could copy and paste bits of art from those screenshots and use it in our own.

The goal is to create a similar tile sheet for the game. To do that, the program gathers images from various existing games and then generates new, unique images made up of bits of them. These images can then be used as backgrounds for game creation!

Sources & More Info:


8. Predicting What Happens Next in a Video

Generating Videos with Scene Dynamics

Understanding object motion and scene dynamics is a core problem in computer vision. For both video recognition tasks (e.g., action classification) and video generation tasks (e.g., future prediction), a model of how scenes transform is needed. Building such a model is challenging, however, because of the vast number of ways objects and scenes can change.

Creating short future videos of Train station, Beach, Babies and Golf

This is achieved with a model that separates the foreground from the background: the background is constrained to be stationary, and the network learns which objects are moving and which are not.
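That foreground/background split amounts to a masked composition, which is how the model combines its two streams into one video. The arrays below are random stand-ins for what the two streams would actually generate:

```python
import numpy as np

rng = np.random.default_rng(5)
T, H, W = 4, 8, 8   # a tiny single-channel "video": 4 frames of 8x8

background = rng.random((H, W))      # static stream: one frame for all time
foreground = rng.random((T, H, W))   # moving stream: content per frame
mask = np.zeros((T, H, W))
mask[:, 2:5, 2:5] = 1.0              # where the "moving object" lives

# Two-stream composition: video = m * foreground + (1 - m) * background,
# so everything outside the mask stays stationary across all frames.
video = mask * foreground + (1.0 - mask) * background
```

Because the background is a single frame broadcast across time, any pixel outside the mask is identical in every frame, exactly the stationarity constraint described above.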

These videos are not real; they are “imagined” by a generative video model. While not photo-realistic, the motions are reasonable for the scene category the model was trained on.

Sources & More Info:

Paper, GitHub, Page

9. Generating Realistic yet Fake Faces

Neural Faces

“Neural Face” is an artificial intelligence that generates realistic images of faces that do not belong to any real person.

Neural Face uses a DCGAN, an architecture developed in part by the Facebook AI Research team.

Unique Human Faces generated by GANs

The team represented each image by a vector z consisting of 100 real numbers ranging from 0 to 1.

The generator learns the distribution of human face images and maps each z vector to an image. It learns to produce new face images that fool the discriminator, which in turn gets better at distinguishing generated faces from real ones.

Sources & More Info:

GitHub, Page

10. Changing Facial Expressions & Features in Photos:

Vector Arithmetic on Faces using GANs

In an experiment, researchers were able to generate images of people with various facial expressions just by providing the system with sample images. For example, it could change a non-smiling face into a smiling one, add objects such as glasses to the face, or accentuate certain features.

Using arithmetic manipulation, we can convert an image of a non-smiling person into a smiling one, or add glasses to a face that has none.

The basic approach is: for each column of sample images, represent each image as a vector X, then average the X vectors to produce a mean vector Y.

Arithmetic operations such as addition and subtraction are then performed on the Y vectors to create a single vector Z (man with glasses - man without glasses + woman without glasses). This vector Z is then fed into the generator to produce the result shown on the right-hand side of the image (a smiling man, or a woman with glasses).
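The arithmetic itself is easy to sketch. Below, the latent space, the “attribute directions”, and the averaging count are all invented stand-ins (in a trained DCGAN these directions emerge from training rather than being planted), but the average-then-add recipe is the same:

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 100

# Hypothetical attribute directions in latent space, planted here purely
# to make the effect of the arithmetic measurable.
glasses_dir = rng.normal(0.0, 1.0, dim)
female_dir = rng.normal(0.0, 1.0, dim)

def mean_vector(has_glasses, is_female, n=10):
    """Average n latent vectors of faces sharing the given attributes
    (the averaging step that produces the mean vectors Y)."""
    zs = [rng.normal(0.0, 1.0, dim)
          + has_glasses * glasses_dir + is_female * female_dir
          for _ in range(n)]
    return np.mean(zs, axis=0)

man_glasses = mean_vector(1, 0)
man_plain = mean_vector(0, 0)
woman_plain = mean_vector(0, 1)

# man with glasses - man without glasses + woman without glasses
z = man_glasses - man_plain + woman_plain

# Project Z onto each attribute direction: both should be strongly present,
# i.e. Z encodes a woman WITH glasses.
glasses_score = z @ glasses_dir / (glasses_dir @ glasses_dir)
female_score = z @ female_dir / (female_dir @ female_dir)
```

Averaging several samples per column is what makes the arithmetic work: it cancels the per-image noise and leaves mostly the shared attribute.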

Changing a left-facing image into a right-facing one.

We can model attributes such as rotation, scale, and position in the same way. We start by taking sample images of faces looking left and faces looking right, and average each group to create a “turn” vector. Then, by adding interpolations along this axis to a new face’s vector, we can apply the turning transformation to new faces.

Sources & More Info:

GitHub, Paper, Blog

Significance of GANs For the Future of AI and ML


These are still early days for GANs. The above examples, impressive as they are, only scratch the surface of what is possible with this framework. For engineers, GANs offer a powerful way to train neural networks on a wide range of complex human tasks, and they suggest that creativity is no longer a trait exclusive to humans.

Additional Resources on GANs

If you want to learn more in depth about generative models and DCGAN, here are some more resources:

1. Generative Adversarial Networks in 50 lines of code by Dev Nag

2. Ian Goodfellow’s keynote on GANs (More of a technical video)

3. Siraj Raval’s video tutorial on GANs (short fun video)


If you would like to know more about my work with AI/ML, check out Archie.AI

