AI image and video generation through GAN-variant algorithms

AI-generated images and videos are making it harder and harder for humans to tell whether what they see is real or fake. The images look remarkably real and natural. The technology behind them is a family of Generative Adversarial Network (GAN) derivatives. If you are not interested in the development history, feel free to skip ahead to the demo use cases.

If you follow AI technology, Deep Learning (also called Deep Neural Networks, DNNs) and Reinforcement Learning are the two major breakthroughs of recent history. DNN-based models surpassed human accuracy in the ImageNet competition in 2015 and 2016; since then, humans have not been able to beat machines at image-classification accuracy, and DNN derivatives have kept producing huge progress both in research and in real AI applications. One example is the Convolutional Neural Network (CNN), which is used heavily in face recognition. A good illustration is the Chinese company SenseTime (商湯科技), which raised a landmark Series C financing of USD $600 million, valuing the company at over USD $4.5 billion.

Among the DNN derivatives, one of the most famous is the Generative Adversarial Network (GAN), introduced by Ian Goodfellow in 2014. Facebook's AI guru, Yann LeCun, wrote on Quora:

This (GAN), and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion.

The number of research papers on GAN variants has skyrocketed.

Fig 1. The number of research papers on GAN variants. Source: Gan-zoo

So, what is a GAN? To quote blogger Adit Deshpande's explanation:

The basic idea of these networks is that you have 2 models, a generative model and a discriminative model. The discriminative model has the task of determining whether a given image looks natural (an image from the dataset) or looks like it has been artificially created. The task of the generator is to create natural looking images that are similar to the original data distribution. This can be thought of as a zero-sum or minimax two player game. The analogy used in the paper is that the generative model is like “a team of counterfeiters, trying to produce and use fake currency” while the discriminative model is like “the police, trying to detect the counterfeit currency”. The generator is trying to fool the discriminator while the discriminator is trying to not get fooled by the generator. As the models train through alternating optimization, both methods are improved until a point where the “counterfeits are indistinguishable from the genuine articles”.
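To make that alternating optimization concrete, here is a minimal, toy GAN training loop in Keras/TensorFlow: two small fully connected networks trained on flattened MNIST digits. This is purely an illustrative sketch of the generator-versus-discriminator game, not the network used in any of the use cases below, and the layer sizes and hyperparameters are arbitrary choices of mine.

```python
# Toy GAN sketch: a generator learns to produce MNIST-like digits while a
# discriminator learns to tell real digits from generated ones.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim = 64
img_dim = 28 * 28  # flattened MNIST-sized images

# Generator: noise vector -> fake image
generator = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(img_dim, activation="sigmoid"),
])

# Discriminator: image -> probability that it is real
discriminator = models.Sequential([
    layers.Input(shape=(img_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, img_dim).astype("float32") / 255.0

batch_size = 64
for step in range(1000):
    real = x_train[np.random.randint(0, len(x_train), batch_size)]
    noise = tf.random.normal((batch_size, latent_dim))

    # 1) Discriminator step: label real images 1 and generated images 0.
    with tf.GradientTape() as tape:
        fake = generator(noise, training=True)
        d_loss = bce(tf.ones((batch_size, 1)), discriminator(real, training=True)) \
               + bce(tf.zeros((batch_size, 1)), discriminator(fake, training=True))
    d_grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))

    # 2) Generator step: try to make the discriminator call its fakes "real".
    with tf.GradientTape() as tape:
        fake = generator(noise, training=True)
        g_loss = bce(tf.ones((batch_size, 1)), discriminator(fake, training=True))
    g_grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
```

The two gradient steps alternate exactly as the quote describes: the discriminator improves at spotting fakes, which in turn forces the generator to produce more convincing images.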

Use case #1: Removing a person from an image and inpainting the hole

Sometimes when we take a photo, there is someone in the background we would like to remove. Although this could already be done in Adobe Photoshop back in 2012, the algorithms for doing it keep improving. Now we can use a GAN variant to do it.

Fig 2. Thanks to Arunabh Sharma for sharing this inpainting result, made by editing his own photo. Source: Inpainting Arunabh Sharma

Use case #2: Food image inpainting

In 2018, PIXNET in Taiwan hosted a hackathon on AI food-image generation. An irregular shape was cut out of a food image, and the teams were asked to implement AI image generation to fill in the hole. The judges were the audience: they decided which AI-generated food images looked more natural and which ones people would be willing to eat, and the team with the most votes won. The writer's implementation used a Partial Convolutional Neural Network in Keras (PConv-Keras). The examples shown here were deliberately chosen to be strange ones, to show that not every AI is smart enough to inpaint an image; of course, with better selection of training data and more training iterations, the output images keep improving.
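As a rough sketch of the masking step only (not the writer's actual hackathon code), the snippet below cuts an irregular hole out of an image and prepares the masked-image/mask pair that partial-convolution inpainting models such as PConv-Keras typically consume. The file name is a placeholder, the "irregular shape" is just a few random rectangles, and the exact input format expected by the repo in the references may differ, so check its README before reusing this.

```python
# Sketch: carve an irregular hole out of an image and build the
# (masked image, mask) pair used by partial-convolution inpainting models.
import numpy as np
from PIL import Image

# "food.jpg" is a placeholder file name for a food photo.
img = np.array(Image.open("food.jpg").convert("RGB"), dtype=np.float32) / 255.0
h, w, _ = img.shape

# Binary mask: 1 = keep pixel, 0 = hole to be inpainted.
mask = np.ones((h, w, 3), dtype=np.float32)
rng = np.random.default_rng(0)
for _ in range(5):
    y, x = rng.integers(0, h - 40), rng.integers(0, w - 40)
    mask[y:y + rng.integers(10, 40), x:x + rng.integers(10, 40), :] = 0.0

masked_img = img * mask  # holes become black; the model fills them back in

# model = ...  # e.g. a trained inpainting model loaded from the PConv-Keras repo
# prediction = model.predict([masked_img[np.newaxis], mask[np.newaxis]])
```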

Given an image like the following,

Fig 3. Image to be filled. Source: here

The generated outputs could look like the following:

Fig 4. Food images generated by AI. Source: here

Which one do you think looks more real?

Use case #3: A simple line drawing becomes a colorful image in real time

Using a simple line drawing as its reference, Pix2Pix converts it into a colorful image based on what it has learned about shapes, human drawings, and the real world.
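Under the hood, pix2pix is a conditional GAN trained on paired sketch/photo data: its generator minimizes an adversarial loss plus an L1 reconstruction loss that keeps the colored output close to the ground-truth photo. The short sketch below follows the objective described in the original pix2pix paper (with LAMBDA = 100); it is not code taken from the repository linked in the references.

```python
# Sketch of the pix2pix generator objective: adversarial loss + L1 term.
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 100.0  # weight on the L1 term, as in the original pix2pix paper

def generator_loss(disc_fake_output, generated_image, target_image):
    # Adversarial term: the generator wants the discriminator to label
    # its output as real (all ones).
    adversarial = bce(tf.ones_like(disc_fake_output), disc_fake_output)
    # Reconstruction term: pixel-wise L1 distance to the real colored image.
    l1 = tf.reduce_mean(tf.abs(target_image - generated_image))
    return adversarial + LAMBDA * l1
```

The L1 term is what keeps the colored output faithful to the line drawing, while the adversarial term pushes it toward looking like a plausible real-world image.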

Fig 5. Pix2Pix in real-time. Source: here

Use case #4: Motion transfer in video (letting people who don’t know how to dance appear to dance on video)

A research team at UC Berkeley published a demonstration of their work on YouTube. Given a source video of a person dancing, the algorithm can transfer that performance to a novel (amateur) target after only a few minutes of footage of the target subject performing standard moves.

The use cases above are the ones I found most interesting, but there are many others. One example is style transfer: given two images, one by an artist such as Vincent van Gogh and the other an ordinary photo, a model can transfer Van Gogh’s style onto the photo. Another example is high-resolution image generation: through repeated rounds of generation and discrimination, a model can be trained to turn an image into a higher-resolution one.
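To make the super-resolution idea a little more concrete, here is a tiny, hypothetical generator in Keras that upsamples an image 4x. In a GAN setup it would be trained against a discriminator (not shown) that judges whether its output looks like a genuine high-resolution photo; the layer choices below are mine for illustration, not those of any specific published model.

```python
# Sketch of a GAN-based super-resolution generator: low-res RGB in, 4x upsampled RGB out.
import tensorflow as tf
from tensorflow.keras import layers, models

sr_generator = models.Sequential([
    layers.Input(shape=(None, None, 3)),        # low-resolution RGB input
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.UpSampling2D(2),                     # 2x spatial upsample
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.UpSampling2D(2),                     # another 2x, 4x in total
    layers.Conv2D(3, 3, padding="same", activation="sigmoid"),  # high-res RGB
])
```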

Reference

PConv-Keras GitHub: https://github.com/chuangtc/PConv-Keras

Pix2Pix GitHub: https://github.com/keijiro/Pix2Pix


Originally published at ai4quantblog.com on September 2, 2018.