Creating new “art” works by integrating Creative Adversarial Network (CAN) and Neural Style Transfer Learning
Today I would like to talk about an algorithm that can create new “art” works by combining a photograph taken by a human with an artistic painting created by AI. It is based on two deep learning algorithms: the Creative Adversarial Network (CAN) and Neural Style Transfer Learning (NSTL). First, let us take a look at the Creative Adversarial Network.
Creative Adversarial Network (CAN)
CAN is based on the Generative Adversarial Network (GAN), one of the most influential algorithms in deep learning, invented by Ian Goodfellow and his colleagues. A GAN typically comprises two neural network models, a Generator (G) and a Discriminator (D), which are trained simultaneously. The Generator is trained to capture the data distribution, while the Discriminator is trained to estimate the probability that a sample came from the training data rather than from G (i.e., whether an image is real or fake). The training procedure for G is to maximize the probability of D making a mistake. In other words, G is trained to produce fake samples that look as if they came from the data distribution, while D is trained to determine whether a given sample is real or fake. This framework corresponds to a two-player minimax game with the following objective function
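In the notation of the GAN paper, this objective is:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

Here D tries to push D(x) toward 1 for real samples and D(G(z)) toward 0 for generated ones, while G tries to do the opposite.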
CAN is an art-generating agent built on a variant of GAN designed to make it creative. CAN modifies the GAN objective function so that the Discriminator (D) sends two signals back to the Generator (G). The first is the same as in GAN: the Discriminator determines whether the input image is real or fake (the real/fake loss). For the second, the Discriminator classifies the input image among 147 art styles, producing a probability distribution over styles. This distribution is called the style class posterior. CAN aims to minimize the cross-entropy between the style class posterior and a uniform target distribution in which every art style has the same probability; this cross-entropy is the style ambiguity loss. By minimizing both the style ambiguity loss and the real/fake loss, CAN trains the art-generating agent (the Generator) to produce images that look like art while also forcing it to explore the creative art space by deviating from existing art styles. In this sense, CAN is designed to be creative, whereas GAN is emulative.
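The modified objective, as given in the CAN paper (with K style classes), is:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x, \hat{c} \sim p_{\text{data}}}\big[\log D_r(x) + \log D_c(c = \hat{c} \mid x)\big]
  + \mathbb{E}_{z \sim p_z}\Big[\log\big(1 - D_r(G(z))\big)
  - \sum_{k=1}^{K}\Big(\tfrac{1}{K}\log D_c(c_k \mid G(z))
  + \big(1 - \tfrac{1}{K}\big)\log\big(1 - D_c(c_k \mid G(z))\big)\Big)\Big]
```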
where z is a noise vector sampled from p_z (a uniform or Gaussian distribution), and x and ĉ are a real image and its corresponding style label from the data distribution p_data. D_r(·) is the function that tries to discriminate between real art images and generated images, while D_c(·) is the function that classifies the different style categories and estimates the style class posteriors.
The training procedure of CAN is the same as that of GAN: the Generator and the Discriminator are trained simultaneously.
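The style ambiguity loss described above, the cross-entropy between the style class posterior and a uniform target, can be sketched numerically. This is a minimal NumPy illustration (not the paper's code); it shows that the loss bottoms out at log K exactly when the Discriminator is maximally confused about the style:

```python
import numpy as np

def style_ambiguity_loss(posterior):
    """Cross-entropy between a uniform target over K style classes and
    the Discriminator's style class posterior. It is minimized (value
    log K) when the posterior is itself uniform, i.e. when the generated
    image cannot be pinned to any existing art style."""
    k = len(posterior)
    return -np.sum((1.0 / k) * np.log(posterior))

# A posterior spread evenly over 147 styles gives the minimum, log(147):
uniform = np.full(147, 1.0 / 147)
loss_uniform = style_ambiguity_loss(uniform)   # equals log(147)

# A posterior peaked on one style (a recognizable style) is penalized more:
peaked = np.full(147, 0.1 / 146)
peaked[0] = 0.9
loss_peaked = style_ambiguity_loss(peaked)     # larger than loss_uniform
```

The Generator is pushed toward images whose style posterior looks like `uniform`, while the real/fake loss keeps those images inside the distribution of art.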
Neural Style Transfer Learning (NSTL)
NSTL is an artistic algorithm that extracts a content representation from a content image and a style representation from a style image, then recombines the two to generate an image that looks like the content image painted in the style of the style image. It usually takes either a noise image or the content image itself as the input image to be optimized.
The Neural Style Transfer algorithm uses a typical deep convolutional neural network (CNN) pre-trained on a task such as object classification: the content representation comes from the feature responses of higher layers, while the style representation comes from the correlations between feature maps. I don’t want to extend this article for too long, so if you want to know more about Neural Style Transfer, which is an excellent algorithm, you can read its original paper.
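The style representation mentioned above is built from Gram matrices of CNN feature maps. Here is a minimal NumPy sketch of that computation; the feature shapes are illustrative assumptions, not values from the paper:

```python
import numpy as np

def gram_matrix(features):
    """Style representation from one CNN layer: correlations between
    feature maps. `features` has shape (channels, height, width)."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)     # flatten each feature map to a vector
    return f @ f.T / (h * w)           # (channels, channels) correlation matrix

def style_loss(gram_generated, gram_style):
    """Mean squared difference between the Gram matrix of the generated
    image and that of the style image, for one layer."""
    return np.mean((gram_generated - gram_style) ** 2)

rng = np.random.default_rng(0)
feats = rng.random((64, 32, 32))       # e.g. 64 feature maps of size 32x32
g = gram_matrix(feats)                 # (64, 64) style representation
```

During style transfer this loss (summed over several layers) is minimized together with a content loss on the higher-layer feature responses, by gradient descent on the input image itself.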
Finally, the Integration of two deep learning algorithms
First, a creative art image is generated by CAN. It is then fed into Neural Style Transfer as the style image, while a photograph taken by a human is fed in as the content image. The Neural Style Transfer algorithm takes these two images as input and optimizes a random noise image (the input image) sampled from a Gaussian distribution until it looks like the content image painted in the style of the style image (the output image). Below you can see the block diagram of the proposed system.
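The pipeline above can be sketched end to end with stand-in functions. Every name, shape, and body here is an illustrative assumption standing in for the trained models, not the proposed system's actual code; the point is only the flow of data between the stages:

```python
import numpy as np

rng = np.random.default_rng(0)

def can_generate(z):
    """Stub: a trained CAN Generator would map the noise vector z
    to a 256x256 RGB art image. Here we return random pixels."""
    return rng.random((256, 256, 3))

def upsample_2x(img):
    """Stub: nearest-neighbour upsampling standing in for the
    super-resolution step (256x256 -> 512x512)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def style_transfer(content_img, style_img, steps=1):
    """Stub: NSTL would iteratively optimize a random noise image to
    match the content image's structure and the style image's Gram
    statistics. Here a simple blend stands in for that optimization."""
    x = rng.random(content_img.shape)            # random-noise input image
    for _ in range(steps):
        x = 0.5 * content_img + 0.5 * style_img  # placeholder "update"
    return x

z = rng.standard_normal(100)                # noise vector for the Generator
style_img = upsample_2x(can_generate(z))    # CAN output, super-resolved
content_img = rng.random((512, 512, 3))     # stands in for a human photograph
output = style_transfer(content_img, style_img)
```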
Some Examples
This is an art painting generated by the CAN component of the proposed system. CAN generates 256x256 images, which are then up-sampled to 512x512 using a super-resolution algorithm.
By combining the above two images using the Neural Style Transfer Learning algorithm, the following final result image is formed.
As you can see above, the output image contains more structures familiar to humans than the image generated by CAN alone, which makes it more appealing to human subjects. What is more interesting is that these structures can be controlled by humans through the content image. In the output image above, since a photograph of big rocks near a beach was used, we can vaguely (or clearly) see big rocks near a beach in the output image. Because CAN is a creative art-generating agent, the style image fed into Neural Style Transfer can be seen as the fruit of a machine’s creativity. By combining this machine creativity with human creativity, represented by the content image, it is exciting, at least for me, to see what kinds of “art” works the proposed system will produce.
References
Creative Adversarial Network (CAN) Paper: https://arxiv.org/abs/1706.07068
Neural Style Transfer Paper: https://arxiv.org/abs/1508.06576
Super-resolution Algorithm Paper: https://arxiv.org/abs/1802.08797
Thanks a lot for reading! If you have any thoughts, please let me know in the comments. Since English is not my primary language, I am sorry for any grammatical errors.