Deep Learning for Hackers with MXNet (3): Instant Neural Art Style Transfer

This post was originally published in Chinese in my Zhihu column.

It has been a while since the last post, and thanks for coming back. Here I want to present an MXNet example of instant neural art style transfer, from which you can build your own Prisma app.

Did you know that MXNet can now be installed via pip?

pip search mxnet 
mxnet-cu75 (0.9.3a3) - MXNet is an ultra-scalable deep learning framework. This version uses CUDA-7.5.
mxnet (0.9.3a3) - MXNet is an ultra-scalable deep learning framework. This version uses openblas.
mxnet-cu80 (0.9.3a3) - MXNet is an ultra-scalable deep learning framework. This version uses CUDA-8.0.
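For example, to install the CUDA 8.0 build (use the plain mxnet package for the CPU build):

pip install mxnet-cu80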

Let’s go

After installing MXNet, please run

git clone https://github.com/zhaw/neural_style

which includes three different implementations of fast neural art style transfer. Big thanks to the author Zhao Wei. In this post, I am going to talk about the Perceptual Losses method by Justin Johnson et al., described in this paper. After cloning, go to neural_style/perceptual/ and execute the following script:

import make_image
# Load the pretrained style model 'models/s4' with a 512x512 output size.
maker = make_image.Maker('models/s4', (512, 512))
# Stylize niba.jpg and write the result to output.jpg.
maker.generate('output.jpg', 'niba.jpg')

where output.jpg is the output and niba.jpg is a picture of Niba, the cutest deep learning cat. Within a blink, we can see an output like this:

Besides this art style, several other pretrained neural art models are mentioned in the README page under neural_style/perceptual/; please download them via the links mentioned there. Each pretrained model produces its own artwork, and the results can be combined with ImageMagick:

montage output*.jpg -geometry +7+7 merge.jpg
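The montage command above assumes several output images. To generate them in one go, here is a minimal sketch; the extra model directory names are hypothetical, so substitute whichever pretrained models you actually downloaded:

import make_image

# Hypothetical list of downloaded style models; adjust to what you have.
styles = ['models/s4', 'models/s5', 'models/s6']
for i, style in enumerate(styles):
    maker = make_image.Maker(style, (512, 512))
    maker.generate('output%d.jpg' % i, 'niba.jpg')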

Please note: on some machines you may encounter the following error:

terminate called after throwing an instance of 'dmlc::Error' what(): [21:25:23] src/engine/./threaded_engine.h:306: [21:25:23] src/operator/./convolution-inl.h:299: Check failed: (param_.workspace) >= (required_size)

The cause lies in the workspace size of the convolution layers: the default workspace can be too small for large images. Please edit symbol.py and add workspace=4092 to each mx.symbol.Convolution call.
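For example, a convolution in symbol.py would look something like this after the change; the layer parameters here are illustrative, and only the workspace argument matters:

import mxnet as mx

data = mx.symbol.Variable('data')
# The workspace argument enlarges the temporary buffer (in MB)
# that the convolution implementation is allowed to use.
conv = mx.symbol.Convolution(data=data, num_filter=128, kernel=(3, 3),
                             pad=(1, 1), workspace=4092)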

Hope you have some fun with your own Prisma app 🙂

Theory

Neural art style transfer has been a hot topic in deep learning, and it started with the paper A Neural Algorithm of Artistic Style. As we discussed in the last post, the idea leverages the power of convolutional networks: high-level features can describe the so-called style of an image, and by applying these high-level features to a new image, one can transfer the art style and generate a new artwork. In the original paper, the Gram matrix is used for this magic. To understand the Gram matrix magic, take a look at my friend's paper Demystifying Neural Style Transfer. Many blogs and papers try to explain why neural art transfer works, and this paper is probably the only correct one.
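To make the Gram matrix concrete: it summarizes the style of a layer as the correlations between its feature channels, discarding the spatial layout. Here is a minimal sketch in MXNet; the normalization convention varies between papers:

import mxnet as mx

def gram_matrix(features):
    # features: NDArray of shape (1, C, H, W) taken from a conv layer
    _, c, h, w = features.shape
    f = features.reshape((c, h * w))  # flatten the spatial dimensions
    gram = mx.nd.dot(f, f.T)          # (C, C) channel-by-channel correlations
    return gram / (c * h * w)         # normalization differs across papers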

Back to the original neural art transfer: the original version optimizes the output image pixel by pixel against the content and style images, introduces a very large Gram matrix, and additionally has to tune the weight of each layer's loss. This method needs a lot of computing time due to the combined heavy load of the per-pixel optimization, the Gram matrix, and the layer-weight tuning. Several faster implementations are on the market, and the Perceptual Losses method is one of the fastest. Perceptual Losses introduces a loss network pretrained on ImageNet and reuses its content loss and style loss to compute the perceptual loss; crucially, it never updates the loss network, which saves much computing time. It works like this: given an input image (e.g. Niba), the transform network produces an output, the pretrained loss network computes the loss on it, and the gradient flows back to the transform network, so the transform network learns the style by minimizing this loss.
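To make the loss structure concrete, here is a hedged sketch of how a perceptual loss combines the content and style terms, reusing the gram_matrix sketch above. The function name, the feature shapes, and the style_weight value are illustrative, not the actual code of the neural_style repo:

import mxnet as mx

def perceptual_loss(gen_feats, content_feats, style_grams, style_weight=5.0):
    # gen_feats: features of the transform network's output, taken at
    # several layers of the fixed, pretrained loss network.
    # content_feats: features of the content image at the content layer.
    # style_grams: precomputed Gram matrices of the style image's features.
    content_loss = mx.nd.sum(mx.nd.square(gen_feats[0] - content_feats[0]))
    style_loss = mx.nd.zeros((1,))
    for f, g in zip(gen_feats, style_grams):
        style_loss = style_loss + mx.nd.sum(mx.nd.square(gram_matrix(f) - g))
    # Only the transform network is updated to minimize this loss;
    # the loss network itself stays frozen.
    return content_loss + style_weight * style_loss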

The perceptual-loss approach needs a set of pretrained networks, one network per style. One can follow train.py in the same repo to train new styles.

Appendix

Why did I pause updating this blog for so long, and why resume now?

Because I was carefully thinking about teaching MXNet and deep learning in a different way, much different from many other blogs and Medium posts. There, each tutorial starts with theory, math, or some other fundamental knowledge and needs at least 30 minutes of reading time; professionals don't like the repeated fundamentals since they already know them, while new readers still can't figure out what to do.

I believe the only way readers can remember the knowledge is to JUST DO IT! Last year, I opened my column on zhihu.com and started publishing two-minute deep learning demos in Chinese. It turned out to be very welcome: my 2000+ followers had a lot of fun trying these demos, and they really learned by doing them first and reading the theory part afterward. If they were missing some math knowledge, I showed them where to learn it. So I thought: why don't I translate these posts back into English and share them with more readers? I will keep posting more blogs like this; I hope you like them.

And, as always, have you clicked the star and fork buttons on the MXNet repo https://github.com/dmlc/mxnet ?


Originally published at no2147483647.wordpress.com on March 16, 2017.
