Everyone can be an artist — deep learning for Neural Style Transfer and how to improve it.

Nikolai Kalischek
Published in EcoVisionETH · Jul 7, 2021
Result of our Neural Style Transfer method.

Have you ever imagined seeing the world through the eyes of Vincent van Gogh, Pablo Picasso or Wassily Kandinsky, a world full of brushstrokes? What if I told you that there is a small Dalí in every one of us? All you need is your most precious photograph, your favorite artwork and a bit of Neural Style Transfer, and voilà, a new artist is born!

All jokes aside, in this post we will take a closer look at why deep learning has become so popular for stylizing photorealistic images with paintings, a task nowadays called Neural Style Transfer (NST), and how we can improve upon its main underlying concept.

What is Style Transfer and what is Neural Style Transfer?

Let’s begin with some basic definitions. In style transfer, we want to transform a content image so that it matches the style of a second image, for example an artwork.

Typical examples of style transfer. As humans, we are trained to seamlessly disentangle style and content features.

Even though there is no general definition of style, as humans we can still easily extract attributes related to style, such as color composition, simple shapes or different drawing techniques. We can even identify abstract, more general features such as the overall atmosphere of an image.

Traditionally, handcrafted style features have been used to transform images. Neural Style Transfer automates feature extraction and stylization.

The intriguing idea is to represent the style of an image by feature distributions in different layers of a pretrained CNN.

Basically, one defines an empirical measure on each convolutional layer by treating the channel-wise activations at each position of the feature maps as individual samples, thus naturally disregarding spatial positions. Since we use a pretrained CNN, the assumption is that shallow layers encode basic information such as colors and shapes, while deeper layers encode more abstract attributes of the style image.
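
To make this concrete, here is a minimal PyTorch sketch of how such empirical measures can be extracted. It assumes torchvision's VGG-19 and an illustrative set of style layers; the exact layer choice in our method may differ.

```python
import torch
import torchvision.models as models

# A minimal sketch (not our exact implementation): treat every spatial
# position of a feature map as one sample of a C-dimensional distribution.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
STYLE_LAYERS = {1, 6, 11, 20, 29}  # relu1_1 ... relu5_1 in torchvision indexing

def feature_samples(img):
    """Return, per style layer, an (H*W) x C matrix of feature samples.

    img: a (1, 3, H, W) tensor, ideally ImageNet-normalized.
    """
    samples, x = [], img
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            b, c, h, w = x.shape
            # Flatten the spatial dimensions: each pixel becomes one
            # sample, and its position is deliberately discarded.
            samples.append(x.reshape(b, c, h * w).squeeze(0).T)
    return samples
```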

The idea of representing style as feature distributions opened NST to the field of distribution alignment and has led to a plethora of works.

Current approaches and their bottlenecks

Current approaches fall into three main categories: maximum mean discrepancy (MMD), moment matching and optimal transport. These are well-known mathematical concepts widely used in distribution alignment, but we won’t explain them in detail in this post. If you want to learn more about how they are defined, make sure to check out our paper.

While the idea of style as a feature distribution is fascinating, from a statistical perspective all three categories fall short of the goal of optimally aligning feature distributions.

  • Methods based on MMD rely on simplistic kernels that are, in particular, non-characteristic [1, 2].
  • Similarly, variants of moment matching only consider the first and/or the second moment [3].
  • Optimal transport is hampered by its high computational cost, and so far only Gaussian approximations have been used in NST [4].
1D toy example: the first panel shows the source and target distribution. In the second, the source distribution is matched with Optimal Transport; in the last, with Maximum Mean Discrepancy. As you can see, neither approach fully matches the distributions.
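
If you want to reproduce the MMD half of this toy experiment, the following sketch moves 1D source samples by gradient descent on a biased MMD estimate with a Gaussian kernel. The bandwidth, sample sizes and optimizer settings are illustrative assumptions, not the exact setup behind the figure.

```python
import torch

torch.manual_seed(0)
# Bimodal target distribution and a unimodal source to be matched.
target = torch.cat([torch.randn(200) - 2.0, 0.5 * torch.randn(200) + 2.0])
source = torch.randn(400, requires_grad=True)

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian kernel between two 1D sample vectors.
    return torch.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

def mmd(x, y):
    # Biased MMD^2 estimate with a single fixed-bandwidth kernel.
    return (gaussian_kernel(x, x).mean()
            - 2 * gaussian_kernel(x, y).mean()
            + gaussian_kernel(y, y).mean())

opt = torch.optim.Adam([source], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = mmd(source, target)
    loss.backward()
    opt.step()
# With a single fixed-bandwidth kernel, the bimodal target is matched
# only coarsely, which is the failure mode discussed above.
```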

Central Moment Discrepancy for Neural Style Transfer

To overcome, or at least reduce, these limitations, we propose to define the style loss with the Central Moment Discrepancy (CMD) [5]. CMD is an Integral Probability Metric (IPM) on a polynomial function space with a normalized coefficient vector and centralized moments. Its dual representation is given by all higher order centralized moments.

It explicitly matches all higher order central moments, which correspond to natural geometric notions such as mean, variance, skewness or kurtosis. Moreover, it allows for theoretically justified approximations, since the contribution of higher order moments converges to zero when the distributions are defined on compact intervals.
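
As a concrete illustration, here is a hedged sketch of a CMD-style loss between two sample matrices, following the definition of Zellinger et al. [5]: the distance between the means plus the distances between the central moments up to a chosen order. The normalization constants for a compact support [a, b] are omitted for brevity.

```python
import torch

def cmd_loss(x, y, k_max=5):
    """CMD-style discrepancy between two (n_samples x n_features) sample
    matrices: L2 distance of the means plus L2 distances of the central
    moments up to order k_max (support normalization omitted)."""
    mx, my = x.mean(dim=0), y.mean(dim=0)
    loss = torch.norm(mx - my, p=2)
    cx, cy = x - mx, y - my
    for k in range(2, k_max + 1):
        # k-th central moment, computed per feature dimension.
        loss = loss + torch.norm((cx ** k).mean(dim=0)
                                 - (cy ** k).mean(dim=0), p=2)
    return loss
```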

In the first image, we use 5 moments to match the distributions; in the second, 50 moments are matched explicitly.

As the toy example above shows, CMD matches distributions more faithfully, so it can also better align the style distributions of two images. Applying it to Neural Style Transfer is straightforward.

We simply match the higher order centralized moments of the empirical measures in each layer of a pretrained CNN. We only have to add, e.g., a sigmoid function at the end of each layer to ensure compact support of the measures.
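
Putting the pieces together, a possible style loss could look like the sketch below, which reuses the hypothetical feature_samples and cmd_loss helpers from above; the sigmoid provides the compact support that justifies truncating CMD at k_max moments.

```python
import torch

def style_loss(generated, style_img, k_max=5):
    """Hedged sketch: sum the CMD between the squashed feature samples
    of the generated image and the style image over all style layers."""
    loss = 0.0
    for f_gen, f_style in zip(feature_samples(generated),
                              feature_samples(style_img)):
        # Sigmoid maps activations into (0, 1), i.e. a compact interval,
        # so truncating CMD at k_max moments is a justified approximation.
        loss = loss + cmd_loss(torch.sigmoid(f_gen),
                               torch.sigmoid(f_style), k_max=k_max)
    return loss
```

In practice, one would minimize this style loss together with a standard content loss by directly optimizing the pixels of the generated image, e.g. with Adam or L-BFGS, just as in classical optimization-based NST.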

Example results

Let’s also have a look at some of our qualitative results. Below you can see four famous artworks by Kandinsky, Picasso, Van Gogh and Gerhard Richter, together with a portrait of each artist.

Now we style their portraits with their own works!

Conclusion

In this post, we have revisited the interpretation of Neural Style Transfer as aligning feature distributions in the convolutional layers of a neural network. Under this interpretation, existing methods only match first and second order moments. In contrast, our method can be interpreted either as minimizing an integral probability metric, or as matching all central moments up to a desired order, and thus aligns the style distributions more faithfully.

If you want more details and theoretical backgrounds on existing approaches, make sure to check out our CVPR’21 paper and website at https://cmdnst.github.io/.

Kalischek, Nikolai, Jan D. Wegner, and Konrad Schindler. “In the light of feature distributions: moment matching for Neural Style Transfer.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

References

[1] Leon A Gatys, Alexander S Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. In CVPR, 2016.
[2] Yanghao Li, Naiyan Wang, Jiaying Liu, and Xiaodi Hou. Demystifying neural style transfer. arXiv preprint arXiv:1701.01036, 2017.
[3] Xun Huang and Serge Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. In ICCV, 2017.
[4] Youssef Mroueh. Wasserstein style transfer. In AISTATS, 2020.
[5] Werner Zellinger, Bernhard A Moser, Thomas Grubinger, Edwin Lughofer, Thomas Natschläger, and Susanne Saminger-Platz. Robust unsupervised domain adaptation for neural networks via moment alignment. Information Sciences, 483:174–191, 2019.
