Insight on Style Attentional Networks (SANet) for Arbitrary Style Transfer

A robust approach for creating Art using AI

dhwani mehta
VisionWizard
4 min read · Oct 11, 2020


Image stylization is an image manipulation technique that has been studied for several decades. This article demonstrates a highly efficient novel approach, the style-attentional network (SANet) [1], which synthesizes high-quality stylized images while balancing global and local style patterns and preserving the content structure.

A High-Level Overview of the Style Transfer Mechanism

Figure 1 : Top left: content image. Bottom left: style image. Right: image obtained by applying style transfer to the content image. [image source]

Ever visualized how a picture might look if crafted by a prominent artist? Arbitrary style transfer turns this into reality by blending a content image (the target image) with a style image (the image whose texture, i.e. brush strokes, angular geometry, patterns, color transitions, etc., is to be painted onto the content image) to generate a third image that has never been seen before.

Novel SANet Approach for Style Transfer

The ultimate goal of arbitrary style transfer is to simultaneously achieve generalization, quality, and efficiency. As an active area of research, much seminal work ([2], [3], [4], [5]) has been proposed toward this goal, but SANet stands out when it comes to balancing global and local style patterns while preserving the content structure, owing to:

  1. Use of a learned similarity kernel instead of a fixed one
  2. Use of a soft-attention-based network, instead of hard attention, for style decoration
  3. Use of an identity loss during training to maintain the content structure without losing the richness of the style

Building Blocks for Arbitrary Style Transfer using SANet

The entire style transfer mechanism can be summarized as follows:

Figure 2 : Step by step process of Arbitrary Style Transfer using SANet Architecture

Let’s go through the architecture step by step to build up a complete, 360-degree overview.

The Complete SANet Architecture

Figure 3 : Overview of the training flow. (a) A fixed VGG encoder encodes the content and style images. Two SANets map features from the Relu_4_1 and Relu_5_1 layers, respectively. The decoder transforms the combined SANet output features into the stylized image I_cs. The fixed VGG encoder is also used to compute the content and style losses L_c and L_s. (b) The identity loss L_identity quantifies the difference between I_c and I_cc, or between I_s and I_ss, where I_c (I_s) is the original content (style) image and I_cc (I_ss) is the output image synthesized from a pair of identical content (style) images. [image source]

Let’s untangle the full architecture into the following components for better insight:

  • Encoder-Decoder Module
  • Style Attentional Module
  • Computation of Loss Function

Encoder-Decoder Module

Figure 4 : [image source]

The foremost step in the style transfer problem is the encoder-decoder mechanism. A pre-trained VGG-19 network encodes the image into a feature representation, which is then passed to a decoder that tries to reconstruct the original input image.
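To make this concrete, here is a minimal PyTorch sketch of such an encoder-decoder pair. It assumes torchvision's pre-trained VGG-19 as the encoder, truncated at Relu_4_1; the decoder layout below simply mirrors the encoder with upsampling in place of pooling and is illustrative, not the authors' exact released architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

# Encoder: pre-trained VGG-19 truncated at Relu_4_1 (index 20 in
# torchvision's `features`); weights stay frozen during training.
vgg = vgg19(pretrained=True).features
encoder = nn.Sequential(*list(vgg.children())[:21]).eval()
for p in encoder.parameters():
    p.requires_grad = False

# Decoder: roughly mirrors the encoder, trading max-pooling for
# nearest-neighbor upsampling (illustrative layout).
decoder = nn.Sequential(
    nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2, mode='nearest'),
    nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2, mode='nearest'),
    nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2, mode='nearest'),
    nn.Conv2d(64, 3, 3, padding=1),
)

x = torch.randn(1, 3, 256, 256)  # dummy input image
feat = encoder(x)                # B x 512 x 32 x 32 feature map
recon = decoder(feat)            # reconstructed B x 3 x 256 x 256 image
```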

Style Attentional Module

Equation Table 1 : SANet Function Input and Output parameters

The SANet takes as input the content and style feature maps produced by the VGG-19 encoder. After normalizing them, it transforms them into feature spaces in order to compute the attention between the content and style feature maps.

Equation Table 2 : On the inside of SANet function
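The equations above boil down to a soft attention between content queries and style keys. Here is a minimal PyTorch sketch of one SANet block: following the paper, f, g, and h are learned transforms (1×1 convolutions, giving the learned similarity kernel from point 1 earlier), the softmax implements the soft attention from point 2, and a residual connection adds the attended style features back onto the content features. Variable names are mine, not the authors'.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mean_variance_norm(x, eps=1e-5):
    """Normalize each channel to zero mean and unit variance over space."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = x.std(dim=(2, 3), keepdim=True) + eps
    return (x - mean) / std

class SANet(nn.Module):
    """Style-attentional block: decorates content features with style
    features, weighted by a learned, softmax-normalized similarity."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Conv2d(channels, channels, 1)  # query (normalized content)
        self.g = nn.Conv2d(channels, channels, 1)  # key   (normalized style)
        self.h = nn.Conv2d(channels, channels, 1)  # value (style)
        self.out = nn.Conv2d(channels, channels, 1)

    def forward(self, Fc, Fs):
        b, c, hc, wc = Fc.shape
        Q = self.f(mean_variance_norm(Fc)).flatten(2)  # B x C x Nc
        K = self.g(mean_variance_norm(Fs)).flatten(2)  # B x C x Ns
        V = self.h(Fs).flatten(2)                      # B x C x Ns
        # Soft attention between every content and every style position.
        attn = F.softmax(torch.bmm(Q.transpose(1, 2), K), dim=-1)  # B x Nc x Ns
        O = torch.bmm(V, attn.transpose(1, 2)).view(b, c, hc, wc)
        # Residual connection helps preserve the content structure.
        return Fc + self.out(O)
```

In the full model (Figure 3), two such blocks run on the Relu_4_1 and Relu_5_1 feature maps; their outputs are merged (the Relu_5_1 branch is upsampled and the sum passed through a 3×3 convolution) before being handed to the decoder.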

Computation of Loss Function

The pre-trained VGG-19 is used to compute the loss functions that train the decoder, as follows:

Equation Table 3 : Complete Loss Computation Equation

A Closer Look at the Computation of Content and Style Loss:

Figure 5 : An Overview on computation of content and style loss component in SANet
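Assuming a helper `vgg_feats` (a hypothetical name) that returns a dict of VGG-19 feature maps keyed by layer, and reusing `mean_variance_norm` from the SANet sketch above, the two terms can be sketched as follows: the content loss compares mean-variance-normalized features of the output and the content image, while the style loss, as in AdaIN [3], matches channel-wise feature means and standard deviations across several VGG layers.

```python
import torch.nn.functional as F
# Reuses mean_variance_norm from the SANet sketch above.

def content_loss(vgg_feats, Ics, Ic, layers=('relu4_1', 'relu5_1')):
    """Distance between mean-variance normalized VGG features of the
    stylized output Ics and the content image Ic."""
    fo, fc = vgg_feats(Ics), vgg_feats(Ic)
    return sum(F.mse_loss(mean_variance_norm(fo[l]),
                          mean_variance_norm(fc[l])) for l in layers)

def style_loss(vgg_feats, Ics, Is,
               layers=('relu1_1', 'relu2_1', 'relu3_1',
                       'relu4_1', 'relu5_1')):
    """Match channel-wise mean and std of VGG features between the
    stylized output Ics and the style image Is (AdaIN-style [3])."""
    fo, fs = vgg_feats(Ics), vgg_feats(Is)
    return sum(F.mse_loss(fo[l].mean(dim=(2, 3)), fs[l].mean(dim=(2, 3)))
               + F.mse_loss(fo[l].std(dim=(2, 3)), fs[l].std(dim=(2, 3)))
               for l in layers)
```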

Computation of Identity Loss:

The SANet architecture is capable of preserving the content structure while enriching the style patterns, owing to its novel identity loss function.

Figure 6 : An Overview on computation of identity loss in SANet

Because the identity loss is computed from a pair of identical input images with no style gap between them, it encourages the network to maintain the content structure and the style characteristics simultaneously.
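A sketch of this identity loss, following Figure 6: `model` stands for the full encoder-SANet-decoder pipeline and `vgg_feats` for the multi-layer VGG feature extractor (both hypothetical names). Each image is stylized with itself, and any deviation from the input is penalized at both the pixel and the VGG-feature level.

```python
import torch.nn.functional as F

VGG_LAYERS = ('relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1')

def identity_loss(model, vgg_feats, Ic, Is, lam1=1.0, lam2=1.0):
    Icc = model(Ic, Ic)  # content image used as both content and style
    Iss = model(Is, Is)  # style image used as both content and style
    # Pixel-level term: the output should reproduce the input exactly.
    loss1 = F.mse_loss(Icc, Ic) + F.mse_loss(Iss, Is)
    # Feature-level term across several VGG layers.
    loss2 = sum(F.mse_loss(vgg_feats(Icc)[l], vgg_feats(Ic)[l])
                + F.mse_loss(vgg_feats(Iss)[l], vgg_feats(Is)[l])
                for l in VGG_LAYERS)
    # lam1 and lam2 are hyperparameters; see [1] for the values used in training.
    return lam1 * loss1 + lam2 * loss2
```

The decoder is then trained on the weighted sum of the content, style, and identity losses from Equation Table 3.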

Conclusion and Results

Figure 7 : User preference result of five style transfer algorithms [image source]

Experiments have shown that SANet parses diverse style patterns, such as the global color distribution, texture, and local style patterns, while maintaining the structure of the content. SANet also proves proficient at distinguishing content structures and transferring the style appropriate to each semantic region. It can hence be inferred that SANet is not only efficient but also effective, retaining the content structure while blending in style features that enrich both global and local style statistics.

Figure 8 : Experimental results for comparison of SANet over other style transfer mechanisms [image source]

References

[1] Park, Dae Young, and Kwang Hee Lee. “Arbitrary style transfer with style-attentional networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.

[2] Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. “Image style transfer using convolutional neural networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

[3] Huang, Xun, and Serge Belongie. “Arbitrary style transfer in real-time with adaptive instance normalization.” Proceedings of the IEEE International Conference on Computer Vision. 2017.

[4] Li, Yijun, et al. “Universal style transfer via feature transforms.” Advances in Neural Information Processing Systems. 2017.

[5] Sheng, Lu, et al. “Avatar-net: Multi-scale zero-shot style transfer by feature decoration.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
