Insight on Style Attentional Networks (SANet) for Arbitrary Style Transfer

A robust approach for creating Art using AI

dhwani mehta
VisionWizard
4 min read · Oct 11, 2020


Image stylization is an image manipulation technique that has been studied for several decades. This article demonstrates a highly efficient novel approach, the style-attentional network (SANet) [1], which synthesizes high-quality stylized images while balancing global and local style patterns and preserving the content structure.

A High-Level Overview of the Style Transfer Mechanism

Figure 1 : Top left: content image. Bottom left: style image. Right: image obtained by applying style transfer to the content image. [image source]

Ever visualized how a picture might look if crafted by a prominent artist? Arbitrary style transfer turns this into reality by blending a content image (the target image) with a style image (the image whose texture, i.e. brush strokes, angular geometry, patterns, color transitions, etc., is to be painted onto the content image) to generate a third image that has never been seen before.

Novel SANet Approach for Style Transfer

The ultimate goal of arbitrary style transfer is to simultaneously achieve generalization, quality, and efficiency. As an active area of research, much seminal work ([2], [3], [4], [5]) has been proposed toward this goal, but SANet stands out when it comes to balancing global and local style patterns while preserving the content structure, owing to:

  1. Use of a learned similarity kernel instead of a fixed one
  2. Use of a soft-attention-based network, instead of hard attention, for style decoration
  3. Use of an identity loss during training to maintain the content structure without losing the richness of the style

Building Blocks for Arbitrary Style Transfer using SANet

The entire style transfer mechanism can be summarized as follows:

Figure 2 : Step by step process of Arbitrary Style Transfer using SANet Architecture

Let’s go through the architecture step by step to build up a complete, 360-degree overview.

The Complete SANet Architecture

Figure 3 : Overview of the training flow. (a) A fixed VGG encoder encodes the content and style images. Two SANets map features from the Relu_4_1 and Relu_5_1 layers, respectively. The decoder transforms the combined SANet output features into the stylized image I_cs. The fixed VGG encoder is also used to compute the content and style losses L_c and L_s. (b) The identity loss L_identity quantifies the difference between I_c and I_cc, or between I_s and I_ss, where I_c (I_s) is the original content (style) image and I_cc (I_ss) is the output image synthesized from a pair of identical content (style) images. [image source]

Let’s untangle the full architecture into the following components for better insight:

  • Encoder-Decoder Module
  • Style Attentional Module
  • Computation of Loss Function

Encoder-Decoder Module

Figure 4 : [image source]

The foremost step in the style transfer problem is the encoder-decoder mechanism. A pre-trained VGG-19 network encodes the image into a feature representation, which is then passed to a decoder that tries to reconstruct the original input image.
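To make this concrete, here is a minimal PyTorch sketch of such an encoder-decoder pair. It assumes torchvision's pre-trained VGG-19 as the encoder, truncated at Relu_4_1; the decoder layout below simply mirrors the encoder with upsampling in place of pooling and is illustrative, not the authors' exact released architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

# Encoder: pre-trained VGG-19 truncated at Relu_4_1 (index 20 in
# torchvision's `features`); weights stay frozen during training.
vgg = vgg19(pretrained=True).features
encoder = nn.Sequential(*list(vgg.children())[:21]).eval()
for p in encoder.parameters():
    p.requires_grad = False

# Decoder: roughly mirrors the encoder, trading max-pooling for
# nearest-neighbor upsampling (illustrative layout).
decoder = nn.Sequential(
    nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2, mode='nearest'),
    nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2, mode='nearest'),
    nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2, mode='nearest'),
    nn.Conv2d(64, 3, 3, padding=1),
)

x = torch.randn(1, 3, 256, 256)  # dummy input image
feat = encoder(x)                # B x 512 x 32 x 32 feature map
recon = decoder(feat)            # reconstructed B x 3 x 256 x 256 image
```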

Style Attentional Module

Equation Table 1 : SANet Function Input and Output parameters

The SANet takes as input the content and style feature maps produced by the VGG-19 encoder. After normalizing them, it transforms them into feature spaces in order to compute the attention between the content and style feature maps.

Equation Table 2 : On the inside of SANet function
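The equations above boil down to a soft attention between content queries and style keys. Here is a minimal PyTorch sketch of one SANet block: following the paper, f, g, and h are learned transforms (1×1 convolutions, giving the learned similarity kernel from point 1 earlier), the softmax implements the soft attention from point 2, and a residual connection adds the attended style features back onto the content features. Variable names are mine, not the authors'.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mean_variance_norm(x, eps=1e-5):
    """Normalize each channel to zero mean and unit variance over space."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = x.std(dim=(2, 3), keepdim=True) + eps
    return (x - mean) / std

class SANet(nn.Module):
    """Style-attentional block: decorates content features with style
    features, weighted by a learned, softmax-normalized similarity."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Conv2d(channels, channels, 1)  # query (normalized content)
        self.g = nn.Conv2d(channels, channels, 1)  # key   (normalized style)
        self.h = nn.Conv2d(channels, channels, 1)  # value (style)
        self.out = nn.Conv2d(channels, channels, 1)

    def forward(self, Fc, Fs):
        b, c, hc, wc = Fc.shape
        Q = self.f(mean_variance_norm(Fc)).flatten(2)  # B x C x Nc
        K = self.g(mean_variance_norm(Fs)).flatten(2)  # B x C x Ns
        V = self.h(Fs).flatten(2)                      # B x C x Ns
        # Soft attention between every content and every style position.
        attn = F.softmax(torch.bmm(Q.transpose(1, 2), K), dim=-1)  # B x Nc x Ns
        O = torch.bmm(V, attn.transpose(1, 2)).view(b, c, hc, wc)
        # Residual connection helps preserve the content structure.
        return Fc + self.out(O)
```

In the full model (Figure 3), two such blocks run on the Relu_4_1 and Relu_5_1 feature maps; their outputs are merged (the Relu_5_1 branch is upsampled and the sum passed through a 3×3 convolution) before being handed to the decoder.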

Computation of Loss Function

The pre-trained VGG-19 is used to compute the loss functions that train the decoder, as follows:

Equation Table 3 : Complete Loss Computation Equation

A Closer Look at the Computation of Content and Style Loss:

Figure 5 : An Overview on computation of content and style loss component in SANet
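Assuming a helper `vgg_feats` (a hypothetical name) that returns a dict of VGG-19 feature maps keyed by layer, and reusing `mean_variance_norm` from the SANet sketch above, the two terms can be sketched as follows: the content loss compares mean-variance-normalized features of the output and the content image, while the style loss, as in AdaIN [3], matches channel-wise feature means and standard deviations across several VGG layers.

```python
import torch.nn.functional as F
# Reuses mean_variance_norm from the SANet sketch above.

def content_loss(vgg_feats, Ics, Ic, layers=('relu4_1', 'relu5_1')):
    """Distance between mean-variance normalized VGG features of the
    stylized output Ics and the content image Ic."""
    fo, fc = vgg_feats(Ics), vgg_feats(Ic)
    return sum(F.mse_loss(mean_variance_norm(fo[l]),
                          mean_variance_norm(fc[l])) for l in layers)

def style_loss(vgg_feats, Ics, Is,
               layers=('relu1_1', 'relu2_1', 'relu3_1',
                       'relu4_1', 'relu5_1')):
    """Match channel-wise mean and std of VGG features between the
    stylized output Ics and the style image Is (AdaIN-style [3])."""
    fo, fs = vgg_feats(Ics), vgg_feats(Is)
    return sum(F.mse_loss(fo[l].mean(dim=(2, 3)), fs[l].mean(dim=(2, 3)))
               + F.mse_loss(fo[l].std(dim=(2, 3)), fs[l].std(dim=(2, 3)))
               for l in layers)
```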

Computation of Identity Loss:

The SANet architecture is capable of preserving the content structure while enriching the style patterns, owing to its novel identity loss function.

Figure 6 : An Overview on computation of identity loss in SANet

Because the identity loss is computed from a pair of identical input images with no style gap between them, it encourages the network to maintain the content structure and the style characteristics simultaneously.
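A sketch of this identity loss, following Figure 6: `model` stands for the full encoder-SANet-decoder pipeline and `vgg_feats` for the multi-layer VGG feature extractor (both hypothetical names). Each image is stylized with itself, and any deviation from the input is penalized at both the pixel and the VGG-feature level.

```python
import torch.nn.functional as F

VGG_LAYERS = ('relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1')

def identity_loss(model, vgg_feats, Ic, Is, lam1=1.0, lam2=1.0):
    Icc = model(Ic, Ic)  # content image used as both content and style
    Iss = model(Is, Is)  # style image used as both content and style
    # Pixel-level term: the output should reproduce the input exactly.
    loss1 = F.mse_loss(Icc, Ic) + F.mse_loss(Iss, Is)
    # Feature-level term across several VGG layers.
    loss2 = sum(F.mse_loss(vgg_feats(Icc)[l], vgg_feats(Ic)[l])
                + F.mse_loss(vgg_feats(Iss)[l], vgg_feats(Is)[l])
                for l in VGG_LAYERS)
    # lam1 and lam2 are hyperparameters; see [1] for the values used in training.
    return lam1 * loss1 + lam2 * loss2
```

The decoder is then trained on the weighted sum of the content, style, and identity losses from Equation Table 3.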

Conclusion and Results

Figure 7 : User preference result of five style transfer algorithms [image source]

Experiments have shown that SANet parses diverse style patterns, such as the global color distribution, texture, and local style patterns, while maintaining the structure of the content. SANet also proves proficient at distinguishing content structures and transferring the style appropriate to each semantic region. It can hence be inferred that SANet is not only efficient but also effective, retaining the content structure while blending in style features that enrich both global and local style statistics.

Figure 8 : Experimental results for comparison of SANet over other style transfer mechanisms [image source]

References

[1] Park, Dae Young, and Kwang Hee Lee. “Arbitrary style transfer with style-attentional networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.

[2] Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. “Image style transfer using convolutional neural networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

[3] Huang, Xun, and Serge Belongie. “Arbitrary style transfer in real-time with adaptive instance normalization.” Proceedings of the IEEE International Conference on Computer Vision. 2017.

[4] Li, Yijun, et al. “Universal style transfer via feature transforms.” Advances in Neural Information Processing Systems. 2017.

[5] Sheng, Lu, et al. “Avatar-net: Multi-scale zero-shot style transfer by feature decoration.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
