Insight on Style Attentional Networks (SANet) for Arbitrary Style Transfer
A robust approach for creating Art using AI
Image stylization is an image manipulation technique that has been studied for several decades. This article presents a highly efficient approach, the style-attentional network (SANet), which synthesizes high-quality stylized images while balancing global and local style patterns and preserving the content structure.
A high level overview on Style Transfer Mechanism
Ever wondered how a picture might look if it had been crafted by a prominent artist? Arbitrary style transfer turns this into reality by blending a content image (the target image) with a style image (the image whose texture, i.e. brush strokes, angular geometry, patterns, color transitions, etc., is to be painted onto the content image) to generate a third image that has never been seen before.
Novel SANet Approach for Style Transfer
The ultimate goal of arbitrary style transfer is to simultaneously achieve generalization, quality, and efficiency. As this is an active area of research, much seminal work such as [2], [3], [4], and [5] has been proposed toward this goal, but SANet stands out when it comes to balancing global and local style patterns while preserving the content structure, owing to:
- The use of a learned similarity kernel instead of a fixed one
- The use of soft attention instead of hard attention for style decoration
- The use of an identity loss during training to maintain the content structure without losing the richness of the style
Building Blocks for Arbitrary Style Transfer using SANet
The entire style transfer mechanism can be summarized as follows :
Let’s go through the architecture step by step to build up a complete picture.
The Complete SANet Architecture
Let’s untangle the full architecture into its three components for better insight:
- Encoder Decoder Module
- Style Attentional Module
- Computation of Loss Function
Encoder-Decoder Module
The first step in the style transfer pipeline is the encoder-decoder mechanism. A pre-trained VGG-19 network encodes the image into a feature representation, which is then passed to a decoder that learns to reconstruct the original input image.
Style Attentional Module
The SANet module takes as input the content and style feature maps produced by the VGG-19 encoder, normalizes them, and transforms them into feature spaces in order to compute attention between the content and style feature maps.
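A minimal PyTorch sketch of this style-attentional block is shown below. Mean-variance-normalized content features act as queries, normalized style features as keys, and raw style features as values; the learned 1x1 convolutions play the role of the learned similarity kernel. Layer names (`f`, `g`, `h`) follow the self-attention convention and are an assumption, not the paper's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mean_variance_norm(feat, eps=1e-5):
    # Normalize each channel over its spatial positions.
    mean = feat.mean(dim=(2, 3), keepdim=True)
    std = feat.std(dim=(2, 3), keepdim=True) + eps
    return (feat - mean) / std

class SANet(nn.Module):
    """Soft style attention: each content position attends over all
    style positions and is decorated with the matching style features."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Conv2d(channels, channels, 1)    # query (normalized content)
        self.g = nn.Conv2d(channels, channels, 1)    # key (normalized style)
        self.h = nn.Conv2d(channels, channels, 1)    # value (raw style)
        self.out = nn.Conv2d(channels, channels, 1)

    def forward(self, content, style):
        b, c, hc, wc = content.shape
        q = self.f(mean_variance_norm(content)).flatten(2)         # (b, c, Nc)
        k = self.g(mean_variance_norm(style)).flatten(2)           # (b, c, Ns)
        v = self.h(style).flatten(2)                               # (b, c, Ns)
        attn = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # (b, Nc, Ns)
        o = torch.bmm(v, attn.transpose(1, 2)).view(b, c, hc, wc)
        return content + self.out(o)   # residual keeps content structure

sanet = SANet(512)
fc = torch.randn(1, 512, 32, 32)       # content features
fs = torch.randn(1, 512, 32, 32)       # style features
fcs = sanet(fc, fs)                    # style-decorated content features
```

Because the attention is soft, every style position contributes a weighted amount to every content position, which is what lets SANet blend global color statistics with local texture patterns.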
Computation of Loss Function
A pre-trained VGG-19 network is used to compute the loss functions that train the decoder:
Computation of Content and Style Loss
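The two perceptual losses can be sketched as follows. The content loss compares mean-variance-normalized VGG features of the stylized output and the content image; the style loss matches channel-wise mean and standard deviation of VGG features across several layers (relu1_1 through relu5_1 in the paper). This is a hedged sketch of those definitions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def calc_mean_std(feat, eps=1e-5):
    # Per-channel mean and std over spatial positions: (b, c) each.
    mean = feat.mean(dim=(2, 3))
    std = (feat.var(dim=(2, 3)) + eps).sqrt()
    return mean, std

def content_loss(stylized_feat, content_feat):
    # Euclidean distance between mean-variance-normalized VGG features,
    # so content structure is compared independently of style statistics.
    def norm(f):
        m, s = calc_mean_std(f)
        return (f - m[..., None, None]) / s[..., None, None]
    return F.mse_loss(norm(stylized_feat), norm(content_feat))

def style_loss(stylized_feats, style_feats):
    # Match mean and std of VGG features at each chosen layer, summed.
    loss = 0.0
    for f_out, f_sty in zip(stylized_feats, style_feats):
        m1, s1 = calc_mean_std(f_out)
        m2, s2 = calc_mean_std(f_sty)
        loss = loss + F.mse_loss(m1, m2) + F.mse_loss(s1, s2)
    return loss
```

Normalizing before the content comparison is what allows the stylized output to adopt new color and texture statistics without being penalized, as long as the spatial structure survives.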
Computation of Identity Loss
The SANet architecture is capable of preserving the content structure as well as enriching the style patterns owing to its novel identity loss function.
Because the identity loss is computed from a pair of identical input images, with no style gap between them, it encourages the network to maintain the content structure and the style characteristics simultaneously.
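The idea above can be sketched as a single function: when the same image is fed in as both content and style, the stylized output should reproduce it, both at the pixel level and in VGG feature space. The weighting values below follow the paper's reported settings, but treat them as an assumption in this sketch.

```python
import torch
import torch.nn.functional as F

def identity_loss(Icc, Ic, Iss, Is,
                  feats_cc, feats_c, feats_ss, feats_s,
                  lambda1=50.0, lambda2=1.0):
    # Icc: output when content = style = Ic; Iss: output when both = Is.
    # feats_*: lists of VGG feature maps for the matching images.
    # Pixel term: the network should reproduce the input exactly.
    pixel = F.mse_loss(Icc, Ic) + F.mse_loss(Iss, Is)
    # Feature term: reproduction should also hold in VGG feature space.
    feat = sum(F.mse_loss(a, b) for a, b in zip(feats_cc, feats_c)) \
         + sum(F.mse_loss(a, b) for a, b in zip(feats_ss, feats_s))
    # lambda1/lambda2 balance the two terms (values assumed from the paper).
    return lambda1 * pixel + lambda2 * feat
```

Since there is no style gap between the two inputs, any residual error must come from lost content structure or distorted style statistics, which is exactly what this term penalizes.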
Conclusion and Results
Experiments show that the style transfer results obtained with SANet capture diverse style patterns, such as global color distribution, texture, and local style patterns, while maintaining the structure of the content. SANet is also adept at distinguishing content structures and transferring the style corresponding to each semantic region. It can therefore be inferred that SANet is not only efficient but also effective at retaining content structure while blending style features, enriching both global and local style statistics.