Style Transfer using Deep Neural Network and PyTorch

Ritul
Udacity PyTorch Challengers
6 min read · Dec 17, 2018

Introduction

Nowadays everyone is excited about doing projects using machine learning or deep learning. Through this blog, I will give you a chance to be the ‘Picasso’ of deep learning, as we are going to explore the method of style transfer using deep convolutional neural networks. We will use the pre-trained network VGG19 for that.

Don’t worry, it only sounds tough; it is actually quite easy. You just need to be familiar with Python, PyTorch, and some deep learning concepts such as CNNs and how to use pre-trained networks (as we will be using a pre-trained CNN for our style transfer). Let me first brush up your concepts about CNNs.

What is a CNN (Convolutional Neural Network)

A convolutional layer plus an activation function, followed by a pooling layer and a linear layer (to create the desired output size), make up the basic layers of a CNN.

  1. Convolutional layer: It is obtained by applying various filters to the input image for feature extraction. The number of filters used defines the depth of the layer. An activation function is applied to the output of the convolutional layer for scaling.
  2. Pooling layer: A max pooling layer reduces the x-y size of the input and keeps only the most active pixel values. For example, a 2x2 pooling kernel with a stride of 2 halves the height and width, as in the sketch below.
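As a quick illustration, here is a minimal PyTorch sketch of these building blocks (the channel counts and image size are arbitrary, chosen only for the example):

import torch
import torch.nn as nn

# a convolutional layer with 16 filters produces a layer of depth 16
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
relu = nn.ReLU()  # the activation function applied to the conv output
pool = nn.MaxPool2d(kernel_size=2, stride=2)  # 2x2 kernel, stride 2: halves the x-y size

x = torch.randn(1, 3, 224, 224)  # a dummy RGB image batch
out = pool(relu(conv(x)))
print(out.shape)  # torch.Size([1, 16, 112, 112])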

Why are we using VGG19 for Style Transfer

We can use either VGG16 or VGG19 for feature extraction, as they perform very well compared to other networks in the case of style transfer. Here I have used VGG19.

Other than VGG, you can use SqueezeNet; it is faster, but the results are worse. Inception performs well, but you have to change the striding/kernels, swap max pooling for average pooling, and search over various layer combinations. So VGG is the best choice at the moment.

You can read more here

How to perform Style Transfer

Style transfer uses the features found in the 19-layer VGG network, which is comprised of a series of convolutional and pooling layers and a few fully-connected layers. Convolutional layers are named by their stack and their order within the stack: the first convolutional layer is named conv1_1, and the deepest convolutional layer is conv5_4.
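Note that torchvision exposes these layers under numeric indices rather than names; you can inspect them yourself:

from torchvision import models

vgg = models.vgg19(pretrained=True)
print(vgg.features)  # lists the Conv2d / ReLU / MaxPool2d modules by index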

Style transfer relies on separating the content and style of an image. Our target is to create a new image containing the style of the style image and the content of the content image (the base image):

  1. Content (objects and their arrangement) comes from the given content image.
  2. Style (colour and texture) comes from the given style image.

Load in VGG19 (features):

VGG19 consists of two parts:

  1. vgg19.features (the convolutional and pooling layers)
  2. vgg19.classifier (the three fully-connected layers that produce the output)

For style transfer we need only the features portion, so we will load that in and freeze the weights.

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.optim as optim
from torchvision import transforms, models

# load in the features portion of VGG19
vgg = models.vgg19(pretrained=True).features

# freeze all VGG parameters since we're only optimizing the target image
for param in vgg.parameters():
    param.requires_grad_(False)

Load style and content image:

# load_image() deals with image sizing; a sketch of it is shown below
# load in the content and style images
content = load_image('images/x.jpg')
# resize style to match content; this makes the code easier
style = load_image('images/y.jpg', shape=content.shape[-2:])
# load_image() also converts the images to tensors and applies transforms
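load_image() itself is not shown above; here is a minimal sketch of one possible implementation, assuming PIL images and the standard ImageNet normalization:

from PIL import Image
from torchvision import transforms

def load_image(img_path, max_size=400, shape=None):
    ''' Load an image, cap its size, and convert it to a normalized tensor. '''
    image = Image.open(img_path).convert('RGB')
    # keep large images manageable unless an explicit shape is requested
    size = shape if shape is not None else min(max(image.size), max_size)
    in_transform = transforms.Compose([
        transforms.Resize(size),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406),
                             (0.229, 0.224, 0.225))])
    # discard any alpha channel and add the batch dimension
    image = in_transform(image)[:3, :, :].unsqueeze(0)
    return image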

Content and style features:

  1. For the content representation of the target image, we pass the content image through the model and take the output of the conv4_2 layer, as it is considered to contain the most accurate content features.
  2. For the style representation of the target image, we take the outputs of the conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 layers, for the same reason: they are considered to contain accurate style features.

We will then calculate the Gram matrix for the output of each convolutional layer used for style feature extraction, to capture the correlations between feature maps while discarding their spatial information.

The Gram matrix is calculated by multiplying a matrix (the flattened layer output) by its transpose, as in the sketch below.
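In code, that can look like the following: flatten each feature map of a layer's output into a row, then multiply the resulting matrix by its transpose:

def gram_matrix(tensor):
    ''' Calculate the Gram matrix of a convolutional layer's output. '''
    # tensor shape: (batch, depth, height, width)
    _, d, h, w = tensor.size()
    # flatten each of the d feature maps into a row of length h*w
    tensor = tensor.view(d, h * w)
    # multiply by the transpose -> a (d, d) matrix of feature correlations
    gram = torch.mm(tensor, tensor.t())
    return gram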

Creating our canvas and putting it all together:

# get the content and style features only once before forming the target image
# (a sketch of get_features() is shown below)
content_features = get_features(content, vgg)
style_features = get_features(style, vgg)

# calculate the gram matrices for each layer of our style representation
style_grams = {layer: gram_matrix(style_features[layer]) for layer in style_features}

# create a third "target" image and prep it for change
# it is a good idea to start off with the target as a copy of our *content* image
# then iteratively change its style
target = content.clone().requires_grad_(True)
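get_features() simply runs an image through vgg and records the outputs of the layers we care about; here is a minimal sketch, with the index-to-name mapping for torchvision's vgg19.features following the layer choices above:

def get_features(image, model, layers=None):
    ''' Run an image forward through a model and get the features for a set of layers. '''
    if layers is None:
        # map vgg19.features module indices to the conv layer names used above;
        # conv4_2 is the content representation
        layers = {'0': 'conv1_1',
                  '5': 'conv2_1',
                  '10': 'conv3_1',
                  '19': 'conv4_1',
                  '21': 'conv4_2',
                  '28': 'conv5_1'}
    features = {}
    x = image
    for name, layer in model._modules.items():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x
    return features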

Loss and Weights:

Individual layer style weight

We assign a weight to the output of each layer to control its style effect on our final image. If you want larger style artifacts, you should give higher weights to the earlier layers (conv1_1, conv2_1), and vice versa. Weights are in the range 0–1.

Content and Style weight

We define an alpha (content_weight) and a beta (style_weight). Their ratio affects how stylized your final image is. It’s recommended to keep content_weight at 1 and change style_weight.

# weights for each style layer
# weighting earlier layers more will result in *larger* style artifacts
# notice we are excluding `conv4_2`, our content representation
style_weights = {'conv1_1': 1.,
                 'conv2_1': 0.75,
                 'conv3_1': 0.2,
                 'conv4_1': 0.2,
                 'conv5_1': 0.2}

content_weight = 1  # alpha
style_weight = 1e6  # beta

Content loss

It is the mean squared difference between the target and content features at layer conv4_2.

content_loss = torch.mean((target_features['conv4_2'] - content_features['conv4_2'])**2)

Style loss

For the style loss, we calculate the Gram matrix of the target image and then compare it with the Gram matrix of the style image at each layer used for style feature extraction (conv1_1, conv2_1, etc.). Again, it is a mean squared difference.

Total Loss

It is calculated by adding the style and content losses after weighting them with alpha and beta.

total_loss = content_weight * content_loss + style_weight * style_loss

Our aim here is to minimize the total loss by iteratively updating the target image.

# for displaying the target image, intermittently
show_every = 400

# iteration hyperparameters
optimizer = optim.Adam([target], lr=0.003)
steps = 2000  # decide how many iterations to update your image (5000)

for ii in range(1, steps+1):

    # get the features from your target image
    target_features = get_features(target, vgg)

    # the content loss
    content_loss = torch.mean((target_features['conv4_2'] - content_features['conv4_2'])**2)

    # the style loss
    # initialize the style loss to 0
    style_loss = 0
    # then add to it for each layer's gram matrix loss
    for layer in style_weights:
        # get the "target" style representation for the layer
        target_feature = target_features[layer]
        target_gram = gram_matrix(target_feature)
        _, d, h, w = target_feature.shape
        # get the "style" style representation
        style_gram = style_grams[layer]
        # the style loss for one layer, weighted appropriately
        layer_style_loss = style_weights[layer] * torch.mean((target_gram - style_gram)**2)
        # add to the style loss
        style_loss += layer_style_loss / (d * h * w)

    # calculate the *total* loss
    total_loss = content_weight * content_loss + style_weight * style_loss

    # update your target image
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
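The loop above defines show_every, but the display step itself is left out. One way to show intermediate results is a small helper (im_convert is a hypothetical name) that un-normalizes a tensor back into a plottable image:

def im_convert(tensor):
    ''' Un-normalize a tensor and convert it to a NumPy image for display. '''
    image = tensor.to('cpu').clone().detach()
    image = image.numpy().squeeze()
    image = image.transpose(1, 2, 0)  # (d, h, w) -> (h, w, d)
    image = image * np.array((0.229, 0.224, 0.225)) + np.array((0.485, 0.456, 0.406))
    return image.clip(0, 1)

# inside the training loop, e.g. right after optimizer.step():
#     if ii % show_every == 0:
#         print('Total loss:', total_loss.item())
#         plt.imshow(im_convert(target))
#         plt.show()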

You can find complete code for style transfer here

Conclusion

By reading this blog, you have got an overview of how style transfer happens and how picture-editing apps like Prisma work. Here we used the Gram matrix calculation, but you can also improve your style transfer by using various other approaches, such as encoder-decoder networks.

The major drawback of this technique is that we pay in time for better results; you can also look into real-time style transfer as an improvement on the existing approach.

References:

  1. Udacity PyTorch Nanodegree
  2. Gatys et al., “Image Style Transfer Using Convolutional Neural Networks”, CVPR 2016: https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf
