Style Transfer with Deep Learning

Implementation with PyTorch

Nilesh Barla
Analytics Vidhya

--

Most of us are familiar with editing software like Adobe Photoshop and CorelDRAW. These programs do a great job of editing pictures, especially mixing various backgrounds and rendering an image in one go. A lot of hard work goes into building software like that.

But what if we could do the same thing using deep learning?

The answer to the question is yes, we can! All we need is a reasonably solid coding background, a basic understanding of linear algebra, and a few research papers to guide us towards the goal.

The deep learning technique we will be using is known as transfer learning.

What is it?

Well, transfer learning is a machine learning technique where we use an existing trained model to meet our own requirements. It is like borrowing a friend’s car to get your job done.

Why?

For a variety of reasons:

  1. We may not be able to produce a good model for our specific requirements, even after a lot of trial and error.
  2. We want to develop a quick model that meets our requirements.
  3. A model for our particular requirement may already be out there; all we need to do is get the required data and train the model on it. We don’t want to re-invent the wheel.

This approach is used popularly in:

  1. Computer Vision
  2. Natural Language Processing

My job is to make this article as simple as possible for you to understand.

Getting a research paper

Our first requirement is to find a research paper. Finding a good paper is a tedious job, and we have to be patient. Make sure you enter precise keywords in your search engine and browse through the results.

arxiv.org is one of the best websites for finding research papers for free. A good rule of thumb is to download a bunch of candidate PDF files and skim through all of them, looking for papers that explain the topic thoroughly. As a side note, jot down points on a piece of paper, copy and paste, or even take screenshots to save the information you find valuable.

One key thing to notice is that research papers will constantly drive you to search for keywords you are encountering for the first time. So one piece of advice: use a split screen, which will help you navigate easily across different windows.

The research paper that I will be using is Image Style Transfer Using Convolutional Neural Networks. You can click on the link to download it for yourself; it is freely available.

Next, read the Abstract and then the Conclusion. I know it sounds weird, but that is how it is done. You don’t want to waste your time on a paper that does not offer concrete details, or answers per se, on the topic you are looking for.

Once you are done with that, skim through the entire paper and highlight the notable keywords and formulas you come across.

Don’t worry about the information you weren’t able to process. Once you start working on the code, you will see in practice the little details you missed while skimming.

Once done, we can start coding. But before that, we should know what style transfer with deep learning actually is.

Style Transfer with Deep Learning

We will follow the style transfer method outlined in the paper mentioned above.

I will try to break down every step to build a more intuitive understanding, and show you how to break a paper into its fundamental constituents.

Well, our whole approach will revolve around repurposing a CNN architecture.

So what is a style transfer?

Style transfer is a method of modifying a picture by adopting the style of another image. Software like Photoshop and CorelDRAW already offers techniques like this.

Content Image (left); Style Image (centre); Target Image (right)

The idea and inspiration behind this project is to use neural nets to capture some of the finer details of the style image and deploy them on the original image.

For that purpose we use convolutional neural nets. Why? Because a CNN contains layers, and each of those layers acts like a filter, which lets us extract the features required for the final result or rendering.

I will use a pre-trained VGG19 net to extract content and style features from the images passed in. Then we will formalise the idea of content and style losses and use them to iteratively update our target image until we get the result we want.

Let’s Code

First things first. Make sure that your environment is ready.

I will be using Google Colab for my code editing, because it gives me a GPU option, which runs the code faster.

Importing

We will import our dependencies. Since we will be using PyTorch as the base library, we need to pip install torch and torchvision and then import them into our notebook.

From torchvision.models we will import our vgg19 CNN model.

VGG19 is split into two portions:

  • vgg19.features — refers to all the convolutional and pooling layers
  • vgg19.classifier — refers to the fully connected (linear) classifier layers

We will be using features only, so that we can extract the styling details and work our way towards optimising the output image. For that very reason we are not using the linear layers. Since the output image itself is what we optimise, we also freeze the parameters of every layer in the network.

Note: We will not optimise any parameter because of the reasons mentioned above.
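Here is a minimal sketch of that setup (importing the libraries, grabbing vgg19.features, and freezing its parameters), assuming torch and torchvision are installed:

    import torch
    from torchvision import models

    # Use the GPU if one is available (e.g. on Google Colab)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Keep only the convolutional and pooling portion of VGG19
    # (newer torchvision versions may want weights= instead of pretrained=)
    vgg = models.vgg19(pretrained=True).features

    # Freeze every parameter: we optimise the target image, not the network
    for param in vgg.parameters():
        param.requires_grad_(False)

    vgg.to(device)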

The diagram above shows that the content image p (the image that will be modified) and the style image a (which will be used to style the content image) are both passed through a CNN, in our case VGG19. The CNN filters out the patterns from each of those images and then uses an empty image to render the patterns it found while processing the two images.

Imagine a painter who paints an empty canvas, using the scenery in front of his eyes as a reference, with the strokes and colours of his paintbrush.

Making helper functions

We will then write a function to load the images. This function converts the pixel values into a tensor, which can then be fed into the network or passed through any other operations beforehand.

While writing the function, we have to keep in mind that all the images should have the same dimensions.

We should also note that we have to normalise the images, i.e. scale the pixel values into a fixed range, before feeding them into the network or any other operation.
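Below is a minimal sketch of such a loader. The function name, the 400-pixel size cap, and the ImageNet normalisation statistics are assumptions of this sketch, not requirements of the paper:

    from PIL import Image
    from torchvision import transforms

    def load_image(path, max_size=400, shape=None):
        """Load an image and turn it into a normalised 4-D tensor."""
        image = Image.open(path).convert('RGB')
        # Cap the size for speed; pass shape= to force matching dimensions
        size = shape if shape is not None else min(max(image.size), max_size)
        in_transform = transforms.Compose([
            transforms.Resize(size),
            transforms.ToTensor(),
            # ImageNet statistics, since VGG19 was trained on ImageNet
            transforms.Normalize((0.485, 0.456, 0.406),
                                 (0.229, 0.224, 0.225))])
        # Keep 3 channels and add a batch dimension: (c, h, w) -> (1, c, h, w)
        return in_transform(image)[:3, :, :].unsqueeze(0)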

Next, we will visualise the image.

We will make a function to do that as well. When we use matplotlib for visualisation, we have to bear in mind that it works only on numpy arrays. If your data is not a numpy array, it will raise an error.

Also, if the shape of the array is not (height, width, channels), it will raise an error again. To achieve the required format, we can use the transpose function.
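A helper along these lines could look like the sketch below; the name im_convert and the un-normalisation step are my own choices, not something the paper prescribes:

    import numpy as np
    import matplotlib.pyplot as plt

    def im_convert(tensor):
        """Convert a normalised tensor back into a displayable numpy image."""
        image = tensor.to('cpu').clone().detach()
        image = image.numpy().squeeze()    # drop the batch dimension
        image = image.transpose(1, 2, 0)   # (c, h, w) -> (h, w, c)
        # Undo the normalisation applied in load_image
        image = image * np.array((0.229, 0.224, 0.225)) \
                + np.array((0.485, 0.456, 0.406))
        return image.clip(0, 1)

    # Usage: plt.imshow(im_convert(content)); plt.show()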

Grabbing content and style features

According to the paper, we have to isolate certain layers for content and style representation.

Source: Image Style Transfer Using Convolutional Neural Networks

Content representation will be synthesised on conv4_2, whereas style representation will be synthesised on conv1_1, conv2_1, conv3_1, conv4_1 and conv5_1.

To find all the layers in our CNN model, we can use a for loop to iterate through them, or we can simply print the variable in which our vgg19 model is stored.
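For example, printing the model is enough to see every layer with its numeric index:

    # The indices shown here are what we use to pick out conv1_1, conv4_2, etc.
    print(vgg)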

Once we see the layer listing, we can write functions for both content and style representation that will extract the features for us.

The feature function’s job is to pass the image through the network and record the activations of the specifically selected layers, leaving the rest out.
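A sketch of such a function is below; the index-to-name mapping follows torchvision’s numbering of the VGG19 feature layers:

    def get_features(image, model, layers=None):
        """Run the image through the model and record the activations
        of the layers named in the paper."""
        if layers is None:
            layers = {'0': 'conv1_1', '5': 'conv2_1', '10': 'conv3_1',
                      '19': 'conv4_1', '21': 'conv4_2',  # conv4_2: content
                      '28': 'conv5_1'}
        features = {}
        x = image
        for name, layer in model._modules.items():
            x = layer(x)  # pass the activations through layer by layer
            if name in layers:
                features[layers[name]] = x
        return features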

Gram Matrix

The Gram matrix (or Gramian matrix) of a set of vectors {v1, v2, …, vn} in an inner product space is the matrix whose entries are given by Gij = ⟨vi, vj⟩.

Source: Image Style Transfer Using Convolutional Neural Networks

This tells us that it is a square matrix: if V is the matrix whose rows are the vectors, we can compute G = V·V.T.

An important application of the Gramian is computing linear independence: a set of vectors is linearly independent if and only if the Gram determinant (the determinant of the Gram matrix) is non-zero. In our setting, the Gramian captures the correlations between the responses of different filters in the CNN.

Because it captures those correlations while discarding the spatial layout, the Gram matrix is a good basis for measuring style loss.
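In PyTorch this reduces to a few lines; here is a sketch:

    def gram_matrix(tensor):
        """Gram matrix of a (1, c, h, w) feature tensor: flatten each of
        the c feature maps into a row vector, then take all pairwise
        inner products (G = V @ V.T)."""
        _, c, h, w = tensor.size()
        v = tensor.view(c, h * w)  # one row per feature map
        return torch.mm(v, v.t())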

Optimisation and Weights

Optimisation is all about improving the model by decreasing the error. You can read a detailed account of optimisation in the article Understanding gradient descent that I wrote recently.

We also initialise a weight for each style layer that we selected; each weight scales that layer’s contribution to the mean squared error between the target and style Gram matrices.

Once we have completed all the required coding, we can move on to the last part: optimisation using a for loop.

But before we put everything in the loop, we have to initialise the optimiser we will be using, in our case Adam, and it will be optimising the target image, our empty canvas.
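A sketch of that initialisation follows; the file names and the learning rate are placeholder choices, and content and style are tensors returned by our loader:

    # Load the two source images; give the style image the content's shape
    content = load_image('content.jpg').to(device)
    style = load_image('style.jpg', shape=content.shape[-2:]).to(device)

    # The target starts as a copy of the content image and is the only
    # tensor we optimise
    target = content.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([target], lr=0.003)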

Source: Image Style Transfer Using Convolutional Neural Networks
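For reference, these are the loss definitions from the paper, written out in LaTeX notation (F and P are the target and content feature maps at layer l, G and A are the target and style Gram matrices, and N_l, M_l are the number and size of the feature maps):

    % Content loss at layer l
    \mathcal{L}_{content}(\vec{p}, \vec{x}, l) = \tfrac{1}{2} \sum_{i,j} \big(F^{l}_{ij} - P^{l}_{ij}\big)^{2}

    % Gram matrix of the feature maps at layer l
    G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk}

    % Style loss: weighted mean squared error between Gram matrices
    E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} \big(G^{l}_{ij} - A^{l}_{ij}\big)^{2}, \qquad \mathcal{L}_{style}(\vec{a}, \vec{x}) = \sum_{l} w_{l} E_{l}

    % Total loss blends content and style through alpha and beta
    \mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha \mathcal{L}_{content}(\vec{p}, \vec{x}) + \beta \mathcal{L}_{style}(\vec{a}, \vec{x})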

Once we have defined everything, we can start our for loop and put all our loss calculations inside it, which include:

  1. Content loss
  2. Style loss
  3. Total loss

We will be using the L2 norm, or mean squared error, as mentioned in the paper.

The total loss combines the alpha and beta that we defined along with the layer weights. Alpha and beta are the weighing factors that determine the blend of the content and style images in the output image.
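Putting it all together, the optimisation loop might look like the sketch below. The step count, alpha, beta and per-layer style weights are example values, not the only valid choices:

    content_features = get_features(content, vgg)
    style_features = get_features(style, vgg)
    style_grams = {layer: gram_matrix(style_features[layer])
                   for layer in style_features}

    alpha, beta = 1, 1e6  # content and style weighting factors
    style_weights = {'conv1_1': 1.0, 'conv2_1': 0.75, 'conv3_1': 0.2,
                     'conv4_1': 0.2, 'conv5_1': 0.2}

    for step in range(1, 2001):
        target_features = get_features(target, vgg)

        # 1. Content loss: MSE between target and content at conv4_2
        content_loss = torch.mean(
            (target_features['conv4_2'] - content_features['conv4_2']) ** 2)

        # 2. Style loss: weighted MSE between Gram matrices, layer by layer
        style_loss = 0
        for layer in style_weights:
            target_feature = target_features[layer]
            _, c, h, w = target_feature.shape
            target_gram = gram_matrix(target_feature)
            layer_loss = style_weights[layer] * torch.mean(
                (target_gram - style_grams[layer]) ** 2)
            style_loss += layer_loss / (c * h * w)

        # 3. Total loss: alpha * content + beta * style
        total_loss = alpha * content_loss + beta * style_loss

        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()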

Given below are some of the images that I have generated:

alpha=1, beta=1e3, epoch=1000
alpha=1, beta=1e6, epoch=1000
alpha=1, beta=1e10, epoch=2000

Conclusion

In essence, we have seen:

  1. How to read a paper and jot down important and valuable information.
  2. How to understand the concepts and diagrams, especially to interpret and map out our process.
  3. How to implement the functions step by step, along with the formulas behind them.
  4. How to use a for loop to optimise the model.
  5. And, last of all, how to be patient.

Readings

  1. Deep Photo Style Transfer: https://arxiv.org/abs/1703.07511
  2. Image Style Transfer Using Convolutional Neural Networks
  3. Perceptual Losses for Real-Time Style Transfer and Super-Resolution: https://arxiv.org/abs/1603.08155
  4. My code for Style Transfer with Deep Learning on GitHub.
