Introduction to Style Transfer

Sahil Singla
2 min read · Aug 28, 2017


Caveat: Some knowledge of convolutional neural networks is assumed.

A short introduction to style transfer and how it works.

Q. What is style transfer?

Style transfer is what apps like Prisma, Lucid, and Microsoft's recently released Pix do. It re-renders the original image as if it were painted in the style of some other image (usually a famous painting).

Here is an example of style transfer:

[Figure: content image (left), style image (right)]
[Figure: stylized image generated by style transfer]

Q. How does it work?

You take a CNN pretrained on ImageNet; we call this the loss network. A second neural network, the one we actually train, generates an output image given a content image and a style image as inputs.
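To make the setup concrete, here is a minimal PyTorch sketch. It assumes torchvision's VGG16 as the loss network; the tiny transformation network is a placeholder for illustration, not the architecture from any particular paper.

import torch
import torch.nn as nn
from torchvision.models import vgg16

# Loss network: pretrained on ImageNet and frozen.
# It is only used to compute losses, never trained.
loss_network = vgg16(pretrained=True).features.eval()
for p in loss_network.parameters():
    p.requires_grad = False

# Transformation network: the network we actually train.
# A real one is much deeper (e.g. conv + residual blocks);
# this two-layer stand-in just shows the role it plays.
transform_net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=9, padding=4),
    nn.ReLU(),
    nn.Conv2d(32, 3, kernel_size=9, padding=4),
)

content_image = torch.rand(1, 3, 256, 256)   # placeholder input
output_image = transform_net(content_image)  # stylized output to be scored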

Empirically, the outputs of the early layers in a CNN correspond to the style of the painter: brush strokes, how those strokes form objects, the texture of objects, and so on. Outputs of later layers locate and recognize the major objects in the image.

By passing the style image through the CNN and computing the correlations between the outputs of different filters in a layer (called the Gram matrix), we obtain a representation of the style used in it. The difference between this Gram matrix and the Gram matrix of the generated image gives us the style loss, which measures how different the style of the generated image is from that of the input style image.
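In code, the Gram matrix and the resulting style loss might look like this (a sketch; real implementations usually sum this loss over several layers of the loss network):

import torch

def gram_matrix(features):
    # features: (batch, channels, height, width) activations from one layer.
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    # Correlations between every pair of filter responses, normalized
    # by the number of elements so the scale is layer-independent.
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def style_loss(style_features, generated_features):
    # Mean squared difference between the two Gram matrices.
    return torch.mean((gram_matrix(style_features)
                       - gram_matrix(generated_features)) ** 2)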

Similarly, by passing the content image and the generated image through the CNN and computing the difference between the outputs of a layer, we get the content loss, which measures how different the content of the generated image is from that of the input content image.
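The content loss is simpler: a direct difference between activations at some layer, with no Gram matrix, since here we compare content directly. Again a sketch; which layer to use is a design choice.

def content_loss(content_features, generated_features):
    # Mean squared difference between raw activations at one layer.
    return torch.mean((content_features - generated_features) ** 2)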

You multiply the content loss by a content weight (a scalar that defines how much of the content you want to keep), multiply the style loss by a style weight (a scalar that defines how much of the style you want to apply), and add the two to get the total loss.
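Combining them is just a weighted sum. The weight values below are made up for illustration, and content_feats, style_feats, and generated_feats stand for activations extracted from the loss network at the chosen layers:

# Hypothetical weights; in practice these are tuned per style image.
content_weight, style_weight = 1.0, 1e5

total_loss = (content_weight * content_loss(content_feats, generated_feats)
              + style_weight * style_loss(style_feats, generated_feats))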

Using this total loss, you train your neural network just as you would train any other!

Alternatively, if you want to stylize a single image rather than train a network that generates stylized images, you can pass the image through the loss network and optimize the image's pixels directly to minimize the loss.
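A sketch of that single-image optimization loop, reusing the losses and weights from above. It assumes a hypothetical extract_features helper that runs the loss network and returns its activations at the chosen content and style layers:

# Optimize the pixels themselves, starting from the content image.
image = content_image.clone().detach().requires_grad_(True)
optimizer = torch.optim.LBFGS([image])

def closure():
    optimizer.zero_grad()
    # extract_features is a hypothetical helper returning the loss
    # network's activations at the content and style layers.
    gen_c, gen_s = extract_features(image)
    loss = (content_weight * content_loss(content_feats, gen_c)
            + style_weight * style_loss(style_feats, gen_s))
    loss.backward()
    return loss

for _ in range(100):  # the number of steps here is arbitrary
    optimizer.step(closure)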
