Art & AI: The Logic Behind Deep Learning ‘Style Transfer’

A Look Into the ‘Neural Algorithm of Artistic Style’

What is ‘Style Transfer’?

In this example, I transferred the style from one of the Beatles’ iconic album covers to an old photo. Can you guess which album was used?
A friendly llama depicted in the four different styles available from the MAX ‘Fast Neural Style Transfer’ model I tested with.

You CAN Teach an Old Model New Tricks

Wading Into the Deep End

The architecture of the VGG-19 Convolutional Neural Network
The representation of this cat photo becomes increasingly hard to recognize as it moves through the network.
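To make that shrinking, increasingly abstract representation concrete, here's a small sketch (pure Python, no framework needed) that just traces the shape of an image's representation through VGG-19's five convolutional stacks. The stack configuration is the standard VGG-19 layout; the function name is my own.

```python
# VGG-19's five convolutional stacks: (number of conv layers, output channels).
# Every conv uses 3x3 kernels with padding 1, so spatial size is preserved;
# a 2x2 max-pool after each stack halves the spatial resolution.
VGG19_STACKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]

def feature_shapes(height=224, width=224):
    """Trace the (channels, height, width) shape of an image's
    representation as it moves through VGG-19's conv stacks."""
    shapes = []
    h, w = height, width
    for n_convs, channels in VGG19_STACKS:
        shapes.append((channels, h, w))   # shape at the end of this stack
        h, w = h // 2, w // 2             # 2x2 max-pool, stride 2
    return shapes

for i, shape in enumerate(feature_shapes(), start=1):
    print(f"stack {i}: {shape}")
# stack 1: (64, 224, 224)
# stack 2: (128, 112, 112)
# stack 3: (256, 56, 56)
# stack 4: (512, 28, 28)
# stack 5: (512, 14, 14)
```

So by the time our cat photo reaches the last stack, 224×224 pixels of detail have been traded for a 14×14 grid of 512 abstract feature channels — which is exactly why the deeper representations stop looking like a cat.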

A Neural Algorithm of Artistic Style

  • To get the ‘content’ from our content image, we need to extract the representation of our image from just the right spot in the network. The authors identified the output of the 2nd layer in the 4th convolutional stack (conv4_2 in VGG-19) as the perfect place to do this.
  • To get the ‘style’ from our style image, we take a similar approach and collect our image’s data as it leaves the first convolutional layer in each stack (conv1_1 through conv5_1). This way, we get a nice cross-section of the nuances contained in an image, ranging from features that are obvious to the naked eye all the way down to subtle patterns a human might never pick up.
I experimented with different levels of ‘Simpsonification’ on Shorty, my Boston Terrier.
  • Instead of minimizing the loss between ‘predicted’ and ‘actual’ output, as we would in a typical image classification problem, we use similar equations to minimize the ‘content’ loss: the difference between our content and target images’ representations at the specified ‘content’ point in the network. This is what keeps the generated image looking similar to the content image.
  • To minimize ‘style’ loss, we compare the style data from each ‘style’ layer of our target image to the style extracted from the corresponding layer of our style image. These values are stored in structures known as Gram matrices, which capture the correlations between feature channels and are well suited to the image feature maps we’re working with here.
  • Finally, these two losses are multiplied by their individual weights and summed into a ‘total loss’ value, which is then minimized with back-propagation and standard optimization methods. The content and style weights are among the many parameters that can be tuned to produce different results and output images, although in the paper they suggest values near 1 for content and 10^-6 for style to maintain the right balance.
Bob Ross paintings are great, but they get even cooler when reworked in the style of the artist Skinner!
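The loss terms described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's exact implementation: the feature maps below are random stand-ins for real VGG-19 activations, and the weights are placeholders you would tune yourself.

```python
import numpy as np

def content_loss(target_feat, content_feat):
    """Squared difference between the target and content image
    representations at the chosen 'content' layer (e.g. conv4_2)."""
    return 0.5 * np.sum((target_feat - content_feat) ** 2)

def gram_matrix(feat):
    """Gram matrix of a (channels, height, width) feature map:
    the correlation between every pair of feature channels."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)
    return flat @ flat.T

def style_loss(target_feat, style_feat):
    """Squared difference between the two Gram matrices at one
    'style' layer, normalized by feature-map size as in Gatys et al."""
    c, h, w = target_feat.shape
    diff = gram_matrix(target_feat) - gram_matrix(style_feat)
    return np.sum(diff ** 2) / (4 * c**2 * (h * w) ** 2)

# Random stand-ins for activations: one 'content' layer, one 'style' layer.
rng = np.random.default_rng(0)
content_feat = rng.standard_normal((512, 28, 28))
target_feat  = rng.standard_normal((512, 28, 28))
style_feat   = rng.standard_normal((64, 112, 112))
target_style = rng.standard_normal((64, 112, 112))

content_weight, style_weight = 1.0, 1e-2  # illustrative values only
total = (content_weight * content_loss(target_feat, content_feat)
         + style_weight * style_loss(target_style, style_feat))
print(total)
```

In a real run, the target image's pixels (not network weights) are what get updated by back-propagation, and the style loss is summed over all five ‘style’ layers rather than just one.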

Conclusion

Thanks for reading! If you know of any good Art & AI projects that I’ve missed, please share in the comments!


Things we made with data at IBM’s Center for Open Source Data and AI Technologies.
