Neural Style Transfer: Creating Art with Deep Learning using tf.keras and eager execution

TensorFlow
Aug 3, 2018 · 10 min read

By Raymond Yuan, Software Engineering Intern

In this tutorial, we will learn how to use deep learning to compose images in the style of another image (ever wish you could paint like Picasso or Van Gogh?). This is known as neural style transfer! The technique is outlined in Leon A. Gatys’ paper, A Neural Algorithm of Artistic Style, which is a great read, and you should definitely check it out.

Neural style transfer is an optimization technique used to take three images, a content image, a style reference image (such as an artwork by a famous painter), and the input image you want to style — and blend them together such that the input image is transformed to look like the content image, but “painted” in the style of the style image.

For example, let’s take an image of this turtle and Katsushika Hokusai’s The Great Wave off Kanagawa:

Image of Green Sea Turtle by P. Lindgren, from Wikimedia Commons

Now, what would it look like if Hokusai decided to add the texture or style of his waves to the image of the turtle? Something like this?


Is this magic or just deep learning? Fortunately, this doesn’t involve any magic: style transfer is a fun and interesting technique that showcases the capabilities and internal representations of neural networks.

The principle of neural style transfer is to define two distance functions: one that describes how different the content of two images is, Lcontent, and one that describes the difference between two images in terms of their style, Lstyle. Then, given three images, a desired style image, a desired content image, and the input image (initialized with the content image), we try to transform the input image to minimize its content distance with the content image and its style distance with the style image.

In summary, we’ll take the base input image, a content image that we want to match, and the style image that we want to match. We’ll transform the base input image by minimizing the content and style distances (losses) with backpropagation, creating an image that matches the content of the content image and the style of the style image.

Specific concepts that will be covered:

  • Eager Execution — use TensorFlow’s imperative programming environment that evaluates operations immediately
  • Learn more about eager execution
  • See it in action (many of the tutorials are runnable in Colaboratory)
  • Using the Functional API to define a model — we’ll build a subset of our model that gives us access to the necessary intermediate activations
  • Leveraging feature maps of a pretrained model — learn how to use pretrained models and their feature maps
  • Creating custom training loops — we’ll examine how to set up an optimizer to minimize a given loss with respect to input parameters

We will follow the general steps to perform style transfer:

  1. Preprocess and prepare our data
  2. Set up the loss functions
  3. Create the model
  4. Optimize for the loss function

Audience: This post is geared toward intermediate users who are comfortable with basic machine learning concepts, such as minimizing a loss with gradient descent. Reading Gatys’ paper first will also help, though we explain the key ideas along the way.

Estimated time: 60 min

Code:

Implementation

Image of Green Sea Turtle by P. Lindgren, from Wikimedia Commons, and The Great Wave off Kanagawa by Katsushika Hokusai, Public Domain
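Before building the model, we load and preprocess our images into the format VGG19 expects. Here is a minimal sketch; the helper names and the max_dim of 512 are conventions of this walkthrough, while tf.keras.applications.vgg19.preprocess_input performs the standard VGG mean subtraction and channel reordering:

```python
import numpy as np
import tensorflow as tf

def load_and_process_img(path_to_img, max_dim=512):
  # Load the image, scale its longest side to max_dim, and add a batch dimension
  img = tf.keras.preprocessing.image.load_img(path_to_img)
  img = tf.keras.preprocessing.image.img_to_array(img)  # (H, W, 3) float32
  scale = max_dim / max(img.shape[0], img.shape[1])
  new_size = (round(img.shape[0] * scale), round(img.shape[1] * scale))
  img = tf.image.resize(img, new_size).numpy()
  img = np.expand_dims(img, axis=0)
  # Subtract the ImageNet channel means and flip RGB -> BGR, as VGG19 expects
  return tf.keras.applications.vgg19.preprocess_input(img)

def deprocess_img(processed_img):
  # Undo the VGG preprocessing so the optimized image can be displayed
  x = np.squeeze(processed_img, axis=0).copy()
  x[:, :, 0] += 103.939
  x[:, :, 1] += 116.779
  x[:, :, 2] += 123.68
  x = x[:, :, ::-1]  # BGR -> RGB
  return np.clip(x, 0, 255).astype('uint8')
```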

Define content and style representations

In order to get both the content and style representations of our image, we look at intermediate layers within our model. Intermediate layers represent feature maps that become increasingly higher-ordered as we go deeper into the network, and we use them to define the two distance functions described above.

Why intermediate layers?

For a network to perform image classification, it has to understand the image: it takes the raw pixels as input and, through successive transformations, builds an internal representation of the features present in the image. Those intermediate feature maps are exactly what lets us describe the content and style of an image.

Specifically, we’ll pull out these intermediate layers from our network:
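The names below follow the Keras VGG19 layer naming, with one deep layer for the content representation and the first convolution of each block for the style representation (the same kind of choice made in Gatys’ paper):

```python
# Content layer where we will pull our content feature maps
content_layers = ['block5_conv2']

# Style layers we are interested in
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']

num_content_layers = len(content_layers)
num_style_layers = len(style_layers)
```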

Model

We use VGG19, as suggested in the paper. In addition, since VGG19 is a relatively simple model (compared with ResNet, Inception, etc.), its feature maps actually work better for style transfer.

In order to access the intermediate layers corresponding to our style and content feature maps, we get the corresponding outputs by using the Keras Functional API to define our model with the desired output activations.

With the Functional API, defining a model simply involves defining the input and output: model = Model(inputs, outputs).
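A rough sketch of what that looks like in our case (get_model is just a helper name used in this walkthrough):

```python
def get_model():
  """Builds a model that returns the style and content intermediate activations."""
  # Load our pretrained VGG19 without its classification head and freeze it
  vgg = tf.keras.applications.vgg19.VGG19(include_top=False, weights='imagenet')
  vgg.trainable = False

  # Grab the output activations of the layers we chose above
  style_outputs = [vgg.get_layer(name).output for name in style_layers]
  content_outputs = [vgg.get_layer(name).output for name in content_layers]
  model_outputs = style_outputs + content_outputs

  # With the Functional API, a model is just inputs and outputs
  return tf.keras.models.Model(vgg.input, model_outputs)
```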

In the above code snippet, we’ll load our pretrained image classification network. Then we grab the layers of interest as we defined earlier. Then we define a Model by setting the model’s inputs to an image and the outputs to the outputs of the style and content layers. In other words, we created a model that will take an input image and output the content and style intermediate layers!

Define and create our loss functions (content and style distances)

Content Loss:

More formally, content loss is a function that describes the distance between the content of our input image x and our content image p. Let Cₙₙ be a pre-trained deep convolutional neural network; again, in this case we use VGG19. Let x be any image; then Cₙₙ(x) is the network fed with x. Let Fˡᵢⱼ(x) ∈ Cₙₙ(x) and Pˡᵢⱼ(p) ∈ Cₙₙ(p) describe the respective intermediate feature representations of the network at layer l for inputs x and p. Then we describe the content distance (loss) formally as:

Lcontent(x, p, l) = ½ Σᵢⱼ (Fˡᵢⱼ(x) − Pˡᵢⱼ(p))²

We perform backpropagation in the usual way such that we minimize this content loss. We thus change the initial image until it generates a similar response in a certain layer (defined in content_layer) as the original content image.

This can be implemented quite simply. It takes as input the feature maps at a layer l of the network fed with x, our input image, and p, our content image, and returns the content distance.
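A minimal sketch of that function, written as a mean squared error (which differs from the paper’s ½-sum only by a constant scale factor):

```python
def get_content_loss(base_content, target):
  # Mean squared distance between the feature maps of the input image
  # and the content image at a given layer
  return tf.reduce_mean(tf.square(base_content - target))
```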

Style Loss:

Mathematically, we describe the style loss of the base input image, x, and the style image, a, as the distance between the style representations (the Gram matrices) of these images. We describe the style representation of an image as the correlation between different filter responses, given by the Gram matrix Gˡ, where Gˡᵢⱼ is the inner product between the vectorized feature maps i and j in layer l. We can see that Gˡᵢⱼ, generated over the feature maps of a given image, represents the correlation between feature maps i and j.

To generate a style for our base input image, we perform gradient descent from the content image to transform it into an image that matches the style representation of the style image. We do so by minimizing the mean squared distance between the Gram matrices of the style image and of the input image. The contribution of each layer to the total style loss is described by

Eₗ = 1 / (4 Nₗ² Mₗ²) · Σᵢⱼ (Gˡᵢⱼ − Aˡᵢⱼ)²

where Gˡᵢⱼ and Aˡᵢⱼ are the respective style representations in layer l of the input image x and the style image a. Nₗ is the number of feature maps in that layer, each of size Mₗ = height × width. Thus, the total style loss across all layers is

Lstyle(a, x) = Σₗ wₗ Eₗ

where we weight the contribution of each layer’s loss by some factor wₗ. In our case, we weight each layer equally:

wₗ = 1 / |L|, where |L| is the number of style layers (five in our case)

This is implemented simply:
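Continuing the sketch, here is one way to write the Gram matrix and the per-layer style loss (normalizing the Gram matrix by the number of spatial locations keeps the scale manageable):

```python
def gram_matrix(input_tensor):
  # Reshape so each row is one spatial location and each column one channel
  channels = int(input_tensor.shape[-1])
  a = tf.reshape(input_tensor, [-1, channels])
  n = tf.shape(a)[0]
  # Correlations between feature maps, normalized by the number of locations
  gram = tf.matmul(a, a, transpose_a=True)
  return gram / tf.cast(n, tf.float32)

def get_style_loss(base_style, gram_target):
  # Mean squared distance between the Gram matrix of the generated image's
  # feature maps and that of the style image at a given layer
  gram_style = gram_matrix(base_style)
  return tf.reduce_mean(tf.square(gram_style - gram_target))
```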

Run Gradient Descent

In this case, we use the Adam optimizer to minimize our loss. We iteratively update our output image so that it minimizes the loss: we don’t update the weights of the network; instead, we train our input image to minimize the loss. In order to do this, we must know how to calculate our loss and its gradients. The L-BFGS optimizer is recommended if you are familiar with that algorithm, but it isn’t used in this tutorial because a primary motivation here is to illustrate best practices with eager execution. Using Adam lets us demonstrate the autograd/gradient tape functionality with custom training loops.

Compute the loss and gradients

Here we use tf.GradientTape to compute the gradients. It lets us take advantage of automatic differentiation by tracing operations for later gradient computation: it records the operations during the forward pass and can then compute the gradient of our loss function with respect to the input image for the backward pass.

Then computing the gradients is easy:
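A sketch of both pieces: compute_loss runs the model on the current image and combines the weighted style and content losses, while compute_grads wraps that call in a tf.GradientTape. The cfg dictionary is just a convenient bundle of arguments used in this walkthrough.

```python
def compute_loss(model, loss_weights, init_image, gram_style_features, content_features):
  style_weight, content_weight = loss_weights

  # Forward pass on the image we are optimizing; the first outputs are the
  # style layers, the rest are the content layers (see get_model above)
  model_outputs = model(init_image)
  style_output_features = model_outputs[:num_style_layers]
  content_output_features = model_outputs[num_style_layers:]

  # Equally weighted per-layer style losses
  style_score = 0.0
  weight_per_style_layer = 1.0 / float(num_style_layers)
  for target_style, comb_style in zip(gram_style_features, style_output_features):
    style_score += weight_per_style_layer * get_style_loss(comb_style, target_style)

  # Content loss (a single layer in this setup)
  content_score = 0.0
  weight_per_content_layer = 1.0 / float(num_content_layers)
  for target_content, comb_content in zip(content_features, content_output_features):
    content_score += weight_per_content_layer * get_content_loss(comb_content, target_content)

  loss = style_weight * style_score + content_weight * content_score
  return loss, style_score, content_score

def compute_grads(cfg):
  with tf.GradientTape() as tape:
    all_loss = compute_loss(**cfg)
  total_loss = all_loss[0]
  # Gradient of the total loss with respect to the generated image only
  return tape.gradient(total_loss, cfg['init_image']), all_loss
```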

Apply and run the style transfer process
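Putting the pieces together, a minimal optimization loop could look like the following. It relies on the helpers sketched above (load_and_process_img, get_model, gram_matrix, compute_grads), and the loss weights, learning rate, and clipping range are reasonable defaults rather than the only workable values:

```python
def run_style_transfer(content_path, style_path, num_iterations=1000,
                       content_weight=1e3, style_weight=1e-2):
  # We only optimize the input image, never the network weights
  model = get_model()
  for layer in model.layers:
    layer.trainable = False

  content_image = load_and_process_img(content_path)
  style_image = load_and_process_img(style_path)

  # Feature targets: Gram matrices for style, raw feature maps for content
  gram_style_features = [gram_matrix(f) for f in model(style_image)[:num_style_layers]]
  content_features = model(content_image)[num_style_layers:]

  # Initialize the output image from the content image
  init_image = tf.Variable(content_image, dtype=tf.float32)
  opt = tf.keras.optimizers.Adam(learning_rate=5.0, beta_1=0.99, epsilon=1e-1)

  cfg = {
      'model': model,
      'loss_weights': (style_weight, content_weight),
      'init_image': init_image,
      'gram_style_features': gram_style_features,
      'content_features': content_features,
  }

  # Keep the image inside the valid range for VGG's mean-subtracted inputs
  norm_means = np.array([103.939, 116.779, 123.68], dtype=np.float32)
  min_vals, max_vals = -norm_means, 255 - norm_means

  best_loss, best_img = float('inf'), None
  for _ in range(num_iterations):
    grads, (loss, style_score, content_score) = compute_grads(cfg)
    opt.apply_gradients([(grads, init_image)])
    init_image.assign(tf.clip_by_value(init_image, min_vals, max_vals))

    if loss < best_loss:
      best_loss, best_img = loss, deprocess_img(init_image.numpy())

  return best_img, best_loss
```

The returned best_img is already deprocessed, so it can be displayed or saved directly.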

And that’s it!

Let’s run it on our image of the turtle and Hokusai’s The Great Wave off Kanagawa:

Image of Green Sea Turtle by P. Lindgren [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], from Wikimedia Commons

Watch the iterative process over time:


Here are some other cool examples of what neural style transfer can do. Check it out!

Image of Tuebingen — Photo by Andreas Praefcke [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC BY 3.0 (https://creativecommons.org/licenses/by/3.0)], from Wikimedia Commons, and Image of Starry Night by Vincent van Gogh, Public Domain
Image of Tuebingen — Photo by Andreas Praefcke [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC BY 3.0 (https://creativecommons.org/licenses/by/3.0)], from Wikimedia Commons, and Image of Composition 7 by Vassily Kandinsky, Public Domain
Image of Tuebingen — Photo by Andreas Praefcke [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC BY 3.0 (https://creativecommons.org/licenses/by/3.0)], from Wikimedia Commons, and Image of Pillars of Creation by NASA, ESA, and the Hubble Heritage Team, Public Domain

Try out your own images!

Key Takeaways

What we covered:

  • We loaded a pretrained model and used its learned feature maps to describe the content and style representations of our images.
  • Our main loss functions computed the distances in terms of these representations.
  • We implemented this with a custom model and eager execution.
  • We built our custom model with the Functional API.
  • Eager execution allows us to work with tensors dynamically, using natural Python control flow.
  • We manipulated tensors directly, which makes debugging and working with tensors easier.

We iteratively updated our image by applying the optimizer’s update rules, using the gradients computed with tf.GradientTape. The optimizer minimized the given losses with respect to our input image.
