Deep Learning as an Artistic Tool

Dan Baumann
6 min read · Jan 27, 2020


By Daniel Baumann

AI-produced artwork by French art collective Obvious

AI as a Creator

Artificial intelligence has capabilities beyond the typical tasks we associate with it, such as image classification. More recently, it has ventured into creative activities. In 2014, François Chollet claimed that most of the cultural content we consume in the future will be created with substantial help from AIs.

Like any industry-defying comment, Chollet's was met with skepticism and branded as near ridiculous. Yet not a year later, in 2015, the artistic capabilities of AI were becoming a reality, with Google's DeepDream algorithm turning images into the kind of visual artefacts induced in humans when psychedelics disrupt the visual cortex. Interestingly, this process can be extended to speech as well as music, and music generated by neural networks may also play a large role in how we consume those domains in the future.

Following on from DeepDream, neural style transfer was also developed, which allows for cool image alterations based on the underlying style and content of images.

DeepDream

DeepDream uses convolutional neural networks (CNNs, or convnets) to modify images. The process is very similar to the convnet filter-visualisation technique: DeepDream essentially runs a convnet in reverse, performing gradient ascent on the input so as to maximise the activation of a specific filter in an upper layer of the convnet.
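To make that base idea concrete, here is a minimal sketch of the filter-visualisation process (my own illustration, not code from this post): gradient ascent on a small-noise input to maximise the mean activation of one filter. The layer name 'mixed3', the filter index, the image size and the step size are illustrative assumptions.

import numpy as np
from keras.applications import inception_v3
from keras import backend as K

model = inception_v3.InceptionV3(weights='imagenet', include_top=False)

# loss: the mean activation of a single filter in an upper layer
layer_output = model.get_layer('mixed3').output   # assumed layer name
loss = K.mean(layer_output[:, :, :, 0])           # assumed filter index

# gradient of that loss with respect to the input image, normalised
grads = K.gradients(loss, model.input)[0]
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)

iterate = K.function([model.input], [loss, grads])

# start from small noise (Inception V3 expects inputs in [-1, 1])
# and repeatedly step in the direction of the gradient
step = 0.01
x = np.random.uniform(-0.1, 0.1, (1, 150, 150, 3))
for _ in range(40):
    loss_value, grads_value = iterate([x])
    x += step * grads_value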

However, there are a few distinctions that DeepDream implements on top of this idea:

  • Maximising the activation of entire layers, rather than of a specific filter (this gives a broader range of features to emulate)
  • The process begins with an existing image, rather than a blank, noisy input
  • The input image is processed at varying scales (called octaves), which improves the quality of the visualisations

Creating a DeepDream Image

In this blog, I will be using Keras and a pre-trained convnet called Inception V3 to produce some cool-looking visualisations. The code is largely taken from François Chollet's seminal book on deep learning, Deep Learning with Python.

  1. Importing the necessary libraries and instantiating the model
from keras.applications import inception_v3
from keras import backend as K

# this line ensures that we aren't training the model
K.set_learning_phase(0)

model = inception_v3.InceptionV3(weights='imagenet',
                                 include_top=False)

2. Creating a loss function

  • We maximise this quantity during the gradient-ascent process
  • Maximise the activation of all filters in a number of layers; more specifically, maximise a weighted sum of the L2 norms of the activations of a set of high-level layers
  • Create a dictionary that maps layer names to layer instances
  • Loss is defined by adding each layer's contribution to the scalar variable loss
  • Add the L2 norm of each layer's features to the loss
# starting with a somewhat arbitrary configuration involving four layers
layer_contributions = {
    'mixed2': 0.2,
    'mixed3': 3.,
    'mixed4': 2.,
    'mixed5': 1.5,
}

layer_dict = dict([(layer.name, layer) for layer in model.layers])

loss = K.variable(0.)
for layer_name in layer_contributions:
    coeff = layer_contributions[layer_name]
    activation = layer_dict[layer_name].output

    # avoid border artefacts by involving only non-border pixels in the loss
    scaling = K.prod(K.cast(K.shape(activation), 'float32'))
    loss = loss + coeff * K.sum(K.square(activation[:, 2:-2, 2:-2, :])) / scaling

3. Gradient-ascent Process

  • Create a tensor to hold the generated image (the dream), and compute the gradients of the loss with regard to the dream, normalising them
  • Set up a Keras function to retrieve the values of the loss and the gradients
  • With these values, apply gradient ascent
dream = model.input

# gradients of the loss with regard to the dream, normalised
grads = K.gradients(loss, dream)[0]
grads /= K.maximum(K.mean(K.abs(grads)), 1e-7)

outputs = [loss, grads]
fetch_loss_and_grads = K.function([dream], outputs)

def eval_loss_and_grads(x):
    outs = fetch_loss_and_grads([x])
    loss_value = outs[0]
    grad_values = outs[1]
    return loss_value, grad_values

def gradient_ascent(x, iterations, step, max_loss=None):
    for i in range(iterations):
        loss_value, grad_values = eval_loss_and_grads(x)
        if max_loss is not None and loss_value > max_loss:
            break
        x += step * grad_values
    return x

4. Running Gradient Ascent over Successive Scales

  • Choosing a step size, the number of scales (octaves) at which to run gradient ascent, the octave scale and the number of iterations per scale
  • Setting the maximum loss to 10, which will interrupt gradient ascent if reached
  • Using some auxiliary functions (preprocess_image, resize_img and save_img — sketched after the code below)
import numpy as np

step = 0.01          # gradient-ascent step size
num_octave = 3       # number of scales at which to run gradient ascent
octave_scale = 1.4   # size ratio between successive scales
iterations = 20      # number of ascent steps per scale
max_loss = 10.       # interrupt gradient ascent if the loss exceeds this

base_image_path = '/Users/danielbaumann/image.png'
img = preprocess_image(base_image_path)

original_shape = img.shape[1:3]
successive_shapes = [original_shape]
for i in range(1, num_octave):
    shape = tuple([int(dim / (octave_scale ** i)) for dim in original_shape])
    successive_shapes.append(shape)
successive_shapes = successive_shapes[::-1]

original_img = np.copy(img)
shrunk_original_img = resize_img(img, successive_shapes[0])

for shape in successive_shapes:
    img = resize_img(img, shape)
    img = gradient_ascent(img,
                          iterations=iterations,
                          step=step,
                          max_loss=max_loss)
    # reinject detail that was lost when the image was shrunk
    upscaled = resize_img(shrunk_original_img, shape)
    same_size_original = resize_img(original_img, shape)
    lost_detail = same_size_original - upscaled
    img += lost_detail
    shrunk_original_img = resize_img(original_img, shape)
    save_img(img, fname='dream_at_scale_' + str(shape) + '.png')

save_img(img, fname='final_dream.png')
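The auxiliary functions referenced above (preprocess_image, resize_img and save_img) are not defined in the snippets. Here is a minimal sketch of what they might look like, assuming SciPy and imageio are installed and following the same approach as Chollet's book:

import imageio
import numpy as np
import scipy.ndimage
from keras.applications import inception_v3
from keras.preprocessing import image

def preprocess_image(image_path):
    # load an image and turn it into a batch that Inception V3 can process
    img = image.load_img(image_path)
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    return inception_v3.preprocess_input(img)

def deprocess_image(x):
    # undo the Inception V3 preprocessing and return a displayable uint8 image
    x = x.reshape((x.shape[1], x.shape[2], 3))
    x /= 2.
    x += 0.5
    x *= 255.
    return np.clip(x, 0, 255).astype('uint8')

def resize_img(img, size):
    # rescale the image batch to the given (height, width)
    factors = (1,
               float(size[0]) / img.shape[1],
               float(size[1]) / img.shape[2],
               1)
    return scipy.ndimage.zoom(np.copy(img), factors, order=1)

def save_img(img, fname):
    imageio.imwrite(fname, deprocess_image(np.copy(img)))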

And the result:

DeepDream Elephant

Neural Style Transfer

In addition to DeepDream, neural style transfer (introduced by Leon Gatys et al.) allows us to generate images using the distinction between style and content.

We typically think of STYLE as textures, colours and visual patterns.

CONTENT, on the other hand, means the higher-level overall structure of the image.

A landscape of London (left) and Edvard Munch’s Scream (right)

The famous painting on the right can be broken down into these two main components:

  • The brushstrokes and colour represent the style
  • The setting of the man on a bridge with a darkened sunset is the content

The landscape of London can also be broken down into style and content with the general setting describing the content.

The aim of neural style transfer is the following:

Preserve original content, while adopting the style of a reference image

We could define the loss as the following:

loss = distance(style(reference_image) - style(generated_image)) +
       distance(content(original_image) - content(generated_image))

Content Loss and Style Loss

Higher layers of a convnet contain increasingly global and abstract information, so we expect the content of an image to be captured by the representations in these upper layers.

Style loss uses multiple layers of a convnet, so that we capture the appearance of the style-reference image at all the spatial scales the convnet extracts. A sketch of both losses follows.
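As a concrete illustration (my own sketch, following the standard formulation from Gatys et al. as presented in Chollet's book, not code from this post): content loss is the L2 distance between layer activations, while style loss compares Gram matrices of the activations. Here img_height and img_width are assumed to be the dimensions of the generated image.

from keras import backend as K

def content_loss(base, combination):
    # L2 distance between the content image's and the generated image's activations
    return K.sum(K.square(combination - base))

def gram_matrix(x):
    # correlations between a layer's feature channels, which encode "style"
    features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
    return K.dot(features, K.transpose(features))

def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = img_height * img_width   # assumed dimensions of the generated image
    return K.sum(K.square(S - C)) / (4. * (channels ** 2) * (size ** 2))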

How Neural Style Transfer Works

Gatys et al. use a pretrained convnet called VGG19, which is available in the Keras library. Once this is imported, the process is fairly simple.

  1. Set up a network that computes the VGG19 layer activations for the style-reference image, the original (content) image and the generated image at the same time (a sketch of this step follows)
  2. Use the layer activations computed over these three images to define the loss function, which we want to minimise
  3. Run a gradient-descent process to minimise this loss function
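Here is a minimal sketch of step 1, mirroring Chollet's implementation rather than code from this post. It assumes a preprocess_image helper that resizes each image to (img_height, img_width) and applies vgg19.preprocess_input; the image-path variables are placeholders.

from keras.applications import vgg19
from keras import backend as K

# stack the content (target) image, the style-reference image and the
# generated (combination) image into one batch so that VGG19 computes
# all of their layer activations in a single forward pass
target_image = K.constant(preprocess_image(target_image_path))
style_reference_image = K.constant(preprocess_image(style_reference_image_path))
combination_image = K.placeholder((1, img_height, img_width, 3))

input_tensor = K.concatenate([target_image,
                              style_reference_image,
                              combination_image], axis=0)

model = vgg19.VGG19(input_tensor=input_tensor,
                    weights='imagenet',
                    include_top=False)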
London Screams :D

Summing Up

Deep learning and its methodology are not limited to what many would perceive to be arduous tasks; they also have vast capabilities that extend into the creative domains.

However, the process remains largely the same: a well-defined loss function, optimised by gradient ascent or descent, will allow you to produce images such as these with minimal computing power.

References

Google DeepDream, 2015 (https://en.wikipedia.org/wiki/DeepDream)

François Chollet, Deep Learning with Python, 2017 (https://www.manning.com/books/deep-learning-with-python)
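
Leon A. Gatys, Alexander S. Ecker and Matthias Bethge, A Neural Algorithm of Artistic Style, 2015 (https://arxiv.org/abs/1508.06576)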
