Deep Learning as an Artistic Tool
By Daniel Baumann
AI as a Creator
Artificial intelligence has capabilities beyond the typical tasks we associate with it, such as image classification. More recently, it has moved into creative activities. In 2014, François Chollet claimed that most of the cultural content we consume in the future will be created with substantial help from AIs.
As with any industry-defying claim, Chollet’s was met with skepticism and branded as near ridiculous. Less than a year later, in 2015, the artistic capabilities of AI began to be realised when Google’s DeepDream algorithm turned images into psychedelic pictures resembling the visual artefacts induced in humans by disruption of the visual cortex via psychedelics. Interestingly, this process can be extended to speech as well as music, and music generated by neural networks may also play a large role in what we consume in the future.
Following on from DeepDream, neural style transfer was developed, which allows for cool image alterations based on the underlying style and content of images.
DeepDream
DeepDream uses convolutional neural networks (CNNs, or convnets) to modify images. The process is very similar to the convnet filter-visualisation technique, which involves running a convnet in reverse: gradient ascent is applied to the input of the convnet in order to maximise the activation of a specific filter in an upper layer.
However, DeepDream differs from this technique in a few ways:
- Maximising activation of entire layers, rather than a specific filter (with this, we have a broader range of features which are emulated)
- The process begins with a more complete image, rather than blank noisy input
- The input image is processed at different scales (called octaves), which improves the quality of the visualisations
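To make the first two points concrete, here is a minimal NumPy sketch rather than anything taken from DeepDream itself. It assumes a toy "layer" made of four random linear filters (the matrix `W` and both objective functions are hypothetical), and runs gradient ascent on an existing input to raise the activation of the whole layer rather than of a single filter.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))   # toy "layer": 4 linear filters over 8 inputs (hypothetical)

def filter_objective(x, k):
    """Filter visualisation: squared activation of the single filter k."""
    return float((x @ W[:, k]) ** 2)

def layer_objective(x):
    """DeepDream-style objective: sum of squared activations over the whole layer."""
    return float(np.sum((x @ W) ** 2))

def layer_gradient(x):
    """Analytic gradient of layer_objective with respect to the input x."""
    return 2.0 * W @ (x @ W)

x = rng.normal(size=8)        # start from an existing input, not from a blank one
start = layer_objective(x)
for _ in range(20):
    x = x + 0.01 * layer_gradient(x)   # gradient ascent: step along the gradient
end = layer_objective(x)
```

Because the update moves the input along the gradient, `end` is larger than `start`. In a real convnet the gradient comes from backpropagation rather than a closed-form expression.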
Creating a DeepDream Image
In this post, I will be using Keras and a pre-trained convnet called Inception V3 to produce some cool-looking visualisations. The code is largely taken from François Chollet’s seminal book on deep learning, Deep Learning with Python.
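1. Importing the necessary libraries and instantiating the model
from keras.applications import inception_v3
from keras import backend as K
# Note: K.set_learning_phase, K.gradients and K.function belong to the older Keras 2 backend API used in the book
# We are not training the model, so disable all training-specific operations
K.set_learning_phase(0)
# Build Inception V3 without its classifier head, loaded with pretrained ImageNet weights
model = inception_v3.InceptionV3(weights='imagenet', include_top=False)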
2. Creating a loss function
- We maximise this quantity during the gradient-ascent process
- Maximise the activation of all filters in a number of layers, more specifically a weighted sum of the L2 norm of the activations of a set of high-level layers
- Create a dictionary that maps layer names to layer instances
- Loss is defined by adding layer contributions to the scalar variable loss
- Add the L2 norm of each layer's features to the loss
# A somewhat arbitrary starting configuration involving four layers
layer_contributions = {'mixed2': 0.2, 'mixed3': 3., 'mixed4': 2., 'mixed5': 1.5}
# Map layer names to layer instances
layer_dict = dict([(layer.name, layer) for layer in model.layers])
# The loss is built up by adding each layer's contribution to this scalar
loss = K.variable(0.)
for layer_name in layer_contributions:
    coeff = layer_contributions[layer_name]
    activation = layer_dict[layer_name].output
    scaling = K.prod(K.cast(K.shape(activation), 'float32'))
    # Only non-border pixels contribute, which avoids border artefacts
    loss += coeff * K.sum(K.square(activation[:, 2: -2, 2: -2, :])) / scaling
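The per-layer term can be checked by hand on a small array. The following NumPy sketch is an illustration only: `layer_loss` is a hypothetical helper and the all-ones activations are made up. It mirrors the loss described above, with a 2-pixel border left out of the sum and the result divided by the full tensor size.

```python
import numpy as np

def layer_loss(activation, coeff):
    """Weighted, size-normalised sum of squares of one layer's activations.

    `activation` has shape (batch, height, width, channels). A 2-pixel border
    is left out of the sum, while the scaling uses the full tensor size.
    """
    scaling = float(activation.size)
    cropped = activation[:, 2:-2, 2:-2, :]
    return float(coeff * np.sum(np.square(cropped)) / scaling)

# Hypothetical activations: all ones, so the result is easy to verify by hand.
fake = np.ones((1, 8, 8, 3), dtype="float32")
value = layer_loss(fake, coeff=2.0)
# Interior: 4 x 4 x 3 = 48 elements. Full tensor: 8 x 8 x 3 = 192 elements.
# So value is 2.0 * 48 / 192 = 0.5
```

The total DeepDream loss is the sum of this quantity over the chosen layers, each with its own coefficient.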
3. Gradient-ascent Process
- Create a tensor to hold the generated image (the dream), compute the gradients of the loss with regard to it, and normalise them
- Set up a Keras function to retrieve the values of the loss and the gradients for a given input image
- With these values, apply gradient ascent
# This tensor holds the generated image: the dream
dream = model.input
# Gradients of the loss with regard to the dream
grads = K.gradients(loss, dream)[0]
# Normalise the gradients (the small constant avoids division by zero)
grads /= K.maximum(K.mean(K.abs(grads)), 1e-7)
outputs = [loss, grads]
fetch_loss_and_grads = K.function([dream], outputs)
def eval_loss_and_grads(x):
    outs = fetch_loss_and_grads([x])
    loss_value = outs[0]
    grad_values = outs[1]
    return loss_value, grad_values
def gradient_ascent(x, iterations, step, max_loss=None):
    for i in range(iterations):
        loss_value, grad_values = eval_loss_and_grads(x)
        # Stop early once the loss exceeds the allowed maximum
        if max_loss is not None and loss_value > max_loss:
            break
        x += step * grad_values
    return x
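The effect of max_loss is easiest to see on a toy problem. The sketch below is not the Keras version: it swaps in a hypothetical loss (the mean of the squared pixel values) with a hand-written gradient, and returns the number of steps taken purely for illustration. The control flow of the loop is otherwise the same.

```python
import numpy as np

def eval_loss_and_grads(x):
    """Toy stand-in for the Keras function: loss = mean(x^2), gradient normalised."""
    loss_value = float(np.mean(x ** 2))
    grads = 2.0 * x / x.size
    grads = grads / max(float(np.mean(np.abs(grads))), 1e-7)  # normalise magnitude
    return loss_value, grads

def gradient_ascent(x, iterations, step, max_loss=None):
    """Same loop as in the post, additionally reporting how many steps were taken."""
    steps_taken = 0
    for _ in range(iterations):
        loss_value, grad_values = eval_loss_and_grads(x)
        if max_loss is not None and loss_value > max_loss:
            break                      # interrupt once the loss exceeds max_loss
        x = x + step * grad_values
        steps_taken += 1
    return x, steps_taken

x0 = np.full((4, 4), 1.0)
x_free, n_free = gradient_ascent(x0, iterations=20, step=0.5)
x_capped, n_capped = gradient_ascent(x0, iterations=20, step=0.5, max_loss=10.0)
```

Here every pixel grows by 0.5 per step, so the loss passes 10 once the pixels reach 3.5: the capped run stops after 5 steps while the uncapped run uses all 20. In DeepDream the same cap keeps the image from degenerating into pure artefacts.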
4. Running Gradient Ascent over Successive Scales
- Set the step size, the number of scales (octaves) at which to run gradient ascent, the size ratio between octaves and the number of iterations per octave
- Set the maximum loss to 10; gradient ascent is interrupted if the loss exceeds it
- Use the auxiliary functions preprocess_image, resize_img and save_img from Chollet's book (not shown here)
import numpy as np
step = 0.01          # gradient-ascent step size
num_octave = 3       # number of scales at which to run gradient ascent
octave_scale = 1.4   # size ratio between successive scales
iterations = 20      # number of ascent steps per scale
max_loss = 10.       # interrupt gradient ascent if the loss grows beyond this
base_image_path = '/Users/danielbaumann/image.png'
img = preprocess_image(base_image_path)
original_shape = img.shape[1:3]
# Build the list of shapes, then reverse it so the smallest scale comes first
successive_shapes = [original_shape]
for i in range(1, num_octave):
    shape = tuple([int(dim / (octave_scale ** i)) for dim in original_shape])
    successive_shapes.append(shape)
successive_shapes = successive_shapes[::-1]
original_img = np.copy(img)
shrunk_original_img = resize_img(img, successive_shapes[0])
for shape in successive_shapes:
    img = resize_img(img, shape)
    img = gradient_ascent(img, iterations=iterations, step=step, max_loss=max_loss)
    # Reinject the detail that was lost when the original was shrunk
    upscaled = resize_img(shrunk_original_img, shape)
    same_size_original = resize_img(original_img, shape)
    lost_detail = same_size_original - upscaled
    img += lost_detail
    shrunk_original_img = resize_img(original_img, shape)
    save_img(img, fname='dream_at_scale_' + str(shape) + '.png')
save_img(img, fname='final_dream.png')
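Two parts of this loop are easy to try in isolation: how the octave shapes are derived, and why the lost detail is added back. The sketch below assumes a 400 x 600 image and uses `resize_nearest`, a hypothetical nearest-neighbour resize that stands in for resize_img; the book's helper is not reproduced here.

```python
import numpy as np

def octave_shapes(original_shape, num_octave=3, octave_scale=1.4):
    """Return the (height, width) of each octave, smallest first."""
    shapes = [tuple(original_shape)]
    for i in range(1, num_octave):
        shapes.append(tuple(int(dim / (octave_scale ** i)) for dim in original_shape))
    return shapes[::-1]

def resize_nearest(img, shape):
    """Hypothetical nearest-neighbour resize of a 2-D array (stand-in for resize_img)."""
    rows = (np.arange(shape[0]) * img.shape[0] / shape[0]).astype(int)
    cols = (np.arange(shape[1]) * img.shape[1] / shape[1]).astype(int)
    return img[np.ix_(rows, cols)]

shapes = octave_shapes((400, 600))
# shapes == [(204, 306), (285, 428), (400, 600)]

original = np.random.default_rng(0).random((400, 600))
small = resize_nearest(original, shapes[0])
upscaled = resize_nearest(small, shapes[-1])   # coarse: detail was lost when shrinking
lost_detail = original - upscaled              # exactly what the shrinking threw away
restored = upscaled + lost_detail              # adding it back recovers the original
```

In the DeepDream loop the same difference is added to the dreamed image at each scale, so the upscaled result keeps the fine detail of the original instead of becoming progressively blurrier.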
And the result:
Neural Style Transfer
In addition to DeepDream, neural style transfer (introduced by Leon Gatys et al.) lets us generate images by separating style from content.
We typically think of STYLE as textures, colours and visual patterns,
whereas CONTENT means the higher-level overall structure of the image.
The famous painting on the right can be broken down into these two main principles:
- The brushstrokes and colour represent the style
- The setting of the man on a bridge with a darkened sunset is the content
The landscape of London can also be broken down into style and content with the general setting describing the content.
The aim of neural style transfer is the following:
Preserve original content, while adopting the style of a reference image
We could define the loss as the following:
loss = distance(style(reference_image) - style(generated_image)) +
       distance(content(original_image) - content(generated_image))
Content Loss and Style Loss
Upper layers of a convnet contain increasingly global and abstract information, so we expect the content of an image to be captured by their representations. The content loss therefore compares upper-layer activations of the original image and the generated image.
The style loss uses multiple layers of a convnet, so that it captures the appearance of the style-reference image at the multiple spatial scales extracted by the convnet.
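Both losses can be sketched in NumPy on a made-up feature map. In Gatys et al. the style of a layer is summarised by the Gram matrix of its activations, that is, the correlations between channels. The function names and the random features below are assumptions for illustration, not code from the paper or from Keras.

```python
import numpy as np

def content_loss(base_features, generated_features):
    """Squared L2 distance between two feature maps from an upper layer."""
    return float(np.sum((generated_features - base_features) ** 2))

def gram_matrix(features):
    """Channel-by-channel correlations of a (height, width, channels) feature map."""
    flat = features.reshape(-1, features.shape[-1])   # (height * width, channels)
    return flat.T @ flat

def style_loss(style_features, generated_features):
    """Distance between Gram matrices, scaled as in Gatys et al."""
    height, width, channels = style_features.shape
    diff = gram_matrix(style_features) - gram_matrix(generated_features)
    size = height * width
    return float(np.sum(diff ** 2) / (4.0 * (channels ** 2) * (size ** 2)))

rng = np.random.default_rng(0)
features = rng.random((6, 6, 3))                      # hypothetical layer activations
# Same activations, but moved to different spatial positions
shuffled = rng.permutation(features.reshape(-1, 3)).reshape(6, 6, 3)

content_difference = content_loss(features, shuffled)
style_difference = style_loss(features, shuffled)
```

Shuffling the spatial positions changes the content loss but leaves the style loss at essentially zero, because the Gram matrix discards where each feature occurs. That is the sense in which style is texture and content is structure.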
How Neural Style Transfer Works
Gatys et al. adopt a pretrained convnet called VGG19, which ships with Keras. Once it is imported, the process is fairly simple:
- Set up a network that computes VGG19 layer activations for the style-reference image, the original image and the generated image at the same time
- Use the layer activations computed over these three images to define the loss function, which we want to minimise
- Run a gradient-descent process to minimise the loss function
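The last step can be illustrated with a deliberately simplified sketch. It assumes the raw pixel values play the role of the VGG19 activations, uses hand-derived gradients instead of backpropagation, and picks the weights 0.025 and 1.0 and the learning rate 1e-3 only as illustrative values. Plain gradient descent is used here for clarity; real implementations often rely on a more elaborate optimiser.

```python
import numpy as np

def gram(x):
    """Gram matrix of an (n_positions, n_channels) array."""
    return x.T @ x

def total_loss(x, content, style, content_weight, style_weight):
    """Weighted sum of a content term and a Gram-matrix style term."""
    content_term = np.sum((x - content) ** 2)
    style_term = np.sum((gram(x) - gram(style)) ** 2)
    return float(content_weight * content_term + style_weight * style_term)

def total_grad(x, content, style, content_weight, style_weight):
    """Analytic gradient of total_loss with respect to x."""
    grad_content = 2.0 * (x - content)
    grad_style = 4.0 * x @ (gram(x) - gram(style))
    return content_weight * grad_content + style_weight * grad_style

rng = np.random.default_rng(0)
content = rng.random((16, 3))   # toy "original image": 16 positions, 3 channels
style = rng.random((16, 3))     # toy "style-reference image"
weights = dict(content_weight=0.025, style_weight=1.0)  # assumed values

x = content.copy()              # start the generated image from the original
initial = total_loss(x, content, style, **weights)
for _ in range(200):
    x = x - 1e-3 * total_grad(x, content, style, **weights)  # descent: step against the gradient
final = total_loss(x, content, style, **weights)
```

The generated array drifts away from the original just enough to bring its Gram matrix closer to that of the style reference, so `final` ends up below `initial`. The two weights control that trade-off.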
Summing Up
Deep learning and its methods are not limited to what many would perceive to be arduous tasks. They also have vast capabilities that extend into the creative domains.
However, the process remains largely the same: define a loss function and optimise it with gradient ascent or descent. A well-defined loss function will allow you to produce images such as these with relatively modest computing power.
References
Google DeepDream, 2015 (https://en.wikipedia.org/wiki/DeepDream)
François Chollet, Deep Learning with Python, 2017 (https://www.manning.com/books/deep-learning-with-python)