Machine learning and creativity

Using Machine Learning to Augment Artistic Creativity

Guy Lukes
Oct 2 · 16 min read

Creating a space of artistic possibilities that support the creative process.

There has been a long running debate about whether machines can be truly creative. While this debate probably doesn’t have an end, there is no doubt that artists make use of technology, whether through better brushes for painting, or 3D animation software. Recently there has been growing interest in how machine learning in general, and neural networks in particular could be used to generate art.

In this article we will explore the role of the artist in machine learning approaches and present a new “Generative Relational Context” (GRC) neural network model. The purpose of this model is to augment artistic expression and more directly support the role of the artist in the creative process. Instead of relying on a black box model and vast quantities of training data, the GRC starts with a set of artist-created or artist-identified samples that provide the starting point for training a generative model, creating a space of artistic possibilities that can be explored under artistic control. This approach can be thought of as a kind of gamification of art: combining machine learning with artistic sampling and exploration redefines the role of the artist in machine learning approaches to art. A simple “low poly” 3D model of an opening flower petal will be used as an example.

How machine learning has been used to generate art

Initial attempts at generating art with machine learning have taken several different forms. One obvious approach is to automate the generation of art by training a network on existing works of art. The neural network can then be used to generate new outputs that differ from the training data. By far the most popular techniques using this approach are variations on generative adversarial networks (GAN, CAN, DCGAN, etc.). This can be done for literary texts, musical sounds, visual images, and so on; inputs for culinary recipes, perfumes, or dance can also be imagined. The immediate challenges then become how to represent the inputs as numbers, how to structure the algorithm, and how to define an error signal and loss function that steers the training toward what would be recognized as legitimate works of art. As you can imagine, there have been many awkward as well as “interesting” initial attempts.

Some of the most publicized attempts have included:

  • Generating text from existing works of prose
  • Using a large collection of portraits to generate images of faces for people that do not exist
  • Generating music that mimics a composer’s style
  • Taking an existing painting or photograph and transforming it into a different visual style

There have also been several approaches that try to move beyond replicating existing artistic styles.

While some have focused their algorithms on replicating musical styles, others have been using the advantages of machine learning as part of their composition process. A notable example is Google’s Project Magenta, launched in 2016 with the aim of pushing the boundaries of “machine learning as a tool in the creative process”.

Andrew Gordon Creative Automation is the New Medium

Instead of starting with the technical issues of these approaches, I would first like to address the role of the artist and where creativity is recognized and assigned.

The role of the artist in machine learning

For an individual painter who has mastered the use of paint and brushes, the source of creativity in artistic expression is unmistakable. However, with many artists increasingly using a team of assistants or fabricators, the source of creativity can become more ambiguous. For 3D animation, a large team of many different specialists may be employed in a production; usually the directing artist is given the creative credit, even if they were only tangentially involved in producing the work. In the extreme case of “found object” art, the creativity lies simply in recognizing or choosing an object as art, without any actual production of the work by the artist.

Looking at machine learning through the lens of automating the role of the art assistant or fabricator, creativity would seem to stay with the artist. However, many current approaches to art in machine learning tend to diminish the role of the artist. The allure of creating a machine that has true creativity is often considered the holy grail of AI in art. In a similar way to what Jaron Lanier described for machine learning approaches to language translation, machine learning in art usually uses “big data” to process thousands, millions or billions of input samples into an algorithm that uses some emergent process of interpolation and extrapolation to generate a result. This creates a kind of Star Trek Borg Cube that assimilates vast quantities of material and erases their source identity, transferring the assignment of creativity to an emergent black box. This can be seen as promoting a mechanical assistant until the “student becomes the master” and the artist is deprecated and replaced with an algorithm.

Supporting creativity

The first problem with this vision is that it still depends on real artists for its input, even if their identity and importance are obscured. The second problem is that there is no one to drive the process of creating value, which is always holistic and subjective in nature.

The distinction between first-order expression and derivative expression is lost on true believers in the hive. First-order expression is when someone presents a whole, a work that integrates its own worldview and aesthetic. It is something genuinely new in the world. Second-order expression is made of fragmentary reactions to first-order expression.

Ric Amurrio quoting Jaron Lanier in The Future of Content

There is also another, more technical problem with these kinds of algorithms, usually identified as the “curse of dimensionality”. Organic form, and much of artistic expression, is characterized by a large number of features that are coordinated as a whole and depend on each other in a complex web of many-to-many relationships. Changing one point requires adjusting a large part of the rest of the form in order to maintain functional or artistic coherence. As a flower petal opens and expands, every point on its surface must track with every other point in its shape to maintain a holistic pattern. To use the example of 3D CGI modeling, as the complexity of an object’s shape increases (requiring more coordinates), the number of possible 3D coordinate combinations increases exponentially, and the percentage of those combinations that represent coherent shapes diminishes exponentially.

Even more importantly, trying to extrapolate or interpolate between these values will become exponentially useless.

In high dimensions, almost all of the data is in the outer shell. This means, among other things, that ‘neighborhoods’ typically stretch to the outer edges of multiple dimensions, that machine learning models almost always need to extrapolate, and that few other data points will be “similar” to any given data point.

Aaron Lipeles The Curse of Dimensionality
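This shell effect is easy to verify numerically. The following standalone sketch (an illustration added here, not from the original project code) samples uniform points in a unit hypercube and measures how few remain in the inner “core” as dimensionality grows:

```python
import random

random.seed(0)

def core_fraction(dim, n=20_000):
    """Fraction of uniform points in [0, 1]^dim with *every* coordinate
    inside the inner band (0.05, 0.95), i.e. away from the outer shell."""
    hits = sum(
        all(0.05 < random.random() < 0.95 for _ in range(dim))
        for _ in range(n)
    )
    return hits / n

for dim in (2, 10, 50):
    # Analytically the core fraction is 0.9 ** dim, so it decays exponentially
    print(f"{dim:2d} dims: {core_fraction(dim):.3f} (exact: {0.9 ** dim:.3f})")
```

At 2 dimensions most points are in the core; by 50 dimensions essentially all of them sit in the shell, which is exactly why naive interpolation between samples breaks down.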

The mathematical details are outside the scope of this article, but the difficulty of scaling machine learning to an increasing number of features is often overcome by starting with problems where an unlimited amount of data can be simulated (for example, in games), or by using relatively simple examples with ambiguous “artistic” results. The approach in this article will try to show how to overcome these problems by reinstating an actual artist as the coherent source of input, and then having an algorithm support the development of a space of new creative possibilities that is dense enough to be navigated using the artist’s subjective judgments, without getting lost in a vast, nearly empty space of possibilities.

Machine learning as artistic augmentation

To address these problems, the Relational Context Model shown in Figure 1 is presented. While most current generative approaches use neural networks as a black box to distill vast quantities of sample data without the guidance of an artist, GRC instead tries to directly support and augment the creative expression of an active artist. There are three steps to this process. First, reference content is created or identified by an artist and captured in neural networks referred to as Template Models. Then two or more Template Models are used to train a Relational/Generative Model. This generative workflow has two parts: an encoding “Relational Model” and a decoding “Generative Model”.

Artist generated templates

To illustrate the process, 3D coordinates for a computer-generated imagery (CGI) mesh will be used, similar in shape to an opening flower petal. Each petal represents a point in time and consists of a 7-by-7 mesh of 3D (x, y, z) coordinates, with a separate mesh/petal for each of 7 indexed points in time. These time indexes are like movie frames that progress from a common closed bloom, which the templates differentiate into two different fully opened shapes. Shape coordinates are formatted for the neural network as a rank 3 tensor, represented as a 3-dimensional array with a shape of (7, 7, 21). The first dimension represents the 7 time indexes (each a petal at a moment in time), each made up of 7 morphological steps, where each step represents one row of coordinates in the grid of a mesh. The 21 coordinates of each morphological step are arranged as 7 x-coordinates followed by 7 y-coordinates followed by 7 z-coordinates.
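As a concrete sketch of this layout (variable names are illustrative, not from the actual project code), the snippet below packs a set of 7×7 meshes of (x, y, z) points into the (7, 7, 21) tensor just described:

```python
import torch

# Hypothetical sketch: pack a 7x7 mesh of (x, y, z) points per time index
# into the (7, 7, 21) layout — for each row of the mesh, the 7
# x-coordinates, then the 7 y-coordinates, then the 7 z-coordinates.
n_time, n_rows, n_cols = 7, 7, 7

# meshes: (time, row, col, xyz) — placeholder coordinates for illustration
meshes = torch.zeros(n_time, n_rows, n_cols, 3)

# Move the xyz axis before the column axis, then flatten: each row becomes
# [x0..x6, y0..y6, z0..z6], giving 21 values per morphological step.
shapes = meshes.permute(0, 1, 3, 2).reshape(n_time, n_rows, 3 * n_cols)
print(shapes.shape)  # torch.Size([7, 7, 21])
```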

Figure 2 shows a 2D top view of the 3D coordinates for one of the petal objects (at one time index), with the actual data listed after the figure. Figure 3 shows the rendered shapes for 5 timesteps of each template, omitting the first and last offset timesteps. The offsets are used to model the curves beyond the modeled range in order to remove edge effects in learning.

tensor( [
[-0. , 0. , -0. , 0. , -0. , -0. , -0. , -0.03 , -0.017, -0.008, 0. , 0.009, 0.018, 0.03 , -0. , 0. , -0. , 0. , 0. , 0. , 0. ],
[ 0.179, 0.171, 0.163, 0.161, 0.163, 0.171, 0.178, -0.072, -0.043, -0.019, 0.001, 0.021, 0.044, 0.071, 0.011, -0. ,-0.012, -0.018, -0.012, -0.002, 0.012],
[ 0.36 , 0.353, 0.342, 0.34 , 0.343, 0.353, 0.361, -0.126, -0.081, -0.03 , -0. , 0.032, 0.081, 0.112, 0.029, 0.008, -0.019, -0.028, -0.02 , 0.009, 0.03 ],
[ 0.554, 0.552, 0.546, 0.54 , 0.545, 0.551, 0.553, -0.148, -0.101, -0.044, -0. , 0.04 , 0.095, 0.139, 0.04 , -0.002, -0.03 , -0.042, -0.028, 0.003, 0.034],
[ 0.716, 0.736, 0.754, 0.759, 0.754, 0.737, 0.716, -0.152, -0.101, -0.049, 0. , 0.048, 0.101, 0.154, 0.028, -0.008, -0.038, -0.052, -0.038, -0.007, 0.028],
[ 0.889, 0.916, 0.934, 0.941, 0.935, 0.917, 0.888, -0.133, -0.098, -0.05 , 0. , 0.048, 0.097, 0.135, 0.009, -0.022, -0.041, -0.056, -0.042, -0.022, 0.01 ],
[ 1.019, 1.047, 1.069, 1.078, 1.069, 1.046, 1.018, -0.109, -0.085, -0.042, -0. , 0.038, 0.085, 0.104, -0.018, -0.028,-0.036, -0.043, -0.036, -0.028, -0.022]])
Figure 3. The rendered 3D mesh for two template shapes with SubSurf smoothing.

Building the Template Models

Once the output shapes have been identified for the Template Models, the shapes are paired with elements of an input Relational Map. This Relational Map is designed to index a set of properties that unify the changes across the objects in some ordered way. In the current example, the input to the Template Model is a 2D relational map consisting of two index values ranging from 0 to 1 (with an offset to 1.1). These values create a rank 2 tensor of shape (7, 2), represented as a matrix (2-dimensional array), with the first index representing a time step and the second index representing a morphology step. For the shape in Figure 2, the relational map has a time index of 0.8 and a range of morphological steps as shown below:

tensor([[0.8000, 0.0000],
        [0.8000, 0.1000],
        [0.8000, 0.3000],
        [0.8000, 0.5500],
        [0.8000, 0.8000],
        [0.8000, 1.0000],
        [0.8000, 1.1000]])
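A minimal sketch of how such a map could be assembled (the helper name is illustrative, not from the project code); the morphology steps are taken from the listing above, with the 0.0 and 1.1 entries being the offsets that extend past the modeled range:

```python
import torch

# Morphology steps from the listing above; 0.0 and 1.1 are the offsets.
morph_steps = [0.0, 0.1, 0.3, 0.55, 0.8, 1.0, 1.1]

def relational_map(time_index):
    """Pair a fixed time index with each morphology step,
    producing one (7, 2) input block for a Template Model."""
    return torch.tensor([[time_index, m] for m in morph_steps])

print(relational_map(0.8))  # rows of [time, morphology]
```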

Each Template Model has 4 fully connected layers with shape [2, 10, 15, 21] that connect the 2 values of a row in the input Relational Map to the 21 values of a row of the output shape. This provides a mapping between the temporal/relational structure of an object, which is independent of its shape, and the actual coordinates of the object in physical space. This separation, between the relational constraints that give organic form its holistic character and the rendering of those relationships into a specific form, will be critical to the generative process. Once these Template Models have been trained using standard back-propagation techniques, they can be used to generate an unlimited amount of continuous-valued training data for our Relational and Generative Models.
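Training a Template Model is then standard supervised regression. The sketch below, with random placeholder data standing in for the artist-supplied relational map and mesh rows, shows the shape of that loop:

```python
import torch
import torch.nn as nn

# A [2, 10, 15, 21] fully connected network maps each [time, morphology]
# row of the relational map to a 21-value row of mesh coordinates.
model = nn.Sequential(
    nn.Linear(2, 10), nn.ELU(),
    nn.Linear(10, 15), nn.ELU(),
    nn.Linear(15, 21),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss(reduction="sum")

rel_map = torch.rand(7, 2)        # placeholder relational-map rows
target_shape = torch.rand(7, 21)  # placeholder coordinate rows

for epoch in range(200):          # standard back-propagation loop
    optimizer.zero_grad()
    loss = criterion(model(rel_map), target_shape)
    loss.backward()
    optimizer.step()
```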

Building the Relational and Generative Models

The second step of the process trains the Relational Model, using training data from multiple Template Models, to create an inverse transform of that data. This inverse transform maps 3D coordinate data from the different Template Models to their shared Relational Map. These relational map coordinates are then transformed back into 3D phenomenal-space coordinates by a Generative Model.

The problem with this kind of approach is that it can lose information when patterns in the templates overlap. All the templates use the same Relational Map, making its inverse into a single Relational Model ambiguous. This problem is usually solved by “tagging” the data with a one-hot encoded index that disambiguates the source; otherwise the Generative Model will average together the different output values for the same relational indexes.
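A toy illustration of that usual tagging fix (purely illustrative values): two templates produce different outputs for the same relational row, so appending a one-hot template index makes the otherwise identical inputs distinguishable:

```python
import torch

# The same [time, morphology] row is shared by both templates, so on its
# own it is ambiguous about which template's shape to reproduce.
rel_row = torch.tensor([0.8, 0.3])

# Appending a one-hot template tag disambiguates the two training examples.
tagged_0 = torch.cat([rel_row, torch.tensor([1.0, 0.0])])  # template 0
tagged_1 = torch.cat([rel_row, torch.tensor([0.0, 1.0])])  # template 1
print(tagged_0, tagged_1)  # same relational values, distinct inputs
```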

However, instead of explicitly setting a template index as a target for the Relational Model output, the Relational Model has an additional context output whose value emerges from the training process. The Relational Map and Emergent Context value output by the Relational Model can then feed forward the information the Generative Model needs to recover the 3D shapes of the Template Models. During training, the backpropagation of error moves back through the weights of the Generative Model and then back through the weights of the Relational Model, as shown in Figure 4. There is also a second flow of error from the Relational Map targets back through the Relational Model. This allows the Relational Model to differentiate the Emergent Context values necessary to make the relational-to-phenomenal transform reversible (information preserving) back to the original input shape.

If templates share the same 3D shape for given time and morphological indexes, then there is nothing to distinguish, and the context values will be the same, producing the same shape from the same relational constraints. As the 3D shapes diverge, the templates’ “traces” in context space will also diverge along different manifolds. Context points between the manifolds will interpolate between the different template shapes for a given relational map. If you project from the context average outward, beyond the template context trace boundaries, you would expect to see the emergence of new shapes, diverging from the relational constraints of the templates.

Using context values to explore the generation of new shapes

To see how the context values change across the relational map, we can input the reference 3D shape values used to train the templates back into the Relational Model and observe the resulting context outputs.

Figure 5 Context changes over morphology steps for different template and time indexes

Figure 5 shows how the context values change across morphological steps, for lines that represent different time indexes in the Template Models. For example, at time index 0.1, both templates model a closed petal with the same 3D mesh. Since the mesh curves are the same, the context values are the same. As the petals expand over time, the meshes diverge and the context values separate. The space between these boundary curves, generated from the Template Models, provides a compact space of coherent configurations that can be explored along the dimensions of the Relational Model. This yields a model for shape generation that can interpolate and extrapolate along the time and morphology dimensions of a “design space” and then project the result into a generated object in 3D output space. This compact, template-defined region can then be incrementally explored through affine transformations in design space that interpolate between templates or move outside template boundaries to generate new possibilities.

Instead of exploring in the space of phenomenal shapes, which quickly becomes computationally intractable, the relational morphology/time map created with the templates provides a completely compact set of indices: every point in the space represents a valid morphological stage at some point in time.
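As a sketch of how such an exploration might be driven (the function and its arguments are illustrative, not from the project code), a context value can be swept between the two templates’ traces and each blend projected through the Generative Model:

```python
import torch

def interpolate_shapes(gen_model, rel_map, ctx0, ctx1, steps=5):
    """Sweep a context value from ctx0 to ctx1 and project each blend
    through the Generative Model. `gen_model` is assumed to take rows of
    [time, morphology, context] and return rows of 21 coordinates;
    `rel_map` is a (rows, 2) relational map, ctx0/ctx1 are the context
    values recovered from the two templates."""
    shapes = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        ctx = (1 - alpha) * ctx0 + alpha * ctx1          # blend contexts
        ctx_col = torch.full((rel_map.shape[0], 1), float(ctx))
        shapes.append(gen_model(torch.cat([rel_map, ctx_col], dim=1)))
    return shapes  # alpha < 0 or > 1 would extrapolate past the templates
```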

Figure 6. Context interpolation/extrapolation of morphology for time index .7

Figure 6 shows the context curves for the reference template shapes at time index 0.7 as dotted lines. Context values between the two reference curves (dotted lines) will interpolate between those two reference shapes when projected into a 3D space.

Figure 7. Top view 3D rendering of Generative Model’s interpolation between reference context values
Figure 8. 3D rendering of Generative Model outputs using an exploratory context path

The context lines outside the reference curves extrapolate beyond the constraints in the Relational Map. This in turn allows the Generative Model to produce shapes beyond what it was trained to produce, while enforcing the relational constraints, to a diminishing degree. Figure 8 shows one path of context values that translate along the normal vector outside the second Template Model’s context values.

Summary and future research

To summarize, a process has been described by which an artist can start with a set of template “sketches” that set the boundary for a space of artistic possibilities, and then tie those templates to a relational map that determines how they are connected as an integrated whole. These artifacts can then be used to create a Generative Relational Context (GRC) neural network model that interpolates between the boundaries defined by the templates and the relationships in the Relational Model. It also provides a structured way of exploring outside those boundaries. A simple example is given of a CGI model of a flower petal, which demonstrates interpolation between two templates using a context parameter that emerges from the neural network training process. In the future, more complex models and design-space transforms will be needed in order to determine the true creative potential of this simple demonstration. Applications and tools will also ultimately be needed to gamify the technical details of the algorithms for use by non-technical users.

Another avenue of future research might be to use the stability properties of this more compact, emergent representation of state as a basis for the accumulation of value estimates using reinforcement learning. The suggested metaphor is of an artist disrupting his work in impulsive ways and then bringing the work back into focus at greater levels of artistic complexity. Another metaphor might be an ecosystem disrupted by a series of disasters that continually re-forms at greater levels of complexity, seemingly against the forces of entropy.

Other applications of affine transforms in a semi-structured latent space could extend outside the domain of art and creativity. One possibility is explainable machine learning, where, for example, a set of symptoms maps to a structured set of diagnostic possibilities, whose diagnostic context can be used to regenerate the expected symptoms for that diagnosis and context. The distance between the generated and source symptoms could then be used to determine the training density of that part of the diagnostic space. Admittedly, at this point, this is just speculation, but it suggests interesting possibilities.

Example code

In conclusion I would like to highlight some aspects of the code used in this article, which is based on the PyTorch open source machine learning library. While a complete review of the code is outside the scope of this article, I want to highlight some of the more unique features of the model. The full code will be available on a GitHub repository to support a future more technical article.

First, I want to show the network model that was used. One notable feature is the use of the ELU activation function, which seems to work better for small problems like this, where large-scale GPU performance isn’t an issue. For this problem the CPU actually ran faster than the GPU. There are also a couple of extra parameters in the constructor to add size and descriptive information.

A simple PyTorch Model structure:

class Net(nn.Module):
    def __init__(self, layerSizes, name):
        super(Net, self).__init__()
        self.shape = layerSizes
        self.name = name
        # Build fully connected layers with ELU activations between them
        layers = []
        for inSize, outSize in zip(layerSizes, layerSizes[1:]):
            layers += [nn.Linear(inSize, outSize), nn.ELU()]
        self.seq = nn.Sequential(*layers[:-1])  # no activation on the output

    def forward(self, x):
        return self.seq(x)

Model initialization and parameters:

tmModel0 = Net([2, 10, 15, 21], 'First template model')
tmModel1 = Net([2, 10, 15, 21], 'Second template model')
relModel = Net([21, 15, 10, 3], 'Relational Model with 1 Context output')
genModel = Net([3, 10, 15, 21], 'Generative Model with 1 Context input')
parms = {'epochs': 100000, 'lrR': .0001, 'lrG': .0002, 'numSamples': 1,
         'rFilter': torch.Tensor([1, 1, .05]),
         'tModels': [tmModel0, tmModel1], 'device': device}

Training loop

For the training loop there is a lot more going on. The function gy.createRcGenTargetsFromModelList creates targets for the Relational Map, along with 3D coordinate target values for the Generative Model, sampled from randomly weighted combinations of the two Template Models. A more notable issue, not discussed above, is how gradient information is passed back through the Generative and Relational Models. In PyTorch this requires using “retain_graph=True” when moving back through the Generative Model. You also want the Generative Model’s gradient to influence the context value without disrupting the relational map portion of the Relational Model targets. Doing this in a high-level framework like PyTorch turned out to be challenging. After many hours trying to develop a custom autograd function, I hit on a much simpler solution: mask the context value passed into the loss criterion of the Relational Model, so that its value is determined by the error gradient from the Generative Model instead of direct feedback from the Relational Model target. A small amount (.05) of the gradient is allowed to pass through the mask in order to help scale the context value.

criterion = torch.nn.MSELoss(reduction='sum')
relOptimizer = torch.optim.Adam(relModel.parameters(), lr=lrR)
genOptimizer = torch.optim.Adam(genModel.parameters(), lr=lrG)
for epoch in range(epochs):
    # Each epoch gets a new training set based on the template models
    relTarget, genTarget, _ = gy.createRcGenTargetsFromModelList(
        tModels, numSamples, ...)  # remaining arguments elided
    relOutput = relModel(genTarget)
    genOutput = genModel(relOutput)
    lossG = criterion(genOutput, genTarget)
    # rFilter masks the context value so it is driven by the Generative
    # Model's gradient rather than by a direct target
    lossR = criterion(relOutput * rFilter, relTarget * rFilter)
    genOptimizer.zero_grad()
    relOptimizer.zero_grad()
    # retain_graph=True keeps the graph so lossR can also backpropagate
    lossG.backward(retain_graph=True)
    lossR.backward()
    genOptimizer.step()
    relOptimizer.step()

