Neural nets, graphic design and death metal
On Wednesday 28 March 2017, I gave a talk to the Singapore Creative Code meetup about my experiments with neural nets, graphic design and death metal. Total hero Andreas Schlegel was also talking, and you can find his slides here. Below is a write-up of my talk. You can see more experiments on my instagram, or follow me on twitter.
Currently I’m trying to develop some basic knowledge around how to work with neural nets and machine learning so that I might be able to undertake some deeper form of enquiry later on, or use the techniques in projects. And the best way to do that seemed to be to just get my hands dirty and dive in.
Many of these explorations are totally open-ended, and not really practical. But there are a lot of sideline things that I’ve learned along the way that have been so useful. I’ll try and share these things throughout.
I’ll give you a short rundown of how I got into it, then some early explorations, and finally the training and generating of death metal logos.
To give a bit of context, my background is in graphic design. I’m a visual person; that’s what I do for my job. I’m definitely not a skilled technician when it comes to code, but I’ve always had a keen interest in technology and I’m always looking for ways to mix the two. That interest isn’t just visual, although that’s definitely part of it, and I’m enjoying the unpredictable output as a new source of visual language. It’s also critical: how much might these new technologies change my role as a designer? What about my process? What might my purpose be in the future? I wrote about a few of these issues a while ago for AIGA, and that spurred me to want to explore further.
My interest in exploring automation and design properly started when I read this article by a designer called John Gold. It’s called ‘Taking the Robots to Design School’.
In it he outlines his own work in which he undertakes some analysis of commonly used typeface pairs, and algorithmically generates ideas for new typeface pairings that he may not have considered before. It’s a good illustration of how something that seems so big and scary can be applied to very tight problems and be genuinely helpful.
Later I found an article by Kyle McDonald called ‘A return to machine learning’. By this point I’d been looking at Andrew Ng’s Coursera course and getting extremely freaked out, so finding this beautifully written and illustrated write-up was invaluable. It made machine learning suddenly seem accessible to me, a halfwit designer who could barely use the terminal, let alone train NNs.
There are a bunch of extremely helpful links. I basically went through the article methodically, exploring each and every one. There’s all the stuff you’ve probably heard of before, like DeepDream and neural style transfer, but at this point it all seemed waaaaay above my head.
One of the links introduces you to Andrej Karpathy’s Char-RNN and Justin Johnson’s Torch-RNN. Both are implementations of character-level recurrent neural networks. They take a corpus of data as input and, over the training period, begin to learn the sequential structure of that data, in this case text.
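To make "learning the sequential structure" a bit more concrete, here is a minimal sketch of how a character-level net sees its training data. Every character is mapped to a number. The target sequence is the input shifted by one character, so at each position the net is learning to predict the next character. This shows the general idea only. It is not Char-RNN's or Torch-RNN's actual preprocessing code, and the function names are my own.

```python
def build_vocab(text):
    """Map every distinct character in the corpus to an integer index."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}

def make_training_pairs(text, vocab, seq_len):
    """Slice the encoded corpus into (input, target) sequences.

    The target is the input shifted one character to the right, so the
    net is always being asked "what comes next?".
    """
    encoded = [vocab[ch] for ch in text]
    pairs = []
    for start in range(0, len(encoded) - seq_len, seq_len):
        x = encoded[start:start + seq_len]
        y = encoded[start + 1:start + seq_len + 1]
        pairs.append((x, y))
    return pairs

if __name__ == "__main__":
    corpus = "to be or not to be"
    vocab = build_vocab(corpus)
    pairs = make_training_pairs(corpus, vocab, seq_len=5)
    print(len(vocab), "distinct characters,", len(pairs), "training pairs")
```

A real corpus like the complete works of Shakespeare produces many thousands of these pairs. The net never sees words, only this stream of character indices.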
At this point I didn’t know what a recurrent neural network was, and really I still only have a vague understanding at best. But it sounded like magic, and I got excited.
On the Torch-RNN GitHub there was a link to an amazing tutorial by a guy called Jeff Thompson, spoon-feeding you how to set everything up: Torch, CUDA, all that crap. So I did. And miraculously it worked. I can’t tell you how helpful that tutorial was. Up until then I’d been trying desperately to get TensorFlow running, reading tutorials about Theano and Caffe and having no luck at all. To this day I am still using Torch, for the simple reason that it’s the only thing I could get up and running.
Above is some sample text from a net trained on the complete works of Shakespeare by Mr Thompson. And it’s addictive: train for an hour or so, type out a few commands and get back text that sounds like the main man wrote it.
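Getting text back from a trained net is a sampling loop. At each step the net scores every possible next character, and one character is drawn at random according to those scores. Samplers for character-level nets commonly expose a "temperature" knob. Low values make the text safer and more repetitive, and high values make it wilder. The sketch below shows only that sampling step, using made-up scores. The function names are hypothetical and not part of any RNN tool's API.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [score / temperature for score in logits]
    top = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_char(logits, chars, temperature=1.0, rng=random):
    """Draw one character, weighted by the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(chars, weights=probs, k=1)[0]

if __name__ == "__main__":
    scores = [2.0, 1.0, 0.0]  # pretend net output for the characters below
    chars = ["e", "a", "z"]
    rng = random.Random(0)
    print([sample_next_char(scores, chars, 0.5, rng) for _ in range(10)])
    print([sample_next_char(scores, chars, 2.0, rng) for _ in range(10)])
```

Generating a whole passage just means repeating this step and feeding each chosen character back into the net.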
However, training on Shakespeare is, well, a bit boring. I wanted something more … puerile?
So I found a website with some terrrrrrible jokes and trained the net on those.
Sooooo. I didn’t train it for long, and it generated the above. But I was amazed that although it’s nonsensical, there was some structure there: the call and response, even some proper words. It’s easy to see why Shakespeare is such a popular demonstration of this technique, given that the above sounds like some medieval standup routine. I’ve since found some killer examples of other generated text over here.
On my way to the death metal logos I explored a few other nets that are worth mentioning.
Firstly there’s Darknet by PJ Reddie, a great object classifier. You can either run it from your terminal as a command-line C program (which is how I created the above), or use ofxDarknet, which lets you work with it in openFrameworks. It classifies still images or videos, and you can use models pre-trained on a number of different datasets. In the video version above I used a pre-trained model but linked it to the wrong set of object labels, which was obviously a mistake. But I found the result quite unexpected and interesting.
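Why does the wrong label file still "work"? As I understand it, a classifier like this only outputs a score per class index. The human-readable names come from a separate list that is matched purely by position. Point the model at the wrong list and it stays just as confident, but the names are meaningless. Here's a minimal sketch of that lookup with made-up scores and labels. It is not Darknet's code, and `top_k_labels` is a hypothetical helper.

```python
def top_k_labels(scores, labels, k=3):
    """Return the k highest-scoring (label, score) pairs.

    The net only produces `scores`; names are looked up in `labels`
    by index, so a mismatched label list silently gives wrong names.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(labels[i], scores[i]) for i in ranked[:k]]

if __name__ == "__main__":
    scores = [0.1, 0.7, 0.2]                       # made-up model output
    right = ["cat", "dog", "car"]                  # labels the model was trained with
    wrong = ["kite", "boat", "lamp"]               # some other dataset's labels
    print(top_k_labels(scores, right, k=1))
    print(top_k_labels(scores, wrong, k=1))
```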
I followed that mistake a little further and sent Darknet for a psychology test. This was made using the ofxDarknet addon for openFrameworks. Darknet likes cake.
Next up was Pix2Pix. You train it on pairs of images like the one on the left. One image acts as a guide and the other as a target. The net learns the relationship between the pairs, and eventually you can give it a new guide image and it will attempt to generate a target.
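As I understand the Pix2Pix repository, each training pair is stored as a single image: the guide (A) on the left half and the target (B) on the right. Below is a minimal sketch of packing and unpacking that side-by-side layout. It assumes images are plain lists of pixel rows so that no image library is needed. The helper names are hypothetical, and a real pipeline would use an image library or the repo's own scripts.

```python
def combine_ab(a, b):
    """Glue a guide image (A) and target image (B) side by side, row by row."""
    if len(a) != len(b):
        raise ValueError("A and B must have the same height")
    return [row_a + row_b for row_a, row_b in zip(a, b)]

def split_ab(image):
    """Split a side-by-side training image back into its A and B halves."""
    width = len(image[0])
    if width % 2:
        raise ValueError("a paired image should have an even width")
    half = width // 2
    a = [row[:half] for row in image]
    b = [row[half:] for row in image]
    return a, b

if __name__ == "__main__":
    outline = [[0, 255], [255, 0]]     # tiny 2 x 2 "guide"
    photo = [[10, 20], [30, 40]]       # tiny 2 x 2 "target"
    pair = combine_ab(outline, photo)
    print(pair)
    print(split_ab(pair) == (outline, photo))
```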
As with Darknet there are some pre-trained models that can be downloaded and played with.
One in particular that took my fancy was the edges2handbags dataset. It’s built from a ton of images of handbags from Amazon, each paired with an algorithmically created outline. The output examples are quite astounding.
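The "algorithmically created outline" is an edge map. My understanding is that the Pix2Pix authors used a learned edge detector (HED) for the handbags. The sketch below swaps in something much cruder: a threshold on brightness differences between neighbouring pixels. It only illustrates the idea of deriving the guide image from the photo automatically, so you get training pairs for free. The function name and threshold are assumptions.

```python
def edge_map(gray, threshold=50):
    """Return a binary outline: 1 where brightness changes sharply, else 0.

    `gray` is a list of rows of 0-255 brightness values. Uses a simple
    forward-difference gradient; a crude stand-in for a real edge detector.
    The last row and column have no forward neighbour and stay 0.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = gray[y][x + 1] - gray[y][x]   # horizontal change
            gy = gray[y + 1][x] - gray[y][x]   # vertical change
            if abs(gx) + abs(gy) > threshold:
                edges[y][x] = 1
    return edges

if __name__ == "__main__":
    # Dark left half, bright right half: the outline should trace the boundary.
    image = [[0, 0, 255, 255] for _ in range(3)]
    for row in edge_map(image):
        print(row)
```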
So, here’s me as handbags. I love the painterly feel of this, although it’s also quite sickly.
There are some peripheral things to mention here. The model is too large to run on my computer’s GPU; it just can’t take it, so I had to learn how to set up an Amazon GPU instance, which is not as easy as it could be. I’ve been using this one, as it has the CUDA toolkit already set up and ready to go. Also, ImageMagick and ffmpeg made these videos much easier, and before this project I had never heard of either of them. I’m still using ImageMagick to preprocess datasets for training. It’s excellent.
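A typical preprocessing job is making every image in a scraped dataset the same square size. DCGAN, for example, trains on 64 px squares by default. The sketch below builds ImageMagick commands that scale each image to cover the square and then centre-crop it. The helpers are hypothetical and only construct the commands, without running them. It assumes ImageMagick 6's `convert`; ImageMagick 7 uses `magick` instead.

```python
from pathlib import Path

def square_crop_cmd(src, dst, size=256):
    """ImageMagick command: scale to cover a size x size box, then centre-crop."""
    box = f"{size}x{size}"
    return [
        "convert", str(src),
        "-resize", box + "^",   # '^' means fill the box; the overflow is cropped next
        "-gravity", "center",   # crop around the middle of the image
        "-extent", box,         # final canvas is exactly size x size
        str(dst),
    ]

def batch_cmds(src_dir, dst_dir, size=256, pattern="*.jpg"):
    """One command per matching file in src_dir, writing into dst_dir."""
    return [
        square_crop_cmd(src, Path(dst_dir) / src.name, size)
        for src in sorted(Path(src_dir).glob(pattern))
    ]

if __name__ == "__main__":
    print(" ".join(square_crop_cmd("logo.jpg", "logo_64.jpg", 64)))
```

To actually run them, pass each command list to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string means the `^` needs no escaping.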
I continued exploring Pix2Pix for a while, mainly because there’s a lot of room for exploration and experimentation. There’s a tight set of criteria to stick to, but that’s what makes it so usable. You can have endless fun with it. The above is a video created from a model trained on map/satellite image pairs taken from Google Maps. I made a screen recording of myself zooming in to my hometown (shoutout Slough!), then used ffmpeg to split it into individual frames. I then batch-processed the frames through Pix2Pix and stitched the outputs back into a video, again using ffmpeg.
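The split-process-stitch workflow above can be sketched as two ffmpeg commands built from Python. The frame rate, file names and zero-padded numbering are my assumptions. The middle step, running the frames through Pix2Pix, depends on the Pix2Pix setup and is left as a comment.

```python
import subprocess

def split_cmd(video, frame_dir, fps=12):
    """ffmpeg command that samples `fps` frames per second into numbered PNGs."""
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", f"{frame_dir}/%05d.png"]

def stitch_cmd(frame_dir, out_video, fps=12):
    """ffmpeg command that joins numbered PNGs back into an H.264 video.

    yuv420p keeps the file playable almost everywhere; it needs even frame
    dimensions, which square frames like 256 x 256 satisfy.
    """
    return [
        "ffmpeg", "-framerate", str(fps), "-i", f"{frame_dir}/%05d.png",
        "-c:v", "libx264", "-pix_fmt", "yuv420p", out_video,
    ]

def run(cmd):
    """Run one command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)

# Hypothetical usage (the frames/ and out/ directories must already exist):
# run(split_cmd("zoom.mov", "frames"))
# ...batch frames/ through Pix2Pix into out/, keeping the same numbering...
# run(stitch_cmd("out", "zoom_pix2pix.mp4"))
```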
And of course as a designer, one can’t help but fuck about with typography. So here’s an output created by feeding letter-shaped guide images into a model trained on the facades dataset. I’d like to try to find a use for these insane letterforms.
Then I found DCGAN. This is the one I still love playing with. You train DCGAN on a dataset of images, then ask it to create new ones from the model. It’s a generative adversarial network, which means you actually train two nets, a generator and a discriminator, that kind of duke it out during training and try to trip each other up. You can read about it on the GitHub page and in the associated paper.
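Roughly, here's what the two nets are fighting over. The discriminator outputs a probability that an image is real. It is rewarded for saying "real" to real images and "fake" to generated ones. The generator is rewarded when the discriminator calls its fakes real. Training alternates between updating each net to lower its own loss. The sketch below is the textbook GAN objective written as binary cross-entropy, using the commonly used "non-saturating" generator loss. It is not code from the DCGAN repository, and the function names are my own.

```python
import math

EPS = 1e-12  # keeps log() away from log(0)

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: D wants real images -> 1 and generated ones -> 0.

    d_real and d_fake are D's probability outputs for a batch of real
    images and a batch of generated images.
    """
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """G wants D to call its fakes real, i.e. push d_fake towards 1."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)

if __name__ == "__main__":
    # A confident, correct discriminator has low loss...
    print(discriminator_loss([0.95, 0.9], [0.05, 0.1]))
    # ...which means the generator, whose fakes are being spotted, has high loss.
    print(generator_loss([0.05, 0.1]))
```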
One of the models you can download has been pre-trained on 10,000 images of celebrities, which is fun. The great thing is that you can generate new images in multiple ways. You can generate single images like the one above.
Or if you’re short of friends you can generate a tapestry of new ones, like this. Silk scarf, anyone?
My favourite way to work with it is to have the net create a series of images that ‘step’ between two generated outputs, then stitch them into videos like the above.
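The "stepping" works because every generated image comes from a random noise vector. If you walk in small steps from one vector to another and generate an image at each step, you get a smooth morph. Below is a minimal sketch of the vectors for such a walk using plain linear interpolation. It assumes a 100-dimensional Gaussian noise vector (the size used in the DCGAN paper). Feeding each vector to the generator is left out, and the helper names are hypothetical. Some people prefer spherical interpolation for Gaussian noise; linear steps are just the simplest option.

```python
import random

def lerp(z0, z1, t):
    """Point a fraction t of the way from vector z0 to vector z1."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]

def interpolation_path(z0, z1, steps):
    """`steps` evenly spaced vectors from z0 to z1, both ends included."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [lerp(z0, z1, i / (steps - 1)) for i in range(steps)]

rng = random.Random(42)
z_start = [rng.gauss(0.0, 1.0) for _ in range(100)]  # assumed 100-d noise
z_end = [rng.gauss(0.0, 1.0) for _ in range(100)]
path = interpolation_path(z_start, z_end, steps=30)  # one vector per video frame
```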
By now I was getting more comfortable with the way it all works and I was about ready to try and train my own network.
I’m a designer, and I’m also lazy. I found this dataset of logos. Really horrific logos. It’s clipart really. But I wanted to see what happened if I trained DCGAN on these and tried to get it to output something.
I trained it overnight; it did (I think) around 2,000 iterations in 7 hours. The great thing is that you can watch the output as it’s training, to check that everything is running smoothly. On the left you can see some of the outputs during training. Notice that it’s already recognising some visual patterns, like circular shapes in the centre of the frame, or bits of typography around the edges of the logos.
OK, so it’s not 100% amazing, but this blew me away. It’s trying to put text in the centre of circles, and it has a reasonable idea of form: there are arcs and swooshes, roundels and triangles.
I started thinking again about Shakespeare, and how my RNN jokes had sounded like a fat monk from 1523 trying to be a comedian. I realised that the Shakespeare dataset seems to be reasonably forgiving of mistakes in the generated output: even stuff that doesn’t go in as Shakespeare comes out sounding Ye Olde Worlde if you train it badly enough (which I did). So I started thinking about similar datasets I could use with DCGAN …
I don’t know if you’re aware of death metal logos, but they take pride in their illegibility. Here are some nice ones. I scraped 500 of these from the bottom of the Pinterest barrel. There’s something interesting in having an artificial network generate death metal logos. Death metal is probably one of the more esoteric types of music around, which in my mind makes it distinctively human: you have to appreciate it not only from an aural perspective, but also from a cultural one. Machines obviously can’t do this, but it still seemed like one of the more suitable kinds of visual to attempt to imitate. Perhaps that’s because of the level of complexity we perceive neural nets as having: we can’t easily generate these logos procedurally, so maybe a more ‘organic’ method like a neural net is more suitable?
Midway through the training, our friend DCGAN is already starting to get familiar with the forms of the logos. I figured that if I just slammed it through thousands more iterations, something impressive would come out of it.
I was wrong.
After about 2,000 iterations, everything got very repetitive. My dataset is relatively tiny, and I’m guessing that the net was learning frequently repeated patterns and just outputting those (probably a case of what GAN people call mode collapse). I found that going back to a checkpoint at around 1,400 iterations gave me the best output. You can see some very lo-res logos on the right here. DCGAN is set up to train on 64 px square images. You can adjust the code to get this up to 128 or 256 px, but you then run into problems with training. I solved this by adding a white-noise layer to the architecture of the net. There’s a good thread about it here.
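One common way to add noise to GAN training is to add small Gaussian noise to every image the discriminator sees, real and generated. The noise is gradually reduced to nothing, so the discriminator can't win too easily early on. This is sometimes called instance noise. I'm assuming this is roughly what the "white-noise layer" amounts to; the linked thread may describe a noise layer placed inside the net instead. Below is a minimal sketch of the input-noise version on a flat list of pixel values. The function name, starting sigma and linear schedule are all assumptions.

```python
import random

def add_instance_noise(pixels, step, total_steps, start_sigma=0.1, rng=random):
    """Add zero-mean Gaussian noise to an image before the discriminator sees it.

    sigma starts at start_sigma and is linearly annealed to zero by
    total_steps; apply it to real and generated images alike.
    """
    sigma = start_sigma * max(0.0, 1.0 - step / total_steps)
    if sigma == 0.0:
        return list(pixels)
    return [p + rng.gauss(0.0, sigma) for p in pixels]

if __name__ == "__main__":
    image = [0.0, 0.5, -0.5]  # pixels scaled to [-1, 1], as GANs often use
    rng = random.Random(1)
    print(add_instance_noise(image, step=0, total_steps=2000, rng=rng))     # noisy
    print(add_instance_noise(image, step=2000, total_steps=2000, rng=rng))  # clean
```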
I think the results speak for themselves really. On the left you have two of the better logos I was able to generate, vs two fantastic death metal logos on the right. No one is going out of business soon, especially not Christophe Szpajdel.
My next step is to look into audioGANs and see if I can generate some death metal music. Then I’ll have a metal night. And you’re all invited.