Quo Somniatis? DeepDream and Beyond

If you don’t live under a rock, you will have noticed by now that Google’s DeepDream algorithm is currently the sweetest eye-candy machine since Kai’s Power Tools. It makes puppies and slugs, and its look is so recognizable that it has “OH, IT’S JUST A FAD” written all over it in big black letters. So let’s move on: in four weeks everyone will hate it, and only a few obstinate freaks will still use it on their MySpace page.

Or maybe not.

The abundance of rainbow-colored puppyslugs in DeepDream’s currently visible output stream can easily lead one to the conclusion that there is not much real-world use for this tool, except for people who require a constant influx of psychedelic visuals. But DeepDream is more than that. It’s like a record player (sorry, I’m old-school, I should say “it’s like iTunes”): unfortunately, right now only one album is in the stores. It’s called “bvlc_googlenet” and yes, that one has an obscure mixture of cute pets, reptiles and molluscs on it. There are actually a few more independent albums available, but they have all been recorded by science nerds, which means they are about buildings or flowers; they are only available on the black market known as the “ModelZoo”, and you will have to jump through several hoops before you can even play them, so they are rather for the initiated connoisseur.
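For the curious: putting on a different record is mostly a matter of pointing the original notebook code at different files. Something along these lines should work once you have smuggled a model out of the ModelZoo (the paths and file names here are placeholders for wherever you unpacked it):

```python
import numpy as np
import caffe

# Placeholder paths: point these at whatever you downloaded from the ModelZoo.
model_path = 'models/some_modelzoo_model/'
net_fn     = model_path + 'deploy.prototxt'
param_fn   = model_path + 'weights.caffemodel'

# The original notebook also patches the prototxt with force_backward: true
# before loading, so that gradients can flow back to the image.
net = caffe.Classifier(net_fn, param_fn,
                       mean=np.float32([104.0, 116.0, 122.0]),  # adjust to the model's training mean
                       channel_swap=(2, 1, 0))                  # RGB -> BGR
```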

What we need to get out of puppyslug valley is more albums to play. In practice this means that new models have to be trained on material that is less creepy and more useful in nature. Imagine a network trained on body parts, skin textures or hair: with the right tuning, this could allow for very subtle photo retouching and for beautifying or uglifying portraits. A model full of textures or typography could give flat graphic design an organic look. Comic-book style? Machine parts? Entirely abstract patterns? Any new model can produce entirely different results from the ones we have seen so far. The problem is that training a new model is a tedious process: you first have to manually collect a lot of different example images of the same category, then train the model on a powerful machine for several days or even weeks. But I am sure a few people are already working on new models right now, so in a few weeks we will hopefully see new styles.
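To give you a sense of the proportions: the hard part is the collecting; the training call itself is short. A rough sketch of the Caffe side, assuming you have already written a solver.prototxt and the data lists it points to (all the file names here are hypothetical):

```python
import caffe

caffe.set_mode_gpu()

# 'solver.prototxt' and the train/val image lists it references are things
# you have to prepare yourself from your hand-collected category.
solver = caffe.SGDSolver('solver.prototxt')

# Fine-tuning from existing weights converges much faster than starting
# from random noise.
solver.net.copy_from('bvlc_googlenet.caffemodel')

solver.solve()  # now wait a few days (or weeks)
```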

It would be nice if we could speed up this process by turning it into a community effort. How does a Kickstarter campaign sound, one that would fund a site for collecting and sorting image categories, as well as pay for time on a powerful GPU cloud server that can train new models? Of course, every backer would get their own personal object category as well. Just throwing this out there: maybe someone better organized than me can help me pull it off.

But the DeepDream doesn’t end here. An even better metaphor than the record player is to compare the algorithm to a musical instrument: an instrument built by aliens, so complex that until now we have only figured out where to blow into it, and managed to get a single note of questionable quality out of it. With practice and hard work we will get better and figure out the intricacies of this device; we will find ways to improve it and make it easier to play. For example, I’m currently experimenting with changes to the code, hoping to get more control over what the algorithm is dreaming about. But it’s more like brain surgery on a conscious patient: trial and error on a complex net of neurons. Wiggle this nerve: ah yes, the leg twitches, but the left eye blinks too. Hours of fun.
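To make the nerve-wiggling a bit more concrete: the knob I am turning sits in the notebook’s objective function, the bit that decides which activations get amplified during the gradient ascent. A sketch of one such experiment (the channel index is arbitrary, and it assumes a notebook version whose make_step accepts an objective argument):

```python
# The stock objective tugs on every nerve of the layer at once:
def objective_L2(dst):
    dst.diff[:] = dst.data

# One experiment: amplify a single channel of the layer and silence all
# the others. Which hallucination a given channel produces you only find
# out by poking; 65 is an arbitrary pick.
def objective_channel(dst, channel=65):
    dst.diff[:] = 0
    dst.diff[:, channel] = dst.data[:, channel]
```

You would pass the second one into make_step via functools.partial(objective_channel, channel=…) and then watch which limb twitches.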

What feels like peak dream is just the beginning. Right now the process of getting it running on your own machine is not for the faint-hearted. But no worries: very soon the first easy-to-use apps will hit the market, and a bit later the principle will find its way into popular photo editors and drawing tools. And the DeepDream does not end with static images. People have already applied it to video (still a bit jumpy and experimental, but it will surely improve; see the sketch below). The same core algorithm will also be applied to sound and to the generation of 3D objects and entire worlds. A deep-dreaming Minecraft is absolutely possible, given enough computing power to learn voxels instead of pixels. DeepDream opens the door to a new world of AI-augmented creativity, and it is here to stay.
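For the tinkerers, a naive sketch of the video case, assuming the deepdream() function and net from the notebook; load_frames is a hypothetical stand-in for your favorite frame grabber. Blending each input frame with the previous dreamed frame tames the flicker a little; fancier approaches warp the previous result along the optical flow instead:

```python
import numpy as np

prev_dream = None
dreamed = []
for frame in load_frames('clip.mp4'):        # hypothetical frame loader
    frame = np.float32(frame)
    if prev_dream is not None:
        # carry a bit of the previous dream over into the next frame
        frame = 0.6 * frame + 0.4 * prev_dream
    prev_dream = deepdream(net, frame)       # deepdream() from the notebook
    dreamed.append(prev_dream)
```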

So better get used to it.
