Inside Deep Dreams: How Google Made Its Computers Go Crazy
Why the neural net project creating wild visions has meaning for art, science, philosophy — and our view of reality
I gripped the desk and sagged toward her as she held out the envelope, but I refused to accept it. The Woman’s face was changing: swelling, pulsing… horrible green jowls and fangs jutting out, the face of a Moray Eel! …Terrible things were happening all around us. Right next to me a huge reptile was gnawing on a woman’s neck, the carpet was a blood-soaked sponge — impossible to walk on it, no footing at all. “Order some golf shoes,” I whispered. “Otherwise, we’ll never get out of this place alive. You notice these lizards don’t have any trouble moving around in this muck — that’s because they have claws on their feet.”
“Lizards?” he said. “If you think we’re in trouble now, wait till you see what’s happening in the elevators.”
— Hunter S. Thompson, Fear and Loathing in Las Vegas
In the very early hours of May 18, 2015, Alexander Mordvintsev was wrenched from sleep. A nightmare, he later described it to me, in the first interview he has granted on the experience. Or, at least, a dream, a deeply disturbing dream, where an intruder had crossed the threshold of the Zurich apartment that he, his pregnant wife, and his 3-year-old son had been occupying for the past few months. They had moved to Switzerland from St. Petersburg the previous November, when the Russian computer scientist got a job at Google’s engineering center there.
Now it was darkest night and Mordvintsev, jarred awake by his savage slumber, leapt from the bed to check the door. It was closed; all was quiet. But his mind was afire. Okay, it’s 2 a.m., but I can’t sleep, he told himself. So time to write a few lines of code.
It would be a decision that would eventually unleash a torrent of fantastic images, torn from an alien perspective, that intrigued and twisted the minds of those who viewed them. A decision that would reveal the power of artificial neural nets, our potential future overlords in an increasingly one-sided relationship with machine intelligence. And a decision that would change Mordvintsev’s own life.
He turned to a project he’d been working on since the beginning of the year. Mordvintsev had been fascinated with neural nets (NNs), a computer analogy to the barely understood thicket of connections in our own brains. Sophisticated artificial neural nets now power “deep learning,” the hottest and most promising development in artificial intelligence. He was tinkering with his own vision-recognition neural net, developed with open-source tools. A number of these had appeared over the last few years, part of a boom in the field after these systems had proved effective in computer vision and other functions that had previously been elusive.
None of this work, it happens, had anything to do with Mordvintsev’s official duties. Google is a leader in NNs, with assets that include pioneering researcher Geoffrey Hinton; Jeff Dean, a legendary Google computer scientist who leads a team that built an NN informally dubbed the Google Brain in Mountain View; and Google’s DeepMind acquisition in London pushing the boundary of machine intelligence. Mordvintsev had no formal connection to them; he worked in Safe Search, which prevents spam and porn from infecting search results. But Google still tolerates its engineers using part of their time to work on passion projects. For Mordvintsev, this now meant neural nets and vision systems. It was an extension of his previous interest in using computers to model biological systems. Years earlier, he had tinkered in simulations of coral reefs.
As an NN newbie, Mordvintsev was teaching himself about the field, absorbing key papers and playing with systems already trained to recognize certain objects. His curiosity was piqued by one of the abiding mysteries of neural nets and deep learning: why did they work so well, and what the hell went on inside them? Others had been asking the same question, using what are known as convolutional neural nets (ConvNets) to probe vision recognition systems at various points in the process. ConvNets are a specialized form generally used for vision recognition; they take the biological metaphor further by not only using a neuron-style learning system, but by arranging their neurons in a fashion similar to the way receptive fields are organized in the visual cortex. One team in particular, from the Visual Geometry Group at the University of Oxford, had taken an interesting approach to analyzing how successful vision systems can recognize (classify) objects: at a certain point in the training process, they got the networks to generate images of what they were perceiving. By looking at those images, the researchers had a better idea of what the neural network was up to at that instant.
Mordvintsev wanted to continue down that path, with a wicked turn: He was writing code to make a neural net create meaningful images that weren’t there at all, at least not as humans could tell — visions born of machines, oozing out of the metaphorically neural connections in the system. On this restless night in May, while his wife and child slept, he did the coding equivalent of fiddling the dials to change the objective of the neural net. Let’s find something that increases the magnitude of the activation vector, he told himself. Just like, whatever it sees in this batch of images, let’s have more of it.
In other words, he would flip the function of the neural net from recognizing what was there to generating stuff that might not be there. In the middle of the network’s usual practice of trying to verify a nascent sense that a particular pattern may be a target object, he told the network to skip directly to “Go,” and then start making the object. Previously, the mission of convolutional neural nets was to proceed in a defensive-driving fashion, straining to filter out wrong turns and make accurate predictions. Mordvintsev’s process was more Fast and Furious. It was like gunning the system forward, then suddenly slamming on the brakes and reversing. You could almost taste the pixels being spit out like greasy gravel when the wheels spun on digital asphalt, as the system seized on hints of objects and recklessly took license to flesh them out into vivid representations of a target image.
The trick was getting the system to do its thing — reversing itself and then reaching back into itself to find templates for new images — at just the right time and in just the right measures. “It’s easy to write the code and tricky to find the right parameters,” says Mordvintsev. The chunk of code that turns a neural net into something that churns out images from its hitherto secret life runs only about thirty lines. But in this pass, Mordvintsev got the balance just right.
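Mordvintsev’s actual thirty lines aren’t reproduced in this story, but the core maneuver, gradient ascent on the magnitude of a layer’s activation vector, can be sketched at toy scale. In the snippet below, a single random “layer” stands in for a trained ConvNet; the weights, sizes, and learning rate are all invented for illustration:

```python
import numpy as np

# Toy stand-in for one layer of a trained ConvNet: a fixed random weight
# matrix followed by a ReLU. The dimensions, weights, and learning rate
# here are invented purely for illustration.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))

def activations(x):
    return np.maximum(W @ x, 0.0)            # ReLU activations of the "layer"

def objective(x):
    a = activations(x)
    return 0.5 * np.sum(a * a)               # magnitude of the activation vector

def gradient(x):
    a = activations(x)
    return W.T @ (a * (W @ x > 0))           # backpropagate through the ReLU

def dream_step(x, lr=0.01):
    g = gradient(x)
    # Normalizing the gradient is one of the "tricky parameters" Mordvintsev
    # alludes to; it keeps each update at a stable size.
    return x + lr * g / (np.abs(g).mean() + 1e-8)

x = rng.standard_normal(64) * 0.1            # the "input image," flattened
scores = [objective(x)]
for _ in range(100):                          # repeated passes amplify whatever
    x = dream_step(x)                         # the layer already responds to
    scores.append(objective(x))
```

Each pass nudges the input so the layer’s response grows; in a real trained network, that same nudging is what surfaces the latent dogs and spiders.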
The results came instantly. The open-source tool he was using to build his neural net had been “trained” on a well-known dataset called ImageNet to recognize objects in 1,000 categories, including 118 dog breeds. He fed a photo into it: a beagle and kitten, each perched on tree stumps with a meadow in the background. (He found it on a digital wallpaper website.) Normally, one would use a vision-recognition neural net to identify what it saw. But Mordvintsev was hoping for something novel and unexpected. His code tapped the neurons mid-process, building the half-baked clues of dogness into more fully realized dogs. With repeated passes of modified images, he got a final output that was not at all normal.
The image is of a dog, in the broadest sense. This is startling to begin with, because the source of this image was not the beagle but the adorable little kitty cat. (Maybe not so surprising, given that the network was trained largely on dog breeds.) On the beast’s forehead was a second set of eyes. Bulging from his canine haunches was a separate snout with another pair of unsettlingly alert eyes. In fact, pieces of dog face popped up in all sorts of unexpected places. Overall, it appeared that some horrid infection was brewing underneath the animal’s fur, with teeming sets of snouts and eyes straining to burst through at the next instant. If you looked closely, a pair of eyes had even broken through in the pinkish lower jaw. For good measure, the background of the picture, some sort of green wall, displayed a complex tapestry of patterns, as if Aztecs had finger-painted the surface. In several places on the wall, it looked like spiders had randomly broken through, like arachnid bullet holes.
Not to put too fine a point on it, but the image looked like the work of a mad person. Or someone on LSD. But its origins, of course, were not psychiatric or psychotropic. They were algorithmic.
To that point, Mordvintsev had refrained from sharing any results. Earlier in the year, he had given a workplace talk about some of his theories, and gotten the interest of a few scientists in the global Google Research archipelago. But this time, he felt sufficiently confident to finally post some images, including the metastasized dog faces, on the internal version of Google Plus, accessible only to those working for the firm.
He posted it at 2:32 a.m. “Not sure if it was a good idea to try a DNN image enhancement at 2 a.m.,” he wrote. “How do I sleep now?”
Though it was the wee hours in Zurich, it was late afternoon in Mountain View. (The sun never sets on Google’s engineering centers.) Only seconds after Mordvintsev posted, the first of an avalanche of responses and +1’s appeared.
It read: “MY EYES! MY EYES!”
Mordvintsev’s post galvanized the Google community and received 162 +1’s and over 60 comments, an unusual number for a dispatch from a random engineer in the Safe Search team. Two engineers in particular were captivated.
One was an intern who had been working with the elite deep learning team led by Jeff Dean. Chris Olah, age 22 at the time, had come to his Google internship after a two-year “20 Under 20” fellowship funded by venture capitalist Peter Thiel, who pays bright youngsters $100,000 to forgo college and build stuff instead. In addition to his interests in 3D printing and the cult programming language Haskell, Olah was obsessed with neural nets, and had predictably been intrigued by Mordvintsev’s original Tech Talk and blown away by his post.
“I had been very interested in these convolutional neural nets that work with images, and how we understand what’s going on in them,” he says. After the Google Plus post, Olah got permission from his group leaders to collaborate on the project, and the work he had been doing in interpreting how NNs visualize objects turned out to be invaluable in exploring Mordvintsev’s discovery.
Also drawn into Mordvintsev’s orbit was software engineer Mike Tyka. Originally a biochemist, he switched to computer science, concentrating on simulations of protein folding, and now was working in the machine learning group in Google’s Seattle office. He is also an active artist, creating sculptures in copper and glass inspired by the protein-folding work, and was part of a team that built a 35-foot sculpture based on a Rubik’s Cube. Tyka, who had likewise recently become obsessed with neural nets, thought the Mordvintsev post seemed to speak to both art and science. “If you think about human creativity, some small component of that is the ability to take impressions and recombine them in interesting ways, unexpected ways,” he says. “It’s cool to see computers come up with unexpected things that humans didn’t come up with.”
Tyka began experimenting by feeding different images into the system, repeatedly reversing it the way Mordvintsev suggested. He produced a gallery of images in which the neural net seemingly transformed every pixel of an ordinary starting image. Mountains became pagodas. Leaves turned into birds. Even the backgrounds grew intricate designs, as if the nets harbored demons schooled in the geometric architecture of ancient Islam.
Some of the craziest outputs came from the network’s interpretation of seemingly benign photos of a blue sky with clouds. With eerie similarity to the way children divine animals and structures in cloudscapes, the neural nets exposed magical scenes, not just from the clouds but from almost imperceptible perturbations in the seemingly pure sky. From patterns undetectable by humans, the net produced species unknown to any taxonomy. Chris Olah would name them: Pig-Snail, Camel-Fish, Dog-Bird.
Tyka then took things a step further: instead of starting with an existing image, he’d start the process with random noise and keep re-feeding the images until the system filled out the nascent patterns it recognized into actual objects and well-developed tapestries. “If you do that for a few rounds, you essentially lose memory of the initial photo [anyway],” he says. “If you start with random noise, your image is purely generated by what the neural network knows about.” Instead of following up on patterns that vaguely hinted at a target object and making that object appear, these neural nets were free-styling. It didn’t take long before Tyka was also getting amazing results from those initial submissions of noise: the neural nets built sparkling imaginary landscapes, as if Red Grooms and Robert Crumb were illustrating Frank Baum’s Oz classics.
We all know that artificial neural nets are computational and have no “minds.” Yet one could not help but intuit that the results were some kind of window into the neural net’s subconscious.
More prosaically, these outputs advanced Mordvintsev’s original quest to better understand how neural nets worked. As he and his new collaborators generated more images and conducted more experiments, they began to get insights into how neural networks process our world. Chris Olah found something particularly strange when they commanded the network to produce images of barbells. Previously, researchers had assumed that when a neural net recognized a barbell, it was “seeing” the object as a human does: a discrete object made of metal. They were wrong. In the “mind” of a neural net, a barbell was a material object with a human hand and wrist attached. Image after image came with a fist and a wrist gripping the barbell. Insights like that could help in training future NNs.
As Mordvintsev’s work drew more discussion at Google, people in the research group felt that the project should be made public. Mordvintsev suggested to Olah, who was a prolific blogger in his own right, that they collaborate on a blog item for general distribution.
“Probably the proper way to release such things is to do an extensive set of experiments and write a paper and publish it in some conference,” says Mordvintsev. “But actually I thought that a blog post was better because it’s much faster and easier to get this stuff out.” Tyka joined in, at first helping with the text, and then producing a gallery of his work.
In their blog post, the team called their process Inceptionism, an homage to an early ConvNet paper from Google Research that named a system after the internet meme emerging from the movie directed by Christopher Nolan. (Lead actor Leonardo DiCaprio: “We need to go deeper.”) They described how it works in, for instance, one of Tyka’s transformations of a picture of a cloud-pocked sky that hatches a spooky menagerie:
This creates a feedback loop: if a cloud looks a little bit like a bird, the network will make it look more like a bird. This in turn will make the network recognize the bird even more strongly on the next pass and so forth, until a highly detailed bird appears, seemingly out of nowhere.
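That feedback loop can be mimicked with a toy classifier: nudge the input along the gradient of one class’s score, feed it back in, and watch a faint resemblance harden into confident recognition. The three classes and random weights below are invented stand-ins for a trained network:

```python
import numpy as np

# A toy three-class "classifier": random weights stand in for a trained
# network, and the class names are invented for illustration.
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 64)) * 0.1
CLASSES = ["bird", "dog", "cloud"]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bird_confidence(x):
    return softmax(W @ x)[CLASSES.index("bird")]

def amplify_bird(x, lr=0.5):
    # In this linear toy, the gradient of the "bird" logit with respect to
    # the input is just the bird row of W; nudge the image along it.
    return x + lr * W[CLASSES.index("bird")]

x = rng.standard_normal(64)                  # the "cloud photo"
before = bird_confidence(x)
for _ in range(20):                          # each pass makes the faint bird
    x = amplify_bird(x)                      # a little more bird-like
after = bird_confidence(x)
```

The loop is exactly the one the post describes: whatever looks a little like a bird is pushed, pass after pass, to look more like one.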
The trio published the item on Google’s public research blog on June 17. And the internet went nuts. Within a few days the images appeared in over 100 articles and countless tweets and Facebook posts. Machine learning subreddits, blogs, and discussion boards analyzed every aspect of the post. For several years, the terms “neural nets” and “deep learning” had been bandied about to the general puzzlement of all but scientifically minded observers. Now there were pictures that, whether representative or not, provided a visual entry point into those difficult concepts. It was as if artificial intelligence were painting its self-portrait.
But that was only the beginning. On July 1, Google released code on GitHub that allowed people to make their own images. In the interim, the Google researchers who had created the original Inception system suggested that the new system — since it was a totally discrete effort — not confuse people by using the same term. So the effort was now dubbed Deep Dream, a name that evoked both the deep learning of neural nets and the dreamy surrealism of the system’s outputs.
Now the internet went really nuts.
Several apps sprang out of nowhere to allow even non-technical people to transform their loved ones into nightmare creations. Lots of people experimented with airy-fairy transformations like in Tyka’s cloud series, but it seemed the most popular thing to do was to use Deep Dream as Beelzebub’s paint kit. AI message forums, tweets hash-tagged #deepdream, and Pinterest boards instantly presented a Boschian bestiary of visions. A popular pastime was deep-dreaming the presidential candidates, mostly Trump, often making them appear as if ripped from stray pages in Ralph Steadman’s Fear and Loathing in Las Vegas notebook. The rock band Wilco produced a deep-dreamed print of the kitten that appeared on its latest album cover, selling it on its website for $20. And, naturally, some people gave porn images the deep-dream treatment, with results as blisteringly appalling as one might expect. (NSFW link only on request.) A Gizmodo headline summed it up: “Google’s Dream Robot is Running Wild Across the Internet.”
A thriving Deep Dream community emerged. One of its obsessives is Samim Winiger, a semi-famous Swiss game developer and former dance music producer who is known for adopting cutting-edge technology. “These are the first pop culture generative images since fractals,” he says via Skype chat. Winiger’s own contribution, developed in collaboration with Roelof Pieters, was a program that creates animations with Deep Dream software. They recently used it to produce a music video for the electropop group Years & Years.
“In five years, we won’t recognize Photoshop,” says Winiger. Instead, artists and illustrators will use “a generative toolbox” to render images at superhuman fidelity. He calls it “creative AI.”
But Deep Dreams has a significance far beyond questions of art. To understand why Mordvintsev’s experiment matters in a broader sense, you have to know a little about neural nets and what’s happening with deep learning. First, the basics. Neural nets consist of artificial neurons in stacked layers; “deeper” nets can have as many as 30 layers. In vision systems, researchers train NNs by feeding them images and grading the output. As an image passes through the net, each layer analyzes a bit more of it, building toward a holistic impression of what it sees. After multiple passes of images through the net, with feedback on the accuracy of its guesses, the net adjusts its own parameters until it can correctly classify various objects. Then the final layer — known as the “output” layer — can deftly identify the objects it has learned.
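That grade-and-adjust loop can be seen end to end in a toy sketch. The tiny two-layer network below learns to separate two clusters of points by repeatedly passing them through, grading its guesses, and nudging its weights; the data, layer sizes, and learning rate are all invented for the demonstration, and plain NumPy stands in for any real training framework:

```python
import numpy as np

# Toy dataset: two classes of 2-D points (stand-ins for "images").
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1.0, 0.7, (100, 2)),   # class 0
               rng.normal(1.0, 0.7, (100, 2))])   # class 1
y = np.array([0] * 100 + [1] * 100)

W1 = rng.standard_normal((2, 8)) * 0.5; b1 = np.zeros(8)
W2 = rng.standard_normal((8, 1)) * 0.5; b2 = np.zeros(1)

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)              # hidden layer (ReLU)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))      # output layer: P(class 1)
    return h, p.ravel()

def accuracy():
    _, p = forward(X)
    return float(np.mean((p > 0.5) == y))

lr = 0.1
for _ in range(300):                              # repeated passes with feedback
    h, p = forward(X)
    d_out = ((p - y) / len(X))[:, None]           # graded error at the output
    dW2 = h.T @ d_out; db2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (h > 0)                # error passed back a layer
    dW1 = X.T @ d_h; db1 = d_h.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1                # the net adjusts itself
    W2 -= lr * dW2; b2 -= lr * db2

final_accuracy = accuracy()
```

The same pattern, stacked tens of layers deep and fed millions of images, is what lets a production vision net tell a beagle from a kitten.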
Due to improvements in the field over the past decade, neural nets have gone from a research backwater to the hottest area in artificial intelligence. “Deep learning” neural nets now routinely recognize images and interpret natural language so accurately that they are beginning to automate jobs that previously only humans could do.
But we also have a lot to learn about artificial neural nets. So far, the work has focused on results; what actually goes on when an NN begins its self-determined adjustment of weights and parameters has been something of a black box. Did it work or didn’t it?
So it’s hard to tell what’s going on inside an effective neural net, and even harder to understand in what ways such nets work like real brains and in what ways they do not. But now that we know they do work, we need to know how, so as to improve the next generation.
That’s the utility of the Deep Dreams process. For instance, in one kind of experiment the researchers would choose which layer of the net would be active in enhancing the patterns it detected in a photograph. If they chose one of the lower layers — those making the system’s initial assumptions about what an image contains — they would get intricate patterns, because at that point the network is analyzing the edges of objects and not yet classifying them. Another type of experiment tapped the higher layers, encouraging the system to riff on what it had begun to recognize. That’s when the weird animals would appear. The output is fascinating, but each such experiment also teaches us more about the way NNs operate.
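A toy version of that layer-choice experiment looks like this; two random “layers” stand in for the lower and higher stages of a trained ConvNet, and every size and parameter is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.standard_normal((32, 64)) * 0.2          # "lower" layer: edges, textures
# Kept nonnegative so the toy objective is convex and ascent is well-behaved.
W2 = np.abs(rng.standard_normal((8, 32))) * 0.2   # "higher" layer: object-like parts

def act_norm(x, depth):
    """Activation magnitude at the chosen layer."""
    a1 = np.maximum(W1 @ x, 0.0)
    if depth == 1:
        return 0.5 * np.sum(a1 * a1)
    a2 = np.maximum(W2 @ a1, 0.0)
    return 0.5 * np.sum(a2 * a2)

def dream(x, depth, steps=50, lr=0.05):
    """Gradient ascent on the chosen layer's activation magnitude."""
    x = x.copy()
    for _ in range(steps):
        a1 = np.maximum(W1 @ x, 0.0)
        if depth == 1:
            g = W1.T @ a1                         # ascend on the lower layer
        else:
            a2 = np.maximum(W2 @ a1, 0.0)
            g = W1.T @ ((W1 @ x > 0) * (W2.T @ a2))   # ascend on the higher layer
        x += lr * g / (np.abs(g).mean() + 1e-8)   # normalized ascent step
    return x

x0 = rng.standard_normal(64) * 0.1
low = dream(x0, depth=1)    # amplifies fine, texture-like structure
high = dream(x0, depth=2)   # amplifies whatever the deeper layer responds to
```

The two runs push the same starting image in different directions, which is the toy analogue of intricate filigree from the lower layers versus weird animals from the higher ones.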
But Mordvintsev’s experiment is important in another way: as a pointer to the vast potential of neural nets. As these nets develop, they are destined to not only match human ability in some areas, but exceed it. Convolutional neural nets, for instance, seem to have the potential to be more perceptive in some ways than people. As the Deep Dreams experience indicates, neural nets can see things we don’t. We aren’t just talking about rabid hounds bursting through one’s neck, but otherwise undetectable phenomena of real value to us. For example, scientists are starting to use neural networks to detect cancer in ultrasound scans. They can also scan through data to make predictions of traffic patterns.
In the future, neural nets will be used to enhance and in some cases replace humans, whose limited bandwidth falls short of performing certain tasks. Consider the TSA agent who screens airline passengers. Besides eliminating human failings like fatigue or distraction, a neural net might evolve to recognize subtle patterns in objects (in luggage) and behavior (in passengers) that could match or exceed even the harrowing interrogations of an El Al airline agent.
Those are the utilitarian implications; there are also philosophical ones. Probing artificial neural networks is a unique way to sample an alternate means of perception. While ConvNets are engineered to mimic a biological process, we know that what’s actually happening in these computational systems is quite different from our own wetware. So there’s value in plumbing what is, essentially, an alien way of seeing. Take the earlier example Chris Olah uncovered, of the NN regarding a barbell as something with a human hand attached. Viewed in a certain light, this misunderstanding seems unremarkable. Of course, given a steady flow of images of weightlifters, a machine could come to believe that a human hand wrapped around a barbell might be part of a barbell. But it’s also an insight into a non-human intelligence — and perhaps even a rebuke to the limited way that we see barbells. Not to get all Kantian on you, but could it be that a barbell isn’t a barbell until a hand grasps it?
Perhaps the most puzzling question of all is not the differences between NNs and our own brains, but the similarities. Our instincts tell us that these computer creations can only go so far in matching the more intricate expressions of humanity. But then along comes another neural net experiment outside of Google that challenges even that perception: an artificial neural net that, on command, alters a photograph as if one of history’s greatest artists had created it.
It came from a trio of researchers at the University of Tübingen in Germany. Leon Gatys, a doctoral candidate at the Bethge Lab there, had been working with a team schooled in computation and neuroscience, trying to understand biological and computer visual systems. They had been using neural nets to identify and ultimately output textures, as opposed to objects, when their experiments took a weird turn: was it possible to get a neural net to transform images creatively, in the same way an iconic artist would? Could a neural net understand a painting with the analytical skills of an art historian? Could it act as a master forger, rendering a vanilla scene from a photograph into something that appeared to be from the brush of a famous painter? To do this, they had to train a neural net to separate style from content, and then to identify the style so well that the net could reverse itself and render a scene in that style.
“It was absolutely not clear that you would have those independent factors of variation,” says Gatys. But after training a neural network on the differences between objects in famous artworks and photographs of those same objects unaffected by the artistic imagination, the team was able to produce original images that looked as if they had been recovered from the studios of those long-deceased masters.
In their paper, “A Neural Algorithm of Artistic Style,” they show the proof, extracting the style from well-known paintings like Munch’s The Scream, Kandinsky’s Composition VII, or Van Gogh’s Starry Night, and then running photographs that match the actual content of the paintings through their system. The results uncannily resembled the paintings.
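For readers who want the mechanics: the Gatys separation treats “content” as a layer’s raw feature maps and “style” as the correlations between feature channels, the so-called Gram matrix. The sketch below reproduces that split at toy scale, with one fixed random “feature extractor” standing in for a trained ConvNet and every number invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((128, 64)) * 0.2   # toy fixed "feature extractor"

def features(x):
    # 8 feature "channels," each observed at 16 spatial positions.
    return np.maximum(W @ x, 0.0).reshape(8, 16)

def gram(F):
    return F @ F.T        # channel-by-channel correlations: the "style"

content = rng.standard_normal(64)     # stand-in for the photograph
style = rng.standard_normal(64)       # stand-in for the painting
Fc = features(content)                # content target: raw feature maps
Gs = gram(features(style))            # style target: Gram matrix only

ALPHA, BETA = 1.0, 1e-3               # content vs. style weighting

def loss(x):
    F = features(x)
    return ALPHA * np.sum((F - Fc) ** 2) + BETA * np.sum((gram(F) - Gs) ** 2)

def grad(x):
    z = W @ x
    F = np.maximum(z, 0.0).reshape(8, 16)
    dF = 2 * ALPHA * (F - Fc) + 4 * BETA * (gram(F) - Gs) @ F
    return W.T @ (dF.reshape(-1) * (z > 0))   # backprop through the ReLU

x = content.copy()                    # start from the photograph
history = [loss(x)]
for _ in range(200):                  # descend toward content-plus-style
    x -= 1e-3 * grad(x)
    history.append(loss(x))
```

The descent keeps the photograph’s feature maps (its content) while pulling its channel correlations toward the painting’s (its style), which is the whole trick of the paper in miniature.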
The paper hit the internet in early September. When an open-source version of the software appeared not long afterwards, a graphics community already intoxicated from Deep Dreams experienced another woozy orgy of creation.
One eager participant was Karl Stiefvater, a computer graphics specialist (he wrote the code that blew up Neo’s spaceship in The Matrix trilogy; more recently he built key graphics tools for Linden Lab). His iOS app, Pikazo, offers styles drawn from sources beyond human artists: one option, for instance, renders an image in the style of a circuit board. It’s not an instant transformation: “It takes four quadrillion floating point operations,” Stiefvater explains.
The success of the German experiment — and the ease with which Stiefvater’s app can render your family photos into Klimt masterpieces — evokes huge questions. We have eyeball proof that German researchers built an artificial brain that can “paint” in the style of utterly unique artists, like Kandinsky. No, artificial neural networks are not brains. But they learn like brains and see in a somewhat similar manner. So can we learn anything about Kandinsky’s brain by studying these networks?
“That’s a very difficult question,” says Gatys. “We manipulate image representation. But there’s no intelligent agent here. It’s very hard [to understand what] the individual Kandinsky had in mind, what made him paint those images.”
Stiefvater of Pikazo feels that breaking down even the most radical forms of genius is ultimately a math problem. “I like artistic creativity, [but] I am not of the belief that [that creation] is supernatural,” he says. “It’s a mechanism — a cogwork.”
From the moment that Mordvintsev and his colleagues posted on the research blog, one aspect struck people immediately, and raised questions about potentially deep associations between artificial and biological neural nets. This was the uncanny correlation between Deep Dream images and human hallucinations provoked by a stout dose of LSD or magic mushrooms. (Or Hunter S. Thompson’s mind under his normal regimen of chemicals.) Were the same factors at work in deep dreaming neural nets and acidheads? Some researchers thought so. “[Google’s images are] very much something that you’d imagine you’d get with psychedelics or during hallucinations, and that’s perfectly sensible,” Karl Friston, a professor of neuroscience at University College London, told reporter Sophie Weiner.
Google, as you might suspect, could have lived without the whole drug analogy. In general, while definitely supporting the project, it seemed to strain to keep the hype in context, as there were plenty of more functional AI breakthroughs in the company that didn’t have mind-blowing pictures with which to woo the Internet. (One prominent researcher outside the company describes the Deep Dream work as “a nice trick.”) It took me weeks to get the company to grant the first full-blown interview with the team.
Of course, when I did, I asked them about why they thought the deep-dream images seemed, in the buoyant phrasing of the Chambers Brothers, psychedelicized.
They didn’t duck the implications. “There is a deep connection between the way our visual network works and our own brains work,” says Tyka. “So it wasn’t super surprising to me that a similar thing could happen in biological brains. I mean, if you put visual stimulants in biological systems and you [mess with] the synaptic signaling and neurons, you’re going to get weird over-interpretations and distortions. So I think that there is an analogy there. And you could argue that research in artificial neural networks may actually help us understand biological networks better.”
Mordvintsev agrees. “For me it’s also a strong sign that we are going in the right direction for constructing computer vision systems using convolutional neural networks. Because they seem to get similar flaws.”
This is a mind-blowing concept in itself. Can it be that both hallucinating people and deep-dreaming neural nets experience visions based on the same visual cues? Have both systems jimmied open identical doors of perception? How much can these networks teach us about ourselves?
Raising questions like that is why Deep Dreams — even though some might dismiss it as a “nice trick” — is so important. The flashy exposure of these images is an entry point into examining the issues we will face as neural nets and deep learning become more deeply entwined in our lives.
No matter what comes of the Deep Dreams phenomenon, the experience has changed the life of the trio who posted the original item. Alex Mordvintsev is no longer in Safe Search; he is now a part of a Google machine learning research team based in Seattle. His internship over, Chris Olah is now a full-time Googler on the AI research team. Mike Tyka now spends a portion of his time exploring how artists can use machine intelligence to create art.
At the time I interviewed them, some months after the blog post, the trio had never met in person — only Olah was in the room with me, the others participating remotely by Hangout. So I snapped a picture framing Olah between two large monitors holding, respectively, Mordvintsev in Zurich and Tyka in Seattle.
Later, I wondered how a neural net might interpret that group shot. Depending on the parameters, it might accurately identify the trio by name, and correctly surmise that the photo was produced in a conference room on the Google campus. In fact, Google Photos, a product released just days before Mordvintsev’s breakthrough, handily uses deep-learning techniques to handle tasks like that, many thousands of times a day.
But maybe, if the proper artificial neurons were firing, the machines would stare at the picture until fang-bared heads of vicious dogs burst from the engineers’ sleeves, collars and ears. Who’s to say whose vision is more real?
Gifs by Backchannel. Opening photo by Steven Levy.
This is the first of a series of stories on the “assault on reality,” where technology is stretching our long-held concepts of what is real.
Correction: the original version attributed the animal names and the barbell revelation to Mike Tyka. These were Chris Olah’s contributions.