This is an excerpt from How Smart Machines Think by Sean Gerrish. How Smart Machines Think offers an engaging and accessible overview of the breakthroughs in artificial intelligence and machine learning that have made today’s machines so smart.
On June 10, 2015, a strange and mysterious image showed up on the internet, posted anonymously on the website Imgur.com. At first glance, the picture looked like one or two squirrels relaxing on a ledge. But the resemblance ended there: as you looked more closely, you could make out bizarre detail — and objects — at every scale. The image on the internet was psychedelic, like a fractal, with a dog’s snout on the squirrel’s face, a mystical pagoda here, a human torso there, and a bird-giraffe creature over there, seamlessly embedded into the fine detail of the image. Uncanny eyes peered out from every nook and cranny. Looking at this image felt like looking for objects in clouds, except that it wasn’t your imagination. Or was it? You had to look again to see.
It was clear that the image hadn’t been created by a human. It was too bizarre to be a photograph, and its detail was too fine to be an illustration. The anonymous user who had posted the picture on Imgur.com described it only with this note:
This image was generated by a computer on its own (from a friend working on AI).
As the image began to spread and the denizens of the internet tried to make sense of it, engineers over at Google were generating more images just like this and sharing them with one another. A week later, they published a blog post explaining the phenomenon. The image had indeed been generated by AI — specifically an artificial neural network. The phenomenon became known as Deep Dream. With the arrival of these images, people began asking some uncomfortable questions that had been lurking beneath the surface. Are these really android dreams? Do we even understand what’s going on in these networks? Have researchers gone too far in their efforts to recreate human thinking?
These concerns about intelligent machines had been further stirred up because the likes of the modern industrialist Elon Musk were voicing their own worries. Musk, who had reportedly invested in DeepMind to keep an eye on the progress of AI, worried that his good friend Larry Page — one of Google’s founders — might “produce something evil by accident,” including, rather specifically, “a fleet of artificial intelligence–enhanced robots capable of destroying mankind.”
When these images came out, we already knew that neural networks could be useful in playing Atari games and in understanding the content of images. The images did stir up some uncomfortable questions, but the reasons neural networks can be good at playing Atari games and the reasons they’re able to produce psychedelic dreamscapes are actually closely related. And even though these dreamscapes seemed at first to make deep neural networks more mysterious, it turns out that they can also make them less mysterious.
Suppose that we take a photo of your pet dog and pass that photo through a deep neural network like the ones Google uses. As long as you know how the network was tuned, the artificial neurons in the network will “light up” predictably, layer by layer. In each layer, some neurons will remain dark while others will glow brightly as they respond to different patterns in the image. Since we passed a photo of your pet dog into the network, if we look deep enough in the network (say, at the fourth or fifth layer), the neurons will represent object parts that we’ll likely recognize. Those neurons that respond to things like fur and parts of a dog’s face will be glowing brightly. If the neural network is trained to recognize different objects, including dogs, then when we look at the final layer, the dog neuron will be lit up, while most of the remaining neurons will be dark.
Now here’s where it gets interesting. The algorithm to train the network to recognize dogs adjusted the network weights based on how “incorrect” the dog neuron at the end of the network was for a bunch of pictures. It used a mathematical function that measured how close the output of the network was to the training example’s label. That label was just a 1 or a 0 describing whether the image did or didn’t have a dog. The algorithm to train the network then calculated, using high school calculus, in which direction it should adjust the network’s weights so the network could predict the output values just a bit more accurately the next time around.
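The training loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method any real network used: the "network" is a single dog neuron, the "images" are made-up lists of 16 numbers, and the measure of incorrectness is assumed to be cross-entropy. The names `train_dog_neuron`, `cross_entropy`, and `hidden_rule` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(images, labels, w, b):
    """Measure how far the neuron's outputs are from the 0/1 labels."""
    p = sigmoid(images @ w + b)
    eps = 1e-12  # avoids taking log(0)
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

def train_dog_neuron(images, labels, steps=500, learning_rate=0.5):
    """Adjust the weights a little at a time so the outputs match the labels."""
    w = np.zeros(images.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(images @ w + b)       # the neuron's current guesses
        error = p - labels                # how "incorrect" each guess is
        grad_w = images.T @ error / len(labels)  # direction that increases the loss
        grad_b = error.mean()
        w -= learning_rate * grad_w       # step the opposite way
        b -= learning_rate * grad_b
    return w, b

# Made-up data (an assumption of this sketch): 200 "images" of 16 numbers each,
# labeled 1 ("dog") or 0 ("no dog") by a hidden rule the neuron must discover.
rng = np.random.default_rng(0)
images = rng.normal(size=(200, 16))
hidden_rule = rng.normal(size=16)
labels = (images @ hidden_rule > 0).astype(float)

loss_before = cross_entropy(images, labels, np.zeros(16), 0.0)
w, b = train_dog_neuron(images, labels)
loss_after = cross_entropy(images, labels, w, b)
accuracy = np.mean((sigmoid(images @ w + b) > 0.5) == (labels == 1))
```

Comparing `loss_before` with `loss_after` shows the neuron becoming less incorrect as the weights are nudged. Note that only the weights change here; the images stay exactly as they were.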
What if, instead of adjusting the network’s weights to agree more with the image, we adjusted the image to agree more with the network? In other words, once we’ve already trained the network, what would happen if we kept the network’s weights fixed and adjusted the input image (say, a photograph of a cloud) so that the dog neuron glows more brightly while the other neurons remain dark?
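One way to picture this reversal is the following Python sketch. It rests on several assumptions: the tiny two-layer network has random, frozen weights standing in for a trained one, the "image" is just 64 pixel values between 0 and 1, and the names `DOG`, `dog_gradient`, and `make_more_doglike` are hypothetical. The same calculus used in training is applied here, but the result is used to change the pixels rather than the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
# Frozen weights of a tiny stand-in network (random here, purely for illustration).
W1 = rng.normal(scale=0.2, size=(32, 64))   # pixels -> hidden neurons
W2 = rng.normal(scale=0.2, size=(5, 32))    # hidden neurons -> 5 object neurons
DOG = 2                                     # index of the hypothetical "dog" neuron

def forward(image):
    """Run the image through the network; return hidden and output brightness."""
    hidden = np.tanh(W1 @ image)
    logits = W2 @ hidden
    exp = np.exp(logits - logits.max())     # softmax, shifted for stability
    return hidden, exp / exp.sum()

def dog_gradient(image):
    """Direction in pixel space that brightens DOG and dims the other outputs."""
    hidden, probs = forward(image)
    d_logits = -probs
    d_logits[DOG] += 1.0                    # gradient of log(probs[DOG]) w.r.t. logits
    d_hidden = W2.T @ d_logits
    d_pre = d_hidden * (1.0 - hidden ** 2)  # derivative of tanh
    return W1.T @ d_pre

def make_more_doglike(image, steps=300, step_size=0.02):
    """Nudge the pixels, never the weights, a bit at a time."""
    image = image.copy()
    for _ in range(steps):
        image += step_size * dog_gradient(image)
        image = np.clip(image, 0.0, 1.0)    # keep pixel values valid
    return image

cloud = rng.uniform(0.4, 0.6, size=64)      # a bland gray "cloud photo"
before = forward(cloud)[1][DOG]
dreamed = make_more_doglike(cloud)
after = forward(dreamed)[1][DOG]
```

Here `before` and `after` hold the brightness of the dog neuron for the original and the adjusted image. `W1` and `W2` are never modified, which is the whole point: the network stays fixed and the picture moves.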
If we adjust the image like this, changing the pixels a bit at a time and then repeating, then we would actually start to see dogs in the photo, even if there weren’t dogs there to begin with! In fact, this is how some of the images in the last chapter were generated: a group of deep learning researchers took a network just like AlexNet and adjusted input images so that certain neurons (representing a great white shark or an hourglass, for example) became bright, while other neurons remained dark. Google’s researchers used a similar method to analyze their own neural networks. When they wrote about how they did this, they gave several examples. In one of these examples, they looked at images generated from a neuron that recognized dumbbells, the equipment that you would find in a gym. They found that the images indeed showed dumbbells, but they also showed muscular arms attached to these dumbbells. Apparently, they observed, the network learned that an important distinguishing characteristic of dumbbells isn’t just the hardware itself but also the context in which they’re used.
Google created its Deep Dream images in a similar way, except that instead of forcing the network to generate pictures of dogs or other specific objects, they let the network create more of whatever it saw in the image. As the Deep Dream engineers wrote on Google’s research blog:
Instead of exactly prescribing which feature we want the network to amplify, we can also let the network make that decision. In this case we simply feed the network an arbitrary image or photo and let the network analyze the picture. We then pick a layer and ask the network to enhance whatever it detected. Each layer of the network deals with features at a different level of abstraction, so the complexity of features we generate depends on which layer we choose to enhance. For example, lower layers tend to produce strokes or simple ornament-like patterns, because those layers are sensitive to basic features such as edges and their orientations.
If we choose higher-level layers, which identify more sophisticated features in images, complex features or even whole objects tend to emerge. Again, we just start with an existing image and give it to our neural net. We ask the network: “Whatever you see there, I want more of it!” This creates a feedback loop: if a cloud looks a little bit like a bird, the network will make it look more like a bird. This in turn will make the network recognize the bird even more strongly on the next pass and so forth, until a highly detailed bird appears, seemingly out of nowhere.
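The feedback loop the engineers describe can be sketched as follows. This is a toy illustration, not Google's code: the two-layer network has random frozen weights, "enhance whatever it detected" is assumed to mean increasing the sum of squared activations in the chosen layer, and scaling each step by the average gradient size is a further assumption of this sketch. The names `layer_strength`, `dream_gradient`, and `deep_dream` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
# Frozen weights of a toy two-layer network (random stand-ins for trained ones).
W1 = rng.normal(scale=0.2, size=(32, 64))   # layer 1: simple features
W2 = rng.normal(scale=0.2, size=(16, 32))   # layer 2: more abstract features

def layer_activations(image):
    """Return how brightly each neuron glows in layer 1 and layer 2."""
    a1 = np.maximum(0.0, W1 @ image)
    a2 = np.maximum(0.0, W2 @ a1)
    return a1, a2

def layer_strength(image, layer):
    """How strongly the chosen layer responds: half the sum of squares."""
    a = layer_activations(image)[layer - 1]
    return 0.5 * np.sum(a ** 2)

def dream_gradient(image, layer):
    """Pixel changes that make the chosen layer respond more to what it sees."""
    a1, a2 = layer_activations(image)
    if layer == 1:
        d_a1 = a1
    else:
        d_a1 = W2.T @ a2                    # a2 is already zero for inactive neurons
    return W1.T @ (d_a1 * (a1 > 0))         # pass back only through active neurons

def deep_dream(image, layer, steps=100, step_size=0.01):
    """Repeat: look at the image, then amplify whatever the layer detected."""
    image = image.copy()
    for _ in range(steps):
        g = dream_gradient(image, layer)
        image += step_size * g / (np.abs(g).mean() + 1e-8)  # evenly sized steps
        image = np.clip(image, 0.0, 1.0)    # keep pixel values valid
    return image

photo = rng.uniform(0.3, 0.7, size=64)      # stand-in for an arbitrary photo
dreamed = deep_dream(photo, layer=2)
```

Unlike the dog example, no target object is named anywhere in this loop. Whatever the chosen layer happens to respond to in `photo` is what gets amplified, and changing the `layer` argument changes which kind of feature is exaggerated.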
And that’s how the mysterious image from Imgur.com was created.