I See Dead People, or It’s Intelligence, Jim, But Not As We Know It

Moshe Sipper, Ph.D.
4 min read · Jun 14


Take a look at this picture, the well-known painting “American Gothic” by Grant Wood:

What do you see?

More precisely, let’s caption this painting. How about, “A man and a woman posing for a picture”?

Sounds reasonable to me.

In fact, I didn’t come up with this caption myself. A deep network, known as ViT-GPT2, came up with it (the “ViT” part stands for vision transformer, and the “GPT” part stands for generative pre-trained transformer).

Well played, ViT-GPT2.

Now take a look at this image:

“Are you kidding me,” you’re probably mumbling, “that’s the same painting.”

Not quite. There are some subtle differences from the first one, virtually undetectable to the human eye.

Yet we humans would still caption this as, “A man and a woman posing for a picture.”

Lo and behold, when given to ViT-GPT2, the AI describes the picture as: “A table with a bunch of stuffed animals on top of it.”

Since we have a Star Trek reference in the title, let’s quote Mr. Spock now: “Fascinating.”

Here’s another painting, “Whistler’s Mother”, by James Abbott McNeill Whistler:

What’s this, ViT-GPT2? Well, says the AI, it’s: “A woman sitting on a chair in a room.”

Not bad.

How about this one:

Oh, says ViT-GPT2 without batting an eyelid (not surprising, given it has no eyelids), this is most obviously, “Two men standing next to each other near a truck.”

Duh. Obviously.

So what’s happening here? The above examples were taken from my recent paper with Raz Lapid: “I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models.”

Elsewhere, I wrote about how deep networks can be quite vulnerable to so-called adversarial attacks: Small perturbations to the input, imperceptible to the human eye, can fool a deep network.
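To get a feel for why such tiny perturbations can work, here’s a minimal sketch of the classic fast-gradient-sign idea on a toy linear classifier. This is not the attack from our paper — just an illustration of the principle, with a made-up two-class “person vs. table” model:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000                      # pretend this is a flattened 100x100 image
x = rng.uniform(size=d)         # pixel values in [0, 1]
w = rng.normal(size=d)          # weights of a toy linear classifier
if w @ x < 0:                   # arrange things so x starts out as "person"
    w = -w

def predict(img):
    return "person" if w @ img > 0 else "table"

# FGSM-style step: move every pixel by a tiny eps against the gradient's
# sign. For a linear score w.x, the gradient w.r.t. the input is just w.
eps = 0.03                      # 3% of the pixel range -- hard to see
x_adv = np.clip(x - eps * np.sign(w), 0.0, 1.0)

print(predict(x))               # "person"
print(predict(x_adv))           # "table" -- the label flips
```

The trick is dimensionality: each pixel moves by at most 0.03, but ten thousand tiny moves all aligned against the gradient add up to a large change in the classifier’s score.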

Here’s another example, from our paper, “Patch of Invisibility: Naturalistic Black-Box Adversarial Attacks on Object Detectors.”

An adversarial patch evolved by our novel gradient-free algorithm, which conceals people from an object detector

This time we fooled Tiny-YOLOv4, an object-detection model, into believing that Raz, standing to the left, is… invisible. And all it took was a well-crafted picture of a cute dog. 🐶
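The “gradient-free” part matters: in a black-box setting the attacker can only query the model, not peek at its gradients. A simple (1+1) evolutionary search conveys the flavor — again a toy sketch with an invented linear “detector,” not our actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 400                        # a tiny flattened "patch"
w = rng.normal(size=d)         # hidden weights of a toy detector

def detector_score(patch):
    """Black box: higher means 'person detected'. We can only query it."""
    return float(w @ patch)

# (1+1) evolutionary search: mutate the patch and keep any mutation that
# lowers the detector's score. No gradients are ever computed.
patch = rng.uniform(size=d)    # pixel values in [0, 1]
best = detector_score(patch)
start = best
for _ in range(2000):
    candidate = np.clip(patch + rng.normal(scale=0.05, size=d), 0.0, 1.0)
    score = detector_score(candidate)
    if score < best:
        patch, best = candidate, score

print(start, "->", best)       # the detection score drops as the patch evolves
```

Query-only loops like this are why adversarial patches are worrying in practice: the attacker never needs access to the model’s internals.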

These examples from our work in the adversarial-attack domain raise concerns about placing AI in safety-critical applications. But what I find really fascinating is that they further demonstrate just how different these AIs’ “intelligence” is from our own!

We are not fooled for one second by the cute dog. And we definitely do not see tables, stuffed animals, or trucks in the paintings above.

The “neural” in deep neural networks suggests that they have some ties with our own neural brain. In point of fact, the “neural” comes from the humble beginnings of the AI field, in the 1940s and 1950s, when researchers thought they were building networks out of artificial neurons, which resembled the wet neurons in our brain. Since then we’ve learned quite a bit about neurons and brains. And we’ve also strayed from the neural analogy. The upshot: there’s precious little “neural” about deep neural networks.

And there’s nothing to say that they should think like us (let’s put aside whether they’re thinking at all or just “thinking”). A deep network can excel one moment and fall flat on its face the next (that is, if it had a face).

If we agree that there’s real intelligence here (some say yes, others say no), I think one thing is beyond a shadow of a doubt:

This intelligence is alien.

AI-generated image (craiyon)

Let me end with a few more samples of captions we provoked:

“Orig” is the original image, “Adv” is the image we created to fool the network.
