Deep Nothing: Understanding Humans

Tom Jacobs
Aug 8, 2017

AI is amazing.

Neural networks are amazing.

TensorFlow is amazing.

These programs are going to take over the world!

Then why don’t they know who I am? Even my dog knows who I am. Is all of AI not as good as a dog? Let’s fix that.

I wrote a program that watches a TV show, and is able to identify when each character is present on screen. The eventual goal is for the program to have some understanding of what is happening with the characters and the overall story — but this is a start.

The program samples a frame of the video every 10 seconds over the 22-minute episode, detects and identifies the faces of people it knows, and outputs a graph of what it has found over those 22 minutes. Here's its understanding so far:
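The post doesn't show the code, but the loop it describes can be sketched in a few lines. Here `identify_faces()` is a hypothetical stand-in for whatever face detector and recognizer the author actually used; only the sampling and bookkeeping are shown.

```python
# A minimal sketch of the sampling loop described above, assuming a
# 22-minute episode sampled every 10 seconds. identify_faces() is a
# stub standing in for the (unspecified) detection/recognition step.

SAMPLE_INTERVAL = 10      # seconds between sampled frames
EPISODE_LENGTH = 22 * 60  # 22-minute episode, in seconds

def identify_faces(timestamp):
    """Hypothetical stand-in: return the set of known characters
    whose faces appear in the frame at `timestamp` seconds."""
    return set()  # a real version would run face detection + recognition here

def scan_episode():
    """Sample a frame every 10 seconds and record who is on screen when."""
    sightings = {}  # character -> list of timestamps where they were seen
    for t in range(0, EPISODE_LENGTH + 1, SAMPLE_INTERVAL):
        for character in identify_faces(t):
            sightings.setdefault(character, []).append(t)
    return sightings
```

At 10-second intervals that's 133 sampled frames per episode, which is cheap enough to run on a laptop while still catching every scene.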

As you can guess if you’re a fan, the show is Seinfeld. Each colour is a different character, and I’ve put them up at different y-axis levels so you can see them all when they’re on screen at the same time. The x-axis runs from 0 to 22 minutes of the episode’s running time.
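Turning the sightings into that graph is mostly a layout question: give each character a fixed y-level and colour, and convert each sighting's timestamp to minutes for the x-axis. The level and colour assignments below are my guesses at the post's scheme, based on the colours it names.

```python
# Sketch of the graph layout described above: one y-level and colour per
# character (assumed assignments), with each sighting becoming an (x, y)
# point where x is the timestamp in minutes.

CHARACTERS = {  # character -> (y-level, colour), matching the post's graph
    "Jerry":  (4, "blue"),
    "Elaine": (3, "cyan"),
    "George": (2, "red"),
    "Kramer": (1, "black"),
}

def to_plot_points(sightings):
    """Convert {character: [timestamps in seconds]} to plottable points."""
    points = []
    for character, times in sightings.items():
        y, colour = CHARACTERS[character]
        for t in times:
            points.append((t / 60.0, y, colour))  # x-axis in minutes
    return points
```

Feeding these points to any scatter-style plotter reproduces the stacked bars: characters on screen at the same time show up at different heights instead of overlapping.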

As you can see on the graph, the big blue bar on the left covers the first two minutes: Jerry's opening standup act over the first minute, then Jerry by himself getting a massage for another minute.

Then we cut to the rest of the crew discussing it, at 2:30 in. So the program outputs three colours on the graph over this time period: a little discussion between Jerry and Elaine (shown in cyan), with George (shown in red) asking about the free massage at the end of the scene.

Looking at the graph, we can see that at 4 minutes the blue Jerry bar disappears for a while. That's the next scene: just George and Elaine going to get massages, focusing on George's massage from 5 minutes to 9 minutes in, shown in red on the graph.

At 9 minutes you can see the blue bar reappear: we're back at Jerry's, with George talking about the massage with Jerry. Kramer pops in after a minute, and so the black bar appears at 10 minutes on the graph.

From 13 minutes to 17 minutes there are a few different scenes with Kramer talking about doughnuts. That black bar stays mostly solid for the first time as Kramer gets a lot of screen time.

Then the next scene is just Jerry and George from 17 minutes onwards, at the dentist. It’s just blue Jerry and red George on the graph.

And we finish up at the coffee shop with the three of them, with Kramer popping in at 20 minutes.

And finally we close the show at 22 minutes, with a minute of Jerry’s standup. All blue (not the standup, the graph).

Pretty good for a computer program. I know I grew up learning a lot from TV. I see no reason why an AI program couldn’t do the same!

The only obvious mistake it made was briefly detecting Kramer in the opening apartment scene, even though he doesn't show up until later. Maybe someone made a Kramer-face.

Next project: NLP on the episode's subtitles. Let's see if we can start understanding some of the concepts discussed, connecting words like "massage" and "doughnut" with concepts like "getting" and "eating".

The code is available here:
