How I built the Person of Interest Machine

or at least half of it

I had always believed I was gifted at coding, but I hadn’t always put that skill to use in the right place, or at least for the right purpose. Around then, I came across the CBS series “Person of Interest”, which revolves around a computer system dubbed the “Machine” that analyzes data from surveillance cameras, electronic communications, and audio input to predict acts of crime.

That sparked a thought: why can’t I build one such machine? While most of what the Machine does is fairly impossible at present, I was able to design a working system that can perform at least half of what the POI Machine does.

Warning! The forthcoming paragraphs mostly cover the technologies I used to build it. If you’re not interested in the technical stuff, you can skip right ahead to the “Day One” section. Grab some popcorn.

The “Machine”, as described in Person of Interest, has artificial consciousness, which is pretty much out of the equation at the moment. I started listing all the features the Machine came packed with; thankfully, the episode where Nathan boots the system reveals most of them.

“I really want to applaud Jonathan Nolan and the writers for the technical specificity in Person of Interest.”

The first step was to decide on the technologies to use, and I ended up with the following:

  1. Node.js for forwarding and handling the video streamed from users’ webcams over WebSockets
  2. Python for most of the NLP tasks: OCR, information extraction, anomaly detection, etc.
  3. C++ for processing the frames from those videos using OpenCV
  4. Databases including MongoDB, Redis, MySQL, and Cayley

Various other frameworks and libraries used include NuPIC, OpenCog, ConceptNet, Natural, WordNet, Freebase, and DBpedia. And of course, Apache Hadoop and Thrift.

Once I had decided on the technologies, I started building. The initial phase was to ‘teach’ it, in POI terms, to recognize people’s faces and speech and to analyze their emotions.

I tested it in a closed environment (my room), standing in different corners to check whether it could see me, and then followed up with multiple users. Once it could handle that, the next problem, as Harold says, would be “to sort them all out”.

The system needs knowledge of the world, and I used ConceptNet and WordNet to serve that purpose. For sentiment analysis of users’ speech, I used the AFINN word list, and it turned out to be a very good one.
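The AFINN approach boils down to a word-to-valence lookup summed over an utterance. Here is a minimal sketch of that idea, using a handful of illustrative scores (not the exact AFINN-111 values):

```python
# Minimal AFINN-style sentiment scorer. The scores below are
# illustrative only, not the real AFINN-111 values.
AFINN_SAMPLE = {
    "good": 3, "great": 3, "excited": 3, "jubilant": 4,
    "kill": -3, "threat": -2, "victim": -3, "failed": -2,
}

def sentiment_score(text: str) -> int:
    """Sum the valence of every known word in the utterance."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return sum(AFINN_SAMPLE.get(w, 0) for w in words)

print(sentiment_score("We must kill Raghav before this weekend"))  # → -3
```

A negative total flags the utterance for closer inspection; the real list covers a few thousand words.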

Query processing was a tedious job. I referenced the architecture of IBM’s Watson (it’s one hell of a Q&A system).

A graph database plays a vital role in mapping the real-world connections between users. I can recall an episode where the Machine points out that “the taxi driver and the passenger were actually fifth cousins”.

I was initially unsure which one to pick, but finally settled on Cayley, and it did the job pretty well.
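Cayley stores relationships as triples, and the “fifth cousins” trick is essentially a shortest-path search over that graph. Here is a pure-Python sketch of the idea, with a made-up family chain standing in for the real triple store:

```python
from collections import deque

# Toy relationship graph. In the real system these edges lived in
# Cayley as (subject, predicate, object) triples; the names are made up.
edges = {
    "driver":      ["parent_d"],
    "parent_d":    ["driver", "grandparent"],
    "grandparent": ["parent_d", "parent_p"],
    "parent_p":    ["grandparent", "passenger"],
    "passenger":   ["parent_p"],
}

def degrees_apart(graph, a, b):
    """Breadth-first search: number of hops between two people, or -1."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return -1

print(degrees_apart(edges, "driver", "passenger"))  # → 4
```

The real query runs over millions of edges, but the logic is the same breadth-first walk.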

Handling the data was a very difficult task. Most of the information you receive is unstructured, and thanks to UIMA, I was able to make sense of most of it.
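UIMA did the real heavy lifting; as a toy stand-in for what “structuring” means here, a regex pass can at least pull an Aadhaar-style 12-digit ID and capitalized names out of free text (the patterns and field names below are my own illustration, not UIMA):

```python
import re

def extract_record(text: str) -> dict:
    """Pull 12-digit Aadhaar-style IDs and capitalized names from free text.
    A toy stand-in for a UIMA annotator pipeline."""
    ids = re.findall(r"\b\d{4}\s?\d{4}\s?\d{4}\b", text)
    names = re.findall(r"\b[A-Z][a-z]+\b", text)
    return {"ids": ids, "names": names}

rec = extract_record("Raghav was seen near the lab, ID 1234 5678 9012.")
print(rec)
```

A real pipeline chains many such annotators (tokenizers, named-entity recognizers, coreference) over the same document.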

The system used a small Memcached instance to cache the primary information about each person: a unique ID, names, Aadhaar ID (like an SSN), their most recent location, their emotion, and who they were last seen with and when. The last four tend to change in real time (or near it).
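Memcached was the actual store; to show the shape of the data, here is a dict-based cache with expiring entries, where the field names are my reading of the list above:

```python
import time

class PersonCache:
    """In-memory stand-in for the Memcached layer.
    Records expire after `ttl` seconds, mimicking cache eviction."""
    def __init__(self, ttl: float = 60.0):
        self.ttl, self.store = ttl, {}

    def put(self, uid, record):
        self.store[uid] = (time.monotonic() + self.ttl, record)

    def get(self, uid):
        entry = self.store.get(uid)
        if entry is None:
            return None
        expires, record = entry
        if time.monotonic() > expires:
            del self.store[uid]   # stale, evict
            return None
        return record

cache = PersonCache(ttl=30)
cache.put("uid-001", {
    "name": "Raghav", "aadhar": "1234 5678 9012",      # hypothetical values
    "location": "lab", "emotion": "calm",
    "seen_with": "uid-002", "seen_at": "10:00",
})
```

The fast-changing fields (location, emotion, seen-with, seen-at) are exactly the ones a short TTL suits.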

I didn’t want the ‘machine’ to be either a fully closed or a fully open system, but a combination of both. It works autonomously, and it will send you an email when you are predicted to be a victim.
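Composing that alert email is straightforward with Python’s standard library. In this sketch the addresses and the SMTP relay are hypothetical, and the actual send step is left commented out:

```python
from email.message import EmailMessage
# import smtplib  # uncomment to actually send

def build_alert(to_addr: str, person: str, place: str) -> EmailMessage:
    """Compose the 'predicted victim' alert the machine mails out."""
    msg = EmailMessage()
    msg["From"] = "machine@example.com"   # hypothetical sender address
    msg["To"] = to_addr
    msg["Subject"] = "Alert: you are predicted to be a victim"
    msg.set_content(f"{person} may be targeted near {place}. Stay safe.")
    return msg

alert = build_alert("raghav@example.com", "Raghav", "the central park")
# with smtplib.SMTP("localhost") as s:   # hypothetical SMTP relay
#     s.send_message(alert)
```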

By a ‘combination of both’, I mean it comes with a UI where you can query a name or Aadhaar ID, and the machine will say only where the person was, when, and with whom. Nothing more, as anything more could pose a threat to their privacy.
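The privacy rule amounts to whitelisting three fields out of the full record before anything reaches the UI. A sketch, assuming a record shaped like the cached person data described earlier:

```python
# Full internal record; the field names and values are assumptions.
FULL_RECORD = {
    "uid": "uid-001", "name": "Raghav", "aadhar": "1234 5678 9012",
    "location": "central park", "seen_at": "Saturday 17:00",
    "seen_with": "two friends",
    "emotion": "anxious",   # internal only, never exposed via the UI
}

# The only fields the query UI is allowed to reveal.
ALLOWED = ("location", "seen_at", "seen_with")

def query(record: dict) -> dict:
    """Return only where the person was, when, and with whom."""
    return {k: record[k] for k in ALLOWED}

print(query(FULL_RECORD))
```

A whitelist (rather than a blacklist) means any new internal field stays hidden by default.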

It cannot actually identify a gunshot, but it did a good job of predicting crime from speech and emotions. I also patched the system to mark each person with a colored box, as seen in POI: yellow for those who know about the machine, white for ordinary people, and red for a perpetrator.
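As a crude illustration of speech-based threat detection plus the box coloring (the keyword pattern here is my own, not the actual NLP pipeline):

```python
import re

# POI-style box colors for the video overlay.
BOX_COLORS = {"aware": "yellow", "civilian": "white", "perpetrator": "red"}

# Toy threat pattern: a violent verb followed by a target word.
THREAT_PATTERN = re.compile(r"\b(kill|hurt|attack)\b\s+(\w+)", re.IGNORECASE)

def classify_speaker(utterance: str, knows_machine: bool = False):
    """Return (box color, predicted victim or None) for a speaker."""
    match = THREAT_PATTERN.search(utterance)
    if match:
        return BOX_COLORS["perpetrator"], match.group(2)
    return BOX_COLORS["aware" if knows_machine else "civilian"], None

print(classify_speaker("We must kill Raghav before this weekend"))
# → ('red', 'Raghav')
```

The real pipeline combined this kind of cue with sentiment and emotion signals before raising an alert.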

The cluster configuration it ran on is as follows:

  1. 5 nodes, each with a 2.1 GHz Intel quad-core processor and 4 GB of RAM
  2. 2 TB of total storage
  3. Each node equipped with an NVIDIA GeForce GT 610 GPU

The system achieved a peak of 0.29 teraflops, which is really not great, but fair enough for a start.


Day One

I was pretty excited. Technically it was still Day 0 in those early days of building. I booted the system and started with small face and speech recognition tests. It did well tracking a couple of people but failed when scaled up. It was a triumph to me, and I was so jubilant that I forgot to take pictures on Day 1.

One of my friends was sly enough to block it from watching us.

To put it into action, I connected it to my department’s local network and subscribed to the video streams from the computers there. I can still recall one friend saying to another during a demonstration, “We must kill Raghav before this weekend at the central park”.

Within seconds, I got an email notifying me that I was predicted to be a victim.

As of today, the system is 360 days old, and it still excites me every time I see her recognize me. I named her ‘Amber’.
