Cambridge University researchers invent a drone that detects bad acting

The pictures below show truly shocking scenes. A terrifying reflection of where we are as a society.

You might not, at first glance, find these pictures particularly frightening. After all, they are just a bunch of guys playacting.

But the frightening aspect of these pictures is not what the people are doing, it is what they believe. These are grown men who think they are contributing to science. They are strangling, punching, kicking and pointing in a threatening manner in the belief that they will train a new generation of Artificial Intelligence. And it is their trust in this enterprise that should frighten you.

These pictures are taken from a paper that will soon appear at the IEEE Computer Vision and Pattern Recognition (CVPR) conference. In the article’s abstract the authors claim to have designed a system that “detects violent individuals in real-time by processing the drone images”. The images above are taken from a training set, the set of images from which the neural network algorithm ‘learns’ how to classify violent acts.

There is so much wrong with this study that I don’t know where to start. First of all there is the obvious question that many people will be asking: why exactly are a PhD student at Cambridge University and researchers at two leading Indian universities using taxpayers’ money to design drones to monitor our behaviour? Does society really want this type of automated surveillance?

But I’m going to leave that question aside and make what I think is the more important point. The researchers’ efforts are doomed to failure.

To understand why, we just have to look at the photographs above. Do you think those people are actually fighting? Do you think the guy in the second photo is about to take a punch to the nose? When he kicks back in the third picture, do you think he is going to take out his opponent’s legs? Of course you don’t.

So let us imagine that these clever engineers do manage to train their algorithm; what will the result be? It will be an algorithm that finds people who are acting badly, and I don’t mean behaving badly, I mean putting in poor theatrical performances. The researchers will have developed a drone that they can send out to every school play and amateur night performance, which will then automatically pick out the performers who aren’t properly engaging in their roles.

Except that won’t happen either. The data set the researchers used contains just 2,000 images of 25 different people engaged in play fighting. The data sets used by Google and others for object recognition consist of hundreds of millions of objects in a wide range of settings. A couple of thousand pictures, all taken in a very similar setting, are far too few to reliably classify violence.

It is important in this context to consider the success rate of around 80 to 90% reported by the researchers. This might sound impressive, but again we need to look closely at the images to understand what it means.

The pictures above show some of the ‘violent’ (red) and ‘non-violent’ (blue) individuals picked out by the algorithm. What the algorithm has done is classify individuals with their hands or legs in the air as violent. The blue individuals have, apparently following instructions, been careful to stand still. The actors knew the purpose of the study and adjusted their pose to suit the researchers’ aims. Any change of context, such as an image of some people doing aerobics or dancing, would cause the algorithm to fail.
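To make that failure mode concrete, here is a minimal, hypothetical sketch (this is not the researchers’ code, and the two ‘features’ are my own invention) of a classifier that learns “limbs in the air means violent” from posed images and is then shown dancers:

```python
# A toy illustration, not the paper's method. Each person is reduced to two
# made-up features: roughly "how high the arms are raised" and "how high a leg is raised".
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Posed training data: 'violent' actors wave their limbs about,
# 'non-violent' actors stand carefully still, exactly as instructed.
violent = rng.normal(loc=[0.8, 0.7], scale=0.1, size=(1000, 2))
peaceful = rng.normal(loc=[0.1, 0.1], scale=0.1, size=(1000, 2))
X = np.vstack([violent, peaceful])
y = np.array([1] * 1000 + [0] * 1000)

clf = LogisticRegression().fit(X, y)
print("Accuracy on the posed data:", clf.score(X, y))  # near-perfect

# A change of context: dancers also have their arms and legs in the air.
dancers = rng.normal(loc=[0.8, 0.7], scale=0.1, size=(20, 2))
print("Fraction of dancers flagged as violent:", clf.predict(dancers).mean())  # close to 1.0
```

The headline accuracy only tells us that the actors followed their instructions; it says nothing about people whose raised limbs mean something else entirely.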

A crowd control drone equipped with this violence detection algorithm and crowd-control weapons would end up pepper spraying entire dance festivals of suspected offenders. Or worse.

And I haven’t even mentioned the racial dimension. The fact that the test data is based on 18–25 year-old men of Indian descent, 50% of whom are performing violent acts, will heavily bias the algorithm to see this demographic as more likely to be violent than other age or ethnic groups.
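As a rough back-of-the-envelope illustration of that bias (every number below is my own assumption, not a figure from the paper):

```python
# Illustrative arithmetic only; the figures are assumed, not taken from the paper.
violent_examples = 1000          # roughly half of the ~2,000 labelled images
total_examples = 2000            # all of them showing 18-25 year-old men of Indian descent
learned_base_rate = violent_examples / total_examples

plausible_crowd_rate = 1 / 1000  # a guess at how often anyone in a real crowd is violent

print(f"Base rate the algorithm learns for this demographic: {learned_base_rate:.0%}")  # 50%
print(f"A plausible rate of actual violence in a crowd: {plausible_crowd_rate:.1%}")    # 0.1%
```

The only prior the network has ever seen is a fifty-fifty split, attached entirely to one demographic; everyone who looks different is, from its point of view, out of distribution.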

All in all this study is, as presented in the paper, over-hyped. But I do want to finish with a word of defence for the researchers, and perhaps try to explain why this type of publication happens.

The fact is that, on a technical level, the researchers’ approach is fine. They are testing whether or not a neural network algorithm can detect body posture in a crowded environment, and the results are a demonstration that it can. I imagine it is on this basis that the paper was accepted at such a prestigious venue.

We do this funny thing in academia, which is not always easy to understand from the outside. We play a game with ourselves and each other, where we imagine the eventual application of our work, and use that to motivate the reader and increase the chance that reviewers see its long-term importance.

Grant-making bodies demand we play this game, trying to convince them that our research will have medium-term benefits to society, and we go along with it. Deep down we know that we are playing a game, and when others peer in and ask what we are doing, the game can look rather silly. That is exactly what has happened here. It is not just the subjects who are playacting, but the researchers themselves.

And, when the media lights come on, the illusion is revealed.