Violent Action Recognition using Drone Data

Published in Deviant Action Recognition Using Drone Data

6 min read · Oct 30, 2019

Written by Anosh Billimoria and Divyaansh Devarriya

Drone Potential

Drones have faced a lot of backlash for a long time now, yet we are still amazed by what they can do: aerial photography, 3-D mapping, transportation, surveying inaccessible land, and simply looking cool.

This project, however, is none of that. We are tackling something more controversial, or let’s say not yet well implemented: making drone surveillance more commercial and effective. Drones open up vast possibilities for solving problems in new ways, but they still need a better arsenal of tools to be effective. We have seen simple implementations of drones recording video, but not every second of footage is informative, and the seconds worth evaluating are easily overlooked through human error. This has alarmed many, and it has been widely discussed for some time that an AI or computer vision aid to the existing system may be the answer. With this project, we explore a possible solution: a system that contributes to the process of video surveillance.

Requirements

In a nutshell, this endeavor requires image and video processing techniques as well as machine learning, which makes it exciting to work on. Although much research has been done, nothing has been concretized yet, which leaves a good amount of work still to do.

The drone captures video at an uncommon angle that does not match the usual human line of sight. A further challenge is that video can be recorded at different heights, so a single shot may cover a large area and huge crowds. The project demands time and effort from the very first step of data collection, which made it suitable as a capstone project in our penultimate semester at university.

Data Collection

We assumed we simply had to record a bunch of videos, but we ended up paying the price for this assumption after an unfavorable length of time.

Let’s break it down. Before any new project you do your research, but apparently this was only straightforward before our world became so ubiquitously connected and access to information became as easy as the click of a button. This has its perks and certainly some downsides. In our context, getting all the existing information is possible, but ever-changing technology and its sheer expanse can overwhelm quite easily. Right up until the final deadlines, we were still finding new articles and papers that, while they did not drastically affect the project, would certainly have been helpful had we found them much earlier in the process.

So, initially, we recorded a group of people enacting predefined actions, namely punches, kicks and falls. Our first variation covered heights of 2, 5 and 10 meters. The next involved experimenting with lighting. The camera on a drone is a simple RGB camera and imparts only a limited amount of information. Our research led us to a state-of-the-art pose estimation method, implemented in an easy-to-set-up Python + Keras (with TensorFlow) environment. This acted as our starter code, and it performs near-perfectly on large crowds as well.

Initial Approach

To test transfer learning, we used the VGG16 network with pre-trained ImageNet weights. We added a convolution layer with ReLU activation and a softmax output layer for 2 classes, namely punch and kick. The model was able to classify images with good accuracy. However, an action cannot be represented by a single frame; it requires a sequence of multiple frames. It is also worth mentioning that human key-point data can be more useful for action classification than the raw image alone.
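A minimal sketch of such a setup in Keras, for readers who want to reproduce the idea. The 224x224 input size, the frozen base, the 64-filter convolution, the pooling step, and the function name build_frame_classifier are assumptions made for illustration; they are not details of the original project.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16


def build_frame_classifier(num_classes=2, weights="imagenet"):
    """VGG16 base plus a small head: one ReLU conv layer and a softmax output."""
    # Pre-trained convolutional base without the original ImageNet classifier.
    base = VGG16(weights=weights, include_top=False, input_shape=(224, 224, 3))
    base.trainable = False  # assumption: keep the pre-trained features fixed

    model = models.Sequential([
        base,
        # Added convolution layer with ReLU (filter count is an assumption).
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        # Assumption: pool the feature map before the final classifier.
        layers.GlobalAveragePooling2D(),
        # Softmax over the action classes (punch and kick in this experiment).
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling build_frame_classifier() downloads the ImageNet weights on first use; passing weights=None builds the same architecture with random initialization, which is handy for a quick shape check.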

Procedure

The next steps involved clipping the videos and segregating the clips by action. All videos were passed through our key-point extraction code to get the exact key points of each human in the frame. We grouped multiple frames to capture a complete action and labeled them accordingly. The classes were punch, kick, fall and non-violent activity. We initially attempted a novel neural network, but due to a lack of sufficient data for each action, the results weren’t satisfactory. In the meantime, we applied ensemble machine learning models such as Random Forests. We achieved sufficient accuracy on the limited data we used, as shown in Fig 1 and Fig 2.
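The frame-grouping step can be sketched as a sliding window over per-frame key points. This is only an illustration: the window length, the stride, the 18-joint layout, and the helper name make_windows are assumptions, not values from the project.

```python
import numpy as np


def make_windows(keypoints, window=16, stride=8):
    """Group per-frame key points into overlapping fixed-length clips.

    keypoints: array of shape (num_frames, num_joints, 2) for one person.
    Returns an array of shape (num_windows, window * num_joints * 2),
    one flattened feature vector per clip.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    num_frames = keypoints.shape[0]
    features_per_frame = int(np.prod(keypoints.shape[1:]))

    # Start index of every clip that fits entirely inside the video.
    starts = range(0, num_frames - window + 1, stride)
    clips = [keypoints[s:s + window].reshape(-1) for s in starts]
    if not clips:  # video shorter than one window
        return np.empty((0, window * features_per_frame))
    return np.stack(clips)


# Example: 40 frames, 18 joints (assumed), x/y coordinates per joint.
frames = np.random.rand(40, 18, 2)
features = make_windows(frames)
print(features.shape)  # (4, 576)
```

Each row, paired with the label of its clip, can then be passed to an ensemble classifier such as a Random Forest.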

Fig 1: Accuracy on training and testing data

Potential Use Cases and Some Survey Results

We went around asking people how our project could be used in their area of work. Here are some highlights from their answers.

Event Manager: Yes, it will be helpful in managing an event. If we get this kind of drone, we can conduct a peaceful event without any riots and with less manpower. Moreover, it can also be used at big exhibitions like trade fairs to control the crowd and inspect every corner of the venue.

Private security: There are times when the number of people on our force is not enough to cover each and every corner of the premises. Someone patrolling the area may miss some details, and an assailant may have hidden and be waiting to strike the officer from behind. A drone that hovers right behind the officer, covering their tracks and a wider field of view than a human can, could be useful for detecting such activity and alerting the forces to take the necessary action.

Research fellow: With the advent of drones and drone surveillance, opportunities for research have increased. Drone surveillance is helpful in situations where CCTV cannot be used. CCTV cameras are stationary, so the area they can cover is limited, whereas drones are mobile and can be used in places where CCTV fails. Since the inception of deep learning, research in the field of computer vision has also increased. ML or computer vision techniques applied to drone data can now be used to recognize violent activity in real time.

Law Enforcement Department: It can help law enforcement recognize active areas of violence. The operator controlling the drone can keep the vigilance officer in the loop while flying it. At the same time, it can create a lot of privacy issues. People may object to the unwanted surveillance they would be subjected to. Drone surveillance is quite helpful for detecting terrorist-related activities, but not that helpful if it keeps reporting petty fights in small areas, which people prefer to keep private.

Potential and Future Plans

In future work, we plan to deploy the model in the cloud so that the drone can classify actions in real time. We also plan to create an alarm system that notifies a supervisor once the drone detects a violent action.
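One possible shape for the alarm trigger, purely as a sketch: raise the alarm only after several consecutive clips are classified as violent, so that a single misclassified clip does not page the supervisor. The function name first_alert_index, the "non_violent" label string, and the threshold of 3 are hypothetical choices for illustration.

```python
def first_alert_index(labels, min_consecutive=3, safe_label="non_violent"):
    """Return the index of the prediction that triggers the alarm, or None.

    labels: per-clip predictions in time order, e.g. "punch", "kick",
    "fall" or the safe label. The alarm fires once `min_consecutive`
    clips in a row are anything other than the safe label.
    """
    streak = 0
    for i, label in enumerate(labels):
        # Extend the streak on a violent label, reset it on a safe one.
        streak = streak + 1 if label != safe_label else 0
        if streak >= min_consecutive:
            return i
    return None


predictions = ["non_violent", "punch", "punch", "kick", "non_violent"]
print(first_alert_index(predictions))  # 3
```

In a deployed system, the returned index would be the point at which a notification is sent to the supervisor; how that notification is delivered is left open here.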