Violence Detection in Video Data

Romit Singhai
Nov 27, 2019

Introduction

Video data has become pervasive. It comes both from platforms where people upload huge volumes of content and from surveillance cameras deployed worldwide. In 2016, surveillance cameras worldwide were generating 566 petabytes of video data every day, and YouTube users upload 300 hours of video every minute. The challenge is to monitor this massive amount of video data and identify abnormal activities.

Model

Video data is complex because it has both spatial and temporal features, and both need to be taken into account to identify an activity. A hybrid model was used, consisting of a CNN (for spatial features) and an LSTM (for temporal features), as shown below.

[Figure: hybrid CNN + LSTM model architecture (Source)]
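Below is a minimal Keras sketch of this kind of hybrid. It is not the author's code. The same Inception V3 network is applied to every frame with `TimeDistributed`, and its pooled 2048-dimensional features are fed to an LSTM. The post does not state the frame count, LSTM size, dropout rate or output classes, so those values are assumptions.

```python
import tensorflow as tf

# Assumed hyperparameters; the post does not specify them.
NUM_FRAMES = 16    # frames sampled from each clip
IMG_SIZE = 299     # Inception V3's default input resolution
NUM_CLASSES = 2    # e.g. violent vs. non-violent


def build_cnn_lstm(weights="imagenet"):
    """CNN (spatial) + LSTM (temporal) clip classifier."""
    # Spatial part: Inception V3 without its ImageNet classifier head.
    # pooling="avg" reduces each frame to a 2048-dimensional vector.
    cnn = tf.keras.applications.InceptionV3(
        include_top=False,
        weights=weights,
        pooling="avg",
        input_shape=(IMG_SIZE, IMG_SIZE, 3),
    )

    clip = tf.keras.Input(shape=(NUM_FRAMES, IMG_SIZE, IMG_SIZE, 3))
    # Apply the same CNN to every frame: output is (batch, NUM_FRAMES, 2048).
    frame_features = tf.keras.layers.TimeDistributed(cnn)(clip)
    # Temporal part: the LSTM summarizes how frame features change over time.
    x = tf.keras.layers.LSTM(256)(frame_features)
    x = tf.keras.layers.Dropout(0.5)(x)
    probs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return tf.keras.Model(clip, probs)
```

Calling `build_cnn_lstm()` downloads the ImageNet weights. Passing `weights=None` builds the same graph with random initialization, which is useful for checking shapes.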

Training

An Inception V3 network pre-trained on the ImageNet dataset was fine-tuned to extract spatial features from video clips, and an LSTM model was then trained on those features. The model was trained on the publicly available UCF101 dataset (https://www.crcv.ucf.edu/data/UCF101.php) along with data collected online.
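The two-stage procedure can be sketched as follows. This sketch assumes clips have already been decoded into fixed-length frame arrays. The fine-tuned CNN runs once over every frame to cache features, and then only the LSTM is trained on the resulting sequences. `extract_clip_features`, `build_temporal_model` and all hyperparameters are illustrative names and values, not taken from the post.

```python
import numpy as np
import tensorflow as tf

NUM_FRAMES = 16       # assumed frames per clip
FEATURE_DIM = 2048    # size of Inception V3's average-pooled output
NUM_CLASSES = 2       # assumed: violent vs. non-violent


def extract_clip_features(cnn, clips):
    """Stage 1: run the (fine-tuned) CNN on every frame of every clip.

    clips: float array of shape (num_clips, num_frames, H, W, 3), already
    preprocessed the way the CNN expects.
    Returns an array of shape (num_clips, num_frames, feature_dim).
    """
    n, t = clips.shape[:2]
    # Flatten clips into one big batch of frames for a single CNN pass.
    frames = clips.reshape((n * t,) + clips.shape[2:])
    feats = cnn.predict(frames, verbose=0)
    # Restore the (clip, time) structure for the LSTM.
    return feats.reshape(n, t, -1)


def build_temporal_model():
    """Stage 2: an LSTM classifier trained on cached feature sequences."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(NUM_FRAMES, FEATURE_DIM)),
        tf.keras.layers.LSTM(256),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

For example, `history = build_temporal_model().fit(features, labels, validation_split=0.2, epochs=20)` would record per-epoch `loss`, `accuracy`, `val_loss` and `val_accuracy` in `history.history`, which is the usual source for learning curves. Caching features this way means the expensive CNN pass happens once per clip rather than once per epoch.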

Below are the learning curves generated during training.

Demo

Link to the web interface for interacting with the model, which is deployed using TensorFlow Serving.

http://www.surveillance-analysis.com
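TensorFlow Serving exposes a REST endpoint of the form `/v1/models/<name>:predict` that accepts a JSON body with an `instances` list and returns a `predictions` list. The client below is a hedged sketch that uses only the standard library. Several details are hypothetical and do not describe the deployed site: the host, the port (8501 is TensorFlow Serving's default REST port), the model name `violence_detector`, and the assumption that the served model takes per-frame feature sequences.

```python
import json
import urllib.request

# Hypothetical endpoint: host, port and model name are assumptions.
SERVING_URL = "http://localhost:8501/v1/models/violence_detector:predict"


def build_predict_request(clip_features, url=SERVING_URL):
    """Wrap one clip's feature sequence in a TF Serving 'instances' request."""
    body = json.dumps({"instances": [clip_features]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(clip_features, url=SERVING_URL, timeout=10.0):
    """Send the request and return the class probabilities for the clip."""
    request = build_predict_request(clip_features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Row-format responses hold one prediction per submitted instance.
    return payload["predictions"][0]
```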

Link to a video showing real-time identification of violence in a surveillance use case.

https://youtu.be/ZeFl7PC6ZTE
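One common way to run a clip-level classifier on a live feed is to keep a sliding window of the most recent frame features and classify that window every few frames. This is offered as an assumption, not as the method used in the demo. The `classify` callable, window length, stride and threshold are hypothetical placeholders.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


def sliding_window_alerts(
    frame_features: Iterable[Sequence[float]],
    classify: Callable[[List[Sequence[float]]], float],
    window: int = 16,
    stride: int = 4,
    threshold: float = 0.5,
) -> Iterator[Tuple[int, float]]:
    """Yield (frame_index, score) whenever a window's score reaches threshold.

    frame_features: per-frame CNN features arriving in real time.
    classify: hypothetical callable returning P(violence) for a list of
        `window` consecutive feature vectors (e.g. a local LSTM model or a
        request to a model server).
    """
    # A bounded deque drops the oldest frame automatically.
    buffer = deque(maxlen=window)
    for i, feat in enumerate(frame_features):
        buffer.append(feat)
        # Classify once the buffer is first full, then every `stride` frames.
        if len(buffer) == window and (i + 1 - window) % stride == 0:
            score = classify(list(buffer))
            if score >= threshold:
                yield i, score
```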