Violence Detection in Video Data

Nov 27, 2019

Introduction

Video data has become pervasive, driven both by platforms where people upload huge volumes of content and by surveillance cameras deployed worldwide. In 2016, surveillance cameras worldwide were generating 566 petabytes of video data every day. YouTube users upload 300 hours of video every minute. The challenge is to monitor this massive amount of video data to identify abnormal activities.

Model

Video data is complex because it has both spatial and temporal features, and both need to be taken into account to identify an activity. A hybrid model was used, consisting of a CNN (for spatial features) and an LSTM (for temporal features).
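To make the data flow concrete, here is a minimal NumPy sketch of a CNN-then-LSTM pipeline. It is an illustration under stated assumptions, not the model from this post: `frame_features` is a trivial stand-in for the CNN (it just averages each color channel), the LSTM is a single hand-written layer with random weights, and all function names, sizes, and the sigmoid output head are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def frame_features(clip):
    """Stand-in for the CNN (assumption): one feature vector per frame.

    clip has shape (T, H, W, 3). A real CNN such as Inception V3 would
    produce a much richer vector per frame; here we only average each
    color channel, giving features of shape (T, 3).
    """
    return clip.mean(axis=(1, 2))

def lstm_last_state(features, W, U, b):
    """Run one LSTM layer over features (T, D); return the final hidden state.

    W: (4H, D), U: (4H, H), b: (4H,). Gate order in the stacked weights:
    input, forget, candidate, output.
    """
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in features:  # one step per frame, in temporal order
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:hidden])
        f = sigmoid(z[hidden:2 * hidden])
        g = np.tanh(z[2 * hidden:3 * hidden])
        o = sigmoid(z[3 * hidden:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def init_params(feat_dim, hidden, seed=0):
    """Random, untrained weights (illustrative only)."""
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(0.0, 0.1, (4 * hidden, feat_dim)),
        "U": rng.normal(0.0, 0.1, (4 * hidden, hidden)),
        "b": np.zeros(4 * hidden),
        "w_out": rng.normal(0.0, 0.1, hidden),
        "b_out": 0.0,
    }

def violence_probability(clip, params):
    """Spatial features per frame, temporal summary via LSTM, then a score."""
    feats = frame_features(clip)
    h = lstm_last_state(feats, params["W"], params["U"], params["b"])
    return float(sigmoid(params["w_out"] @ h + params["b_out"]))

# A fake 16-frame clip of 32x32 RGB frames with values in [0, 1).
clip = np.random.default_rng(1).random((16, 32, 32, 3))
params = init_params(feat_dim=3, hidden=8)
score = violence_probability(clip, params)
```

The point of the split is that the per-frame extractor never sees time, and the LSTM never sees pixels: it only consumes the ordered sequence of feature vectors.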

Training

An Inception V3 model pre-trained on the ImageNet dataset was fine-tuned to extract spatial features from video clips, and an LSTM model was then trained. The publicly available UCF101 dataset (https://www.crcv.ucf.edu/data/UCF101.php), along with data collected online, was used to train the model.
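The two training stages consume different kinds of examples: an image model like Inception V3 takes single frames, while an LSTM takes ordered sequences. The post does not say how frames were selected or labeled, so the sketch below makes two loud assumptions: frames are sampled at evenly spaced positions up to a fixed sequence length, and every frame inherits the label of its clip. The names `sample_frame_indices`, `build_training_sets`, and `seq_len` are hypothetical.

```python
def sample_frame_indices(num_frames, seq_len):
    """Pick seq_len frame indices from a clip with num_frames frames.

    Assumption: evenly spaced sampling; clips shorter than seq_len are
    padded by repeating their last frame.
    """
    if num_frames <= 0 or seq_len <= 0:
        raise ValueError("num_frames and seq_len must be positive")
    if num_frames <= seq_len:
        return list(range(num_frames)) + [num_frames - 1] * (seq_len - num_frames)
    step = num_frames / seq_len
    return [int(i * step) for i in range(seq_len)]

def build_training_sets(clips, seq_len):
    """Turn labeled clips into examples for both training stages.

    clips: iterable of (clip_id, num_frames, label).
    Returns:
      frame_examples:    (clip_id, frame_index, label) for CNN fine-tuning
      sequence_examples: (clip_id, [frame_index, ...], label) for the LSTM
    """
    frame_examples = []
    sequence_examples = []
    for clip_id, num_frames, label in clips:
        indices = sample_frame_indices(num_frames, seq_len)
        # Stage 1: each distinct sampled frame is its own labeled image.
        frame_examples.extend((clip_id, i, label) for i in sorted(set(indices)))
        # Stage 2: the same frames, kept in temporal order as one sequence.
        sequence_examples.append((clip_id, indices, label))
    return frame_examples, sequence_examples

# Two made-up clips: a 100-frame clip labeled 1 and a 3-frame clip labeled 0.
frames, sequences = build_training_sets([("clip_a", 100, 1), ("clip_b", 3, 0)], seq_len=5)
```

Keeping every sequence the same length makes it straightforward to batch clips of different durations when training the LSTM.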