Review — Video Frame Interpolation Using ConvLSTM (Camera Tampering Detection)
Video Frame Interpolation Using ConvLSTM for Camera Tampering Detection
In this story, Physical Integrity Attack Detection of Surveillance Camera with Deep Learning based Video Frame Interpolation, by Nanyang Technological University, is reviewed. In this paper:
- Video Frame Interpolation Using ConvLSTM is used to predict a video frame.
- The predicted frame is compared with the actual frame to decide whether (cyber or physical) tampering has occurred.
This is a paper in 2019 IoTaIS. (Sik-Ho Tsang @ Medium)
- Cyber Attacks and Physical Tampering Against Surveillance Cameras
- Proposed Video Frame Interpolation Using ConvLSTM
- Experimental Results
1. Cyber Attacks and Physical Tampering Against Surveillance Cameras
- Surveillance cameras, like all Internet of Things (IoT) devices, are at risk from a wide range of cyber threats.
- Unauthorized privileged remote access to these surveillance cameras could alter the configuration of the vulnerable cameras, preventing them from performing their intended surveillance functions or coverage.
- Physical attacks also prevent surveillance cameras from working normally.
- These tampering attacks degrade the camera views and, consequently, the functions of the surveillance system.
- Four kinds of tampering are simulated in the paper:
- Blocking the camera’s view, simulated by masking out the video footage with a black image.
- Adjusting the zoom configuration of the camera, simulated by cropping specific video frames.
- Altering the focus of the camera, simulated by applying image filters (blurring) to specific video frames.
- Adjusting the camera’s viewing angle, simulated by shifting the viewing area of the recorded video footage.
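The four simulated attacks above can be sketched with plain NumPy. This is only an illustration, not the paper's code: the function names, the grayscale (H, W) frame layout, the box blur, the nearest-neighbour upscaling after cropping, and the zero fill after shifting are all assumptions made here.

```python
import numpy as np

def block_view(frame):
    """Blocked view: replace the frame with a black image."""
    return np.zeros_like(frame)

def zoom_in(frame, factor=2):
    """Zoom change: crop the centre, then upscale back to the original
    size with nearest-neighbour sampling (assumed; the paper only says
    the frames are cropped)."""
    h, w = frame.shape
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    rows = top + np.arange(h) * ch // h
    cols = left + np.arange(w) * cw // w
    return frame[rows][:, cols]

def defocus(frame, k=5):
    """Focus change: k x k box blur (k odd), with edge padding."""
    h, w = frame.shape
    pad = k // 2
    padded = np.pad(frame.astype(np.float64), pad, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return np.rint(out / (k * k)).astype(frame.dtype)

def shift_view(frame, dx=10):
    """Viewing-angle change: shift the view horizontally by dx pixels,
    filling the uncovered area with zeros (assumed)."""
    out = np.zeros_like(frame)
    if dx > 0:
        out[:, dx:] = frame[:, :-dx]
    elif dx < 0:
        out[:, :dx] = frame[:, -dx:]
    else:
        out[:] = frame
    return out
```

Applying one of these functions to selected frames of a normal video gives a tampered test sequence of the kind described above.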
2. Proposed Video Frame Interpolation Using ConvLSTM
- To detect the above tampering attacks, Video Frame Interpolation using ConvLSTM is proposed.
2.1. Video Frame Interpolation for Camera Tampering Detection
- Suppose we have the current frame θt at time instant t.
- We also have the previous frames θt-1, θt-2, …, θt-5.
- The current frame θt and the previous frames θt-2, …, θt-5 are input into the Video Frame Interpolation network to obtain the interpolated frame ^θt-1; the frame θt-1 itself is withheld.
- If the difference between the predicted ^θt-1 and the actual θt-1 is large, camera tampering is detected.
- Otherwise, the camera is in the normal state, i.e. there is no camera tampering.
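The decision rule above can be sketched as follows. The mean squared error, the threshold value, and all function names are assumptions for illustration; the review does not state which difference measure or threshold the paper uses. `naive_interpolator` is a hypothetical stand-in for the trained ConvLSTM network.

```python
import numpy as np

def split_window(frames):
    """frames: array of shape (6, H, W) ordered θt-5, ..., θt-1, θt.
    Returns the network inputs (θt-5..θt-2 and θt) and the withheld θt-1."""
    frames = np.asarray(frames)
    target = frames[-2]
    inputs = np.concatenate([frames[:-2], frames[-1:]], axis=0)
    return inputs, target

def naive_interpolator(inputs):
    """Hypothetical placeholder for the trained network:
    averages θt-2 and θt to estimate θt-1."""
    return (inputs[-2].astype(np.float64) + inputs[-1].astype(np.float64)) / 2.0

def is_tampered(predicted, actual, threshold):
    """Flag tampering when the mean squared error exceeds the threshold."""
    diff = predicted.astype(np.float64) - actual.astype(np.float64)
    err = float(np.mean(diff ** 2))
    return bool(err > threshold), err

# Usage: a static scene whose frame at t-1 is blacked out.
frames = np.full((6, 4, 4), 100, dtype=np.uint8)
frames[-2] = 0
inputs, target = split_window(frames)
flag, err = is_tampered(naive_interpolator(inputs), target, threshold=50.0)
```

In practice the threshold would have to be tuned on normal footage so that ordinary scene motion does not trigger false alarms.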
2.2. Video Frame Interpolation Using ConvLSTM
- The encoding ConvLSTM network ingests the input sequence of video frames at t-J, …, t-2 and t into its ConvLSTM hidden layers; the frame at t-1 is excluded.
- The interpolation ConvLSTM network then unfolds the hidden state to reconstruct the missing video frame through interpolation.
- In brief, ConvLSTM is an LSTM that uses convolution layers instead of fully connected layers.
- (For details of ConvLSTM, please feel free to read MsFEM+MsBEN, or to read the ConvLSTM paper.)
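For reference, the ConvLSTM cell as formulated in the original ConvLSTM paper (Shi et al., 2015) can be written as below, where * denotes convolution and ∘ the Hadamard product. The symbols (X for the input, H for the hidden state, C for the cell state) follow that paper rather than the reviewed one, and it is not stated here whether the reviewed model keeps the peephole terms W_c ∘ C.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Replacing the matrix products of a standard LSTM with convolutions keeps the spatial layout of the frames in both the hidden and the cell state.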
3. Experimental Results
- All models were trained and tested with the four sets of test scenarios.
- The numbers of True Positives, True Negatives, False Positives, and False Negatives were collected as the evaluation criteria.
- Additionally, the F1, Precision and Recall values were computed.
- Hence, a total of 32 test evaluation results were collected: four test scenarios × two sets of previously unseen camera videos × four models, including the proposed interpolation anomaly detector.
- (But the dataset is not publicly available.)
- Predictive modeling (Predictor), VAE (Variational Autoencoder), and AE (Autoencoder) are compared with the proposed model.
- The numbers of TP/TN/FP/FN, I believe, are frame counts. (It is not clear in the paper.)
- The test results indicate that the anomaly detector with video frame interpolation performed the best compared to the other models for physical attacks for three out of the four test scenarios.
- It was second best for the simulated tests of adjusting the camera’s focus.
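The Precision, Recall and F1 values mentioned above follow from the TP/FP/FN counts using the standard definitions. A minimal sketch, with made-up counts that are not taken from the paper:

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions; returns 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Illustrative counts only (not results from the paper).
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
```

Note that TN does not enter any of the three values, which is why the TP/TN/FP/FN counts are reported alongside them.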
[2019 IoTaIS] [VFI-ConvLSTM]
Physical Integrity Attack Detection of Surveillance Camera with Deep Learning based Video Frame Interpolation