Published in Nerd For Tech

Review — Video Frame Interpolation Using ConvLSTM (Camera Tampering Detection)

Video Frame Interpolation Using ConvLSTM for Camera Tampering Detection

Surveillance Cameras on Moving Train

In this story, Physical Integrity Attack Detection of Surveillance Camera with Deep Learning based Video Frame Interpolation, by Nanyang Technological University, is reviewed. In this paper:

  • Video Frame Interpolation Using ConvLSTM is used to predict a video frame.
  • The predicted frame is compared with the actual frame to decide whether (cyber or physical) tampering has occurred.

This is a paper in 2019 IoTaIS. (Sik-Ho Tsang @ Medium)


  1. Cyber Attacks and Physical Tampering Against Surveillance Cameras
  2. Proposed Video Frame Interpolation Using ConvLSTM
  3. Experimental Results

1. Cyber Attacks and Physical Tampering Against Surveillance Cameras

  • Surveillance cameras, like all Internet of Things (IoT) devices, are at risk from a wide range of cyber threats.
  • Unauthorized privileged remote access to these surveillance cameras could alter the configuration of the vulnerable cameras, preventing them from performing their intended surveillance functions or coverage.
  • Physical attacks also prevent surveillance cameras from working normally.
  • These tampering attacks degrade the camera views and, consequently, the functions of the surveillance system.
  • Four types of tampering are simulated in the paper. The first is blocking the camera’s view, simulated by masking out the video footage with a black image.
Rightmost video frame simulates blocked view
  • Adjusting the zoom configuration of the camera, simulated by cropping specific video frames.
Rightmost video frame simulates zoom-in
  • Altering the focus of the camera, simulated by applying image filters (blurring) to specific video frames.
Rightmost video frame simulates blurring effect
  • Adjusting the camera’s viewing angle, simulated by shifting the viewing area of the recorded video footage.
Rightmost video frame simulates small left-shift
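The four simulated attacks above are simple image operations. Below is a minimal NumPy sketch of how they could be reproduced on a single grayscale frame. The function names, the zoom factor, the box-blur kernel, and the shift size are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

# All helpers assume a 2-D grayscale frame (height x width).
# Names and default parameters are hypothetical, chosen for illustration.

def block_view(frame):
    """Simulate a blocked lens by replacing the frame with a black image."""
    return np.zeros_like(frame)

def zoom_in(frame, factor=2):
    """Simulate zoom-in: crop the centre, then upsample by pixel repetition.
    Assumes height and width are divisible by `factor`."""
    h, w = frame.shape
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = frame[top:top + ch, left:left + cw]
    return np.repeat(np.repeat(crop, factor, axis=0), factor, axis=1)

def defocus(frame, k=5):
    """Simulate a defocused lens with a k x k box blur (k odd, edges padded)."""
    pad = k // 2
    padded = np.pad(frame.astype(np.float64), pad, mode="edge")
    h, w = frame.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def shift_left(frame, pixels=8):
    """Simulate a changed viewing angle: move content left by `pixels` (> 0)
    and fill the uncovered strip on the right with black."""
    out = np.zeros_like(frame)
    out[:, :-pixels] = frame[:, pixels:]
    return out
```

Each helper returns a frame of the same shape as its input, so a tampered frame can be swapped into a clip in place of the original.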

2. Proposed Video Frame Interpolation Using ConvLSTM

  • To detect the above tampering attacks, Video Frame Interpolation using ConvLSTM is proposed.

2.1. Video Frame Interpolation for Camera Tampering Detection

Video Frame Interpolation
  • Suppose we have the current frame θt at time instant t.
  • We also have the previous frames θt-1, θt-2, …, θt-5.
  • The current frame θt and the previous frames θt-2, …, θt-5 are fed into the Video Frame Interpolation network to obtain the interpolated frame ^θt-1. The actual frame θt-1 is deliberately held out.
  • If the interpolated ^θt-1 differs greatly from the actual θt-1, camera tampering is detected.
  • Otherwise, it is in normal state, i.e. there is no camera tampering.
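The decision rule above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the review does not say which difference measure or threshold the paper uses, so mean squared error and the `threshold` value are assumptions, and `mean_interpolator` is a hypothetical stand-in for the trained ConvLSTM network.

```python
import numpy as np

def build_input(frames, t):
    """Stack frames t-5..t-2 and t, leaving out t-1 (the frame to interpolate)."""
    idx = [t - 5, t - 4, t - 3, t - 2, t]
    return np.stack([frames[i] for i in idx])

def mean_interpolator(inputs):
    """Hypothetical stand-in for the trained network:
    average the two input frames nearest to t-1 (those at t-2 and t)."""
    return (inputs[-2] + inputs[-1]) / 2.0

def is_tampered(predicted, actual, threshold):
    """Flag tampering when the mean squared error exceeds `threshold`.
    MSE and the threshold are assumptions, not the paper's stated criterion."""
    diff = predicted.astype(np.float64) - actual.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    return mse > threshold, mse

# Toy demo on synthetic, nearly static frames (not real surveillance footage).
rng = np.random.default_rng(0)
frames = [0.5 + 0.01 * rng.standard_normal((32, 32)) for _ in range(6)]
t = 5
predicted = mean_interpolator(build_input(frames, t))

normal_flag, normal_mse = is_tampered(predicted, frames[t - 1], threshold=0.01)
blocked_flag, blocked_mse = is_tampered(predicted, np.zeros((32, 32)), threshold=0.01)
```

In this toy setting the untouched frame at t-1 stays well below the threshold, while a blacked-out frame at t-1 exceeds it. With a real model, the threshold would have to be tuned on normal footage.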

2.2. Video Frame Interpolation Using ConvLSTM

  • The encoding ConvLSTM network ingests the input sequence of video frames at t-J, …, t-2 and at t into its ConvLSTM hidden layers; only the frame at t-1 is excluded.
  • The Interpolation ConvLSTM network will unfold the hidden state to reconstruct the missing video frame through interpolation.
  • In brief, ConvLSTM is an LSTM whose input-to-state and state-to-state transitions use convolutions instead of fully connected layers.
  • (For details of ConvLSTM, please feel free to read MsFEM+MsBEN, or to read the ConvLSTM paper.)
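For reference, the ConvLSTM cell as formulated in the original ConvLSTM paper (Shi et al., 2015) can be written as below, where * denotes convolution and ∘ the Hadamard (element-wise) product. The review does not state which variant the reviewed paper uses, so the peephole terms (those involving W_c) are an assumption carried over from that formulation.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Here X_t is the input frame, H_t the hidden state, and C_t the cell state, all of which are 3-D tensors (channels × height × width) rather than vectors, which is what lets the cell preserve spatial structure.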

3. Experimental Results

  • All models were trained and tested with the four sets of test scenarios.
  • The numbers of True Positives, True Negatives, False Positives, and False Negatives were collected as the evaluation criteria.
  • Additionally, the F1, Precision and Recall values were computed.
  • Hence, a total of 32 evaluation results were collected: four test scenarios × two sets of previously unseen camera videos × four models, including the proposed interpolation anomaly detector.
  • (But the dataset is not publicly available.)
  • The baselines compared are predictive modeling (Predictor), VAE (Variational Autoencoder), and AE (Autoencoder).
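The Precision, Recall, and F1 values mentioned above follow directly from the TP/FP/FN counts. A minimal sketch using the standard definitions is shown below; the function name is hypothetical and the counts in the example are made up for illustration, not taken from the paper.

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts.
    True negatives are not needed for these three metrics."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Made-up counts, for illustration only.
example = detection_metrics(tp=90, fp=10, fn=30)
```

The zero-denominator guards matter in practice: a model that never raises an alarm has no true or false positives, and its precision would otherwise be undefined.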
Blocking Camera’s View
Adjust Camera’s Zoom
Alter Camera’s Focus
Shift Camera — Camera 1
  • The TP/TN/FP/FN numbers, I believe, are counts of frames. (This is not clear in the paper.)
  • The test results indicate that the anomaly detector with video frame interpolation performed best among the compared models in three of the four physical-attack test scenarios.
  • It was second best for the simulated tests of adjusting the camera’s focus.
Reconstructed video frame by our ConvLSTM Interpolator when the train carriage is empty (left) and has passengers (right)


