Vehicle Speed Estimation from Video using Deep Learning

Predict the speed of a vehicle with Optical Flow + CNN in PyTorch [Link to the code on GitHub]

shafu.eth
4 min read · Oct 31, 2020
[Image by me]

The problem I want to solve is the following: there is a camera in a vehicle, and I want to know how fast the vehicle is going. You obviously cannot look at the speedometer, only at the video footage itself. This is something a little deep learning magic should help us with.

Data

I have two different videos: one for training and one for testing. The training video has 20,399 frames and the test video has 10,797 frames. You can download the videos from here. Here are some examples:

example images from the videos [Image by me]

The labels for the training video come as a .txt file, where each line corresponds to the speed at that particular frame.
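Reading those labels is straightforward: line i of the file is the ground-truth speed for frame i. A minimal sketch (train.txt is my hypothetical file name):

```python
import numpy as np

# Line i of the label file holds the speed for frame i.
with open("train.txt") as f:
    speeds = np.array([float(line) for line in f], dtype=np.float32)

assert len(speeds) == 20399  # one label per training frame
```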

Approach

The most interesting aspect of this problem is deciding what the input to your neural network should look like. Calculating speed from a single static image is not possible. One valid approach would be to stack two or more consecutive frames together, or to use something sequential like an LSTM or Transformer. Another is to calculate optical flow, which is what I decided to use.

What is optical flow? It is basically a way to calculate, for each pixel, a vector that tells you the relative motion between two images. There is a great Computerphile video on it here if you want more details. There are “classic” computer vision algorithms for calculating optical flow, but deep learning approaches have become much better (no surprise there). So what are the SOTA approaches? Let's check paperswithcode:

[Image by me]
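(As an aside, the classic route looks something like this with OpenCV's Farneback algorithm — purely illustrative, since we will use a learned method instead:)

```python
import cv2

# Grab two consecutive frames from the training video ("train.mp4"
# is a hypothetical file name) and convert them to grayscale.
cap = cv2.VideoCapture("train.mp4")
_, frame1 = cap.read()
_, frame2 = cap.read()
prev_gray = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
next_gray = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

# Dense optical flow: one (dx, dy) vector per pixel,
# returned as an array of shape (H, W, 2).
flow = cv2.calcOpticalFlowFarneback(
    prev_gray, next_gray, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
)
```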

This RAFT thing seems good, and it has a PyTorch implementation (nice!). I forked the original repository and made it a little simpler. I don’t need the training and evaluation code, etc. We will only use it for inference.

Calculating Optical Flow

For inference, the network takes two consecutive frames as input and predicts a tensor with dimensions (2, image_height, image_width): as we said before, one 2-dimensional motion vector for each pixel in the image. These flow fields are what we will actually train on, so we save them as .npy files. If you visualize the optical flow it looks like this:

optical flow looks cool [Image by me]
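The extraction loop is roughly the following — a hedged sketch based on the original RAFT repository's test-mode API (my fork's entry points may differ slightly):

```python
import numpy as np
import torch

@torch.no_grad()
def extract_flow(model, frames, out_dir="flow"):
    """`model` is a loaded RAFT network in eval mode, `frames` a list
    of (H, W, 3) uint8 arrays. Saves one .npy flow field per pair."""
    for i in range(len(frames) - 1):
        # RAFT expects float tensors of shape (1, 3, H, W) in [0, 255].
        # (The repo also has an InputPadder that pads H and W to
        # multiples of 8; omitted here for brevity.)
        img1 = torch.from_numpy(frames[i]).permute(2, 0, 1).float()[None].cuda()
        img2 = torch.from_numpy(frames[i + 1]).permute(2, 0, 1).float()[None].cuda()

        # test_mode=True returns (flow_low, flow_up); flow_up is the
        # full-resolution flow with shape (1, 2, H, W).
        _, flow_up = model(img1, img2, iters=20, test_mode=True)

        np.save(f"{out_dir}/{i:05d}.npy", flow_up[0].cpu().numpy())
```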

Training

Remember what we are trying to train for:

Optical Flow → Model → Vehicle Speed Estimation
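In PyTorch terms, that means a Dataset pairing each saved flow field with the speed at the corresponding frame — a minimal sketch, with paths and file naming that are my assumptions:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class FlowSpeedDataset(Dataset):
    """Pairs each precomputed optical flow .npy file with a speed label."""

    def __init__(self, flow_dir, label_file):
        self.flow_dir = flow_dir
        with open(label_file) as f:
            self.speeds = [float(line) for line in f]

    def __len__(self):
        # One flow field per consecutive frame pair.
        return len(self.speeds) - 1

    def __getitem__(self, i):
        flow = np.load(f"{self.flow_dir}/{i:05d}.npy")  # shape (2, H, W)
        speed = self.speeds[i + 1]  # speed at the later frame of the pair
        return torch.from_numpy(flow), torch.tensor(speed)
```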

The model I chose is EfficientNet. I really like it because of its scalability: there are 8 different versions to choose from, and even the largest one, EfficientNet-B7, is still very, very good. You can start with a small variant like B0, and if everything works correctly and you have a good enough GPU, you can switch to a bigger one. There is also a great PyTorch library, which you can find here, that makes it very easy to load a pretrained EfficientNet model. If you open train.ipynb you can see how the training works.

I always start with a B0 and then scale up to a B3, because my GPU only has 6 GB of RAM [Insert sad face here].
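Loading a pretrained model with that library is essentially a one-liner. num_classes=1 turns the classifier into a single-output regression head; in_channels=2 is my choice for feeding the 2-channel flow directly instead of an RGB rendering of it:

```python
from efficientnet_pytorch import EfficientNet

# Single regression output (the speed); two input channels (dx, dy).
model = EfficientNet.from_pretrained(
    "efficientnet-b0", in_channels=2, num_classes=1
)
```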

After training I get the following results (loss is mean squared error):

training loss [Image by me]
validation loss [Image by me]

Nice, it seems that everything worked! Training and validation loss both go down, and the network does not overfit. I could probably have stopped training after about 150 steps, but my early stopping logic was very bad.
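For completeness, the core of the training loop boils down to standard regression with MSE — a sketch reusing the FlowSpeedDataset and model from the snippets above (the optimizer and hyperparameters here are my assumptions, not necessarily what train.ipynb uses):

```python
import torch
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
loader = DataLoader(FlowSpeedDataset("flow", "train.txt"),
                    batch_size=16, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.MSELoss()

model.train()
for flow, speed in loader:
    flow, speed = flow.to(device), speed.to(device)
    optimizer.zero_grad()
    pred = model(flow).squeeze(1)  # (batch,) predicted speeds
    loss = criterion(pred, speed)
    loss.backward()
    optimizer.step()
```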

These are the results (you can watch the whole video on YouTube here):

Not perfect but it definitely does something useful [Image by me]

Conclusion

I’m normally not a big fan of feature engineering but I think in this case it worked out pretty well. The next step would be to try out something inherently sequential like a Transformer or LSTM.
