How to recognise objects in videos with PyTorch

William Clemens
dida Machine Learning
Mar 27, 2020

Self-driving cars still struggle to detect objects in front of them with sufficient reliability. In general, though, the performance of state-of-the-art object detection models is already very impressive, and they are not too difficult to apply.

Here I will walk you through streaming a YouTube video into Python and then applying a pre-trained PyTorch model to it in order to detect objects.

We’ll be applying a model pre-trained on the object detection dataset COCO. (In a real application, the model would of course be fine-tuned to the task at hand.)

YouTube to OpenCV

First the imports. Most of these are pretty standard. Pafy is a video streaming library, and we will need the colourmaps from matplotlib for the bounding boxes later on.
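Before importing anything, it can help to check that the libraries mentioned above are actually installed. The following is a minimal sketch using only the standard library; the pip package names in the mapping are my assumption and may differ on your setup.

```python
import importlib.util

# Import name -> assumed pip package name (an assumption, adjust as needed).
REQUIRED = {
    "cv2": "opencv-python",
    "pafy": "pafy",
    "matplotlib": "matplotlib",
    "torch": "torch",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of all required modules that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    print("Missing packages:", missing_packages() or "none")
```

An empty result means all four libraries can be imported.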

COCO_CLASSES is just a dictionary containing the COCO class names.
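To give an idea of its shape, here is a shortened sketch of such a dictionary. It is an excerpt only (COCO has 80 object classes), and I am assuming that the keys are the 1-based label indices returned by the model. The `class_name` helper is a hypothetical convenience, not part of any library.

```python
# Excerpt only: the full COCO detection label set has 80 object classes.
# Keys are assumed to be the 1-based label indices returned by the model.
COCO_CLASSES = {
    1: "person",
    2: "bicycle",
    3: "car",
    4: "motorcycle",
    5: "airplane",
    6: "bus",
    7: "train",
    8: "truck",
    9: "boat",
    10: "traffic light",
}


def class_name(label):
    """Look up a readable name, falling back to the raw index if unknown."""
    return COCO_CLASSES.get(int(label), f"class {int(label)}")
```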

We’re going to use NVIDIA’s implementation of SSD (Single Shot MultiBox Detector), loaded through Torch Hub. If you’re interested in the details of the network you can read the paper here.
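Loading the model could look roughly like the sketch below. The repository string and the entry point names `nvidia_ssd` and `nvidia_ssd_processing_utils` are the ones I know from NVIDIA's Torch Hub page; treat them as assumptions and check them against the current hub listing. `load_ssd` itself is a hypothetical helper name.

```python
HUB_REPO = "NVIDIA/DeepLearningExamples:torchhub"


def load_ssd(device=None):
    """Download NVIDIA's pre-trained SSD and its helper utilities via Torch Hub."""
    # Imported here so the function can be defined without PyTorch installed.
    import torch

    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.hub.load(HUB_REPO, "nvidia_ssd", pretrained=True)
    utils = torch.hub.load(HUB_REPO, "nvidia_ssd_processing_utils")
    model = model.to(device).eval()  # eval mode: dropout off, batch norm statistics fixed
    return model, utils, device
```

The first call downloads the weights, so it needs network access. Depending on the version of the hub repository, loading the checkpoint on a machine without CUDA may need extra care.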

Let’s write a helper function to get an OpenCV VideoCapture object containing our YouTube video:
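A minimal sketch of such a helper is shown here. It assumes Pafy's `new` and `getbest` calls behave as documented; Pafy depends on YouTube's page format and is no longer actively maintained, so this may need adapting. The function name `get_youtube_cap` is my choice.

```python
def get_youtube_cap(url):
    """Return a cv2.VideoCapture that streams the given YouTube video."""
    # Imported here so the function can be defined without the dependencies.
    import cv2
    import pafy

    video = pafy.new(url)                  # fetch metadata and the list of streams
    best = video.getbest(preftype="mp4")   # best available mp4 stream
    if best is None:
        raise RuntimeError(f"No mp4 stream found for {url}")
    cap = cv2.VideoCapture(best.url)       # OpenCV reads the direct stream URL
    if not cap.isOpened():
        raise RuntimeError(f"OpenCV could not open the stream for {url}")
    return cap
```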

Now we can use the output of this function as a normal OpenCV VideoCapture object, just as if it came from a webcam!

We’ll open up the first frame of a video just to take a look.
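Something along these lines should do, where `cap` is any opened VideoCapture such as the one returned by the helper function described above. Both function names are hypothetical. OpenCV delivers frames in BGR channel order, so the sketch converts to RGB before handing the image to matplotlib.

```python
def read_first_frame(cap):
    """Read a single frame from an opened capture, failing loudly if none arrives."""
    ret, frame = cap.read()
    if not ret:
        raise RuntimeError("Could not read a frame from the video stream")
    return frame


def show_frame(frame):
    """Display a BGR frame with matplotlib."""
    # Imported here so read_first_frame works without these libraries.
    import cv2
    import matplotlib.pyplot as plt

    plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # BGR -> RGB for matplotlib
    plt.axis("off")
    plt.show()
```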

You should get this output:

Detecting objects

OK, we can comfortably load a YouTube video. Now we’ll do some object detection.

To keep our code looking nice, we’ll wrap up all the gory details of the implementation in a callable class.
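The sketch below shows one way such a class could be structured; it is not the original implementation. Several things are assumptions on my part: the `decode` argument is a hypothetical hook that turns the raw model output into `(boxes, labels, scores)` with boxes as normalised `(x_min, y_min, x_max, y_max)`, and the input normalisation `(x - 128) / 128` follows my reading of NVIDIA's example preprocessing. With NVIDIA's hub utilities, `decode` could probably be built from their `decode_results` and `pick_best` helpers.

```python
class ObjectDetectionPipeline:
    """Callable wrapper: BGR frame in, annotated BGR frame out (a sketch)."""

    def __init__(self, model, decode, class_names, device="cpu", size=300):
        self.model = model              # detection network, already in eval mode
        self.decode = decode            # raw output -> (boxes, labels, scores)
        self.class_names = class_names  # dict: label index -> readable name
        self.device = device
        self.size = size                # SSD300 expects 300x300 inputs

    @staticmethod
    def scale_box(box, width, height):
        """Map a normalised (x_min, y_min, x_max, y_max) box to pixel coordinates."""
        x_min, y_min, x_max, y_max = (float(v) for v in box)
        return (int(round(x_min * width)), int(round(y_min * height)),
                int(round(x_max * width)), int(round(y_max * height)))

    def preprocess(self, frame):
        """Resize and normalise a BGR frame into a 1x3xHxW float tensor."""
        import cv2
        import numpy as np
        import torch

        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        # Squash (rather than crop) so normalised boxes map back to the full frame.
        rgb = cv2.resize(rgb, (self.size, self.size))
        x = (rgb.astype(np.float32) - 128.0) / 128.0  # assumed normalisation
        return torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0).to(self.device)

    def __call__(self, frame):
        import cv2
        import matplotlib.pyplot as plt
        import torch

        height, width = frame.shape[:2]
        with torch.no_grad():  # inference only, no gradients needed
            output = self.model(self.preprocess(frame))
        boxes, labels, scores = self.decode(output)

        annotated = frame.copy()
        cmap = plt.get_cmap("tab20")  # one colour per class, cycling after 20
        for box, label, score in zip(boxes, labels, scores):
            x0, y0, x1, y1 = self.scale_box(box, width, height)
            r, g, b, _ = cmap(int(label) % 20)
            colour = (int(b * 255), int(g * 255), int(r * 255))  # OpenCV uses BGR
            name = self.class_names.get(int(label), str(int(label)))
            cv2.rectangle(annotated, (x0, y0), (x1, y1), colour, 2)
            cv2.putText(annotated, f"{name}: {float(score):.2f}", (x0, max(y0 - 5, 10)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, colour, 1)
        return annotated
```

Keeping decoding behind a single hook means the confidence threshold lives in one place, and the class only has to worry about preprocessing and drawing.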

Trying it out

Now we basically have all the code written!

Let’s try it out on the first video frame.

We can then loop over the frames of the video and write the results to a video file as normal in OpenCV.
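As a rough sketch, the loop could be split into a generator that applies the detector frame by frame and a function that writes the results to disk. Here `detector` is any callable that maps a frame to an annotated frame of the same size, such as the callable class described above. The function names, the `mp4v` codec and the 30 fps fallback are my assumptions.

```python
def annotated_frames(cap, detector, max_frames=None):
    """Yield detector(frame) for each frame until the stream ends."""
    count = 0
    while max_frames is None or count < max_frames:
        ret, frame = cap.read()
        if not ret:  # end of stream or read error
            break
        yield detector(frame)
        count += 1


def write_video(cap, detector, out_path, max_frames=None):
    """Run the detector over a video and save the annotated result."""
    # Imported here so annotated_frames can be used without OpenCV.
    import cv2

    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if the stream reports 0
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
    try:
        for frame in annotated_frames(cap, detector, max_frames):
            writer.write(frame)
    finally:
        # Always release both handles, even if detection fails midway.
        writer.release()
        cap.release()
```

Passing a small `max_frames` is a cheap way to test the whole pipeline before committing to a full run.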

This takes quite a long time, even on a GPU.

Conclusion

That’s it. Above I presented and explained everything you need to run your own object recognition model on any YouTube video you like.

Originally published at https://dida.do on March 27, 2020.
