Object detection on public webcam with OpenCV and YOLOv4

Daniel Schwalm · Published in Analytics Vidhya · Apr 15, 2021

I have already written several articles on object detection with Tensorflow and OpenCV. This time I wanted to try detecting people on public webcams with Tensorflow and OpenCV.
Well, the results were disappointing. I tried three different models from the Tensorflow model zoo but none of them worked well, due to the small size of the people on the webcam images. These models barely detected anything that was distant from the camera.
I had previously seen a lot of videos where people were successfully detected with YOLOv3 or YOLOv4, even far from the camera, so I decided to give it a try.

Build OpenCV from source with CUDA and cuDNN support for GPU acceleration

I have an NVIDIA RTX 2080 GPU, so I wanted to make sure that the object detection is sped up by GPU support in OpenCV.
By default, when you install OpenCV for Python it comes without GPU and CUDA support. You can check that easily with the following commands in the Python interpreter:
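import cv2
print(cv2.__version__)
print(cv2.cuda.getCudaEnabledDeviceCount())   # 0 means this build has no CUDA support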

If you want to use your GPU with CUDA in OpenCV, you have to build OpenCV from source on your local machine. To be honest, this was the most difficult part of the whole YOLO experience.
I have a Windows 10 machine, so I followed this amazing step-by-step tutorial by the TheCodingBug channel to build OpenCV from source. Before finding this video I struggled a lot to make it work.
I am just summarizing the steps below; for further details check out the video linked above.

Prerequisites

You will need the following prerequisites for building OpenCV from scratch:

  • an NVIDIA GPU with an up-to-date driver
  • the CUDA toolkit and cuDNN
  • Visual Studio with the C++ build tools
  • CMake
  • Anaconda, with a numpy version matching the OpenCV version to be built (more on this below)

Steps to build OpenCV

Download the code for the latest stable release from the OpenCV GitHub page. We will use version 4.5.1, which can be downloaded here as a .zip file.

We will also need some extra modules that have to be built from source; download them as a .zip file from the opencv_contrib GitHub repository. Please make sure you download the same version as you did for OpenCV, so 4.5.1 here too.

Create a new directory named opencv and extract both opencv and opencv_contrib there. Also create a new, empty folder called build in the same directory. This is how it looks before we start building OpenCV.

Open the CMake GUI and set the source code and build paths to the directories created above.

When you first hit Configure, CMake will ask for the generator platform. Make sure you select x64 for a Windows 10 machine.

When the generation has finished, check that the Python 3 section is present in the console output, otherwise OpenCV will not work in Python.

If it is not present, it can have multiple reasons. I faced two. One: Anaconda is not registered in your system as the default Python interpreter. Solution: I had to reinstall Anaconda and tick the corresponding checkbox during installation. Two: the numpy version does not match the OpenCV version to be built. It is still a bit of a mystery to me why exactly numpy 1.19.5 had to be installed for OpenCV 4.5.1, but it worked after that.
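You can quickly check the numpy version of your environment from the Python prompt:

import numpy
print(numpy.__version__)   # 1.19.5 was the version that worked for me with OpenCV 4.5.1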
After any change you make at this point, you have to restart the whole process until the Python 3 section appears. Make sure you press File / Delete Cache before starting over.

You have to set the following Configuration options and hit Configure:

  • WITH_CUDA
  • OPENCV_DNN_CUDA
  • ENABLE_FAST_MATH
  • OPENCV_EXTRA_MODULES_PATH=<opencv-contrib directory path>/modules

While the configuration is running, look up the proper compute capability version for your GPU on this Wikipedia page. For the RTX 2080 it is 7.5.
When the configuration is done, set the following Configuration options and hit Configure again:

  • CUDA_FAST_MATH
  • CUDA_ARCH_BIN=<compute capability version, e.g. 7.5. Remove the rest.>

After the configuration is done and the console output looks error-free, press the Generate button. It will generate the necessary files in the build folder for building OpenCV in Visual Studio.

After the generation is done, open build/OpenCV.sln in Visual Studio.

First, change the solution configuration from Debug to Release in the menu bar. Then, in the Solution Explorer, right-click on CMakeTargets/ALL_BUILD and hit Build. Building the project will take at least 30 minutes. After the build has completed without errors, right-click on CMakeTargets/INSTALL and hit Build.

The steps above will install OpenCV for the base conda environment. If you want to use a different environment, please make sure you change the following Configuration options before starting the whole process (the snippet after this list shows how to look up the values for your environment):

  • PYTHON3_EXECUTABLE
  • PYTHON3_INCLUDE_DIR
  • PYTHON3_LIBRARY
  • PYTHON3_NUMPY_INCLUDE_DIRS
  • PYTHON3_PACKAGES_PATH
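You can print most of these values from the target environment's own Python interpreter; this is a quick lookup sketch, not an official recipe:

import sys, sysconfig, site
import numpy

print(sys.executable)                     # PYTHON3_EXECUTABLE
print(sysconfig.get_paths()["include"])   # PYTHON3_INCLUDE_DIR
print(numpy.get_include())                # PYTHON3_NUMPY_INCLUDE_DIRS
print(site.getsitepackages())             # candidates for PYTHON3_PACKAGES_PATH
# PYTHON3_LIBRARY is typically <environment path>/libs/python3x.lib on Windows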

Verify OpenCV was built successfully

To verify that OpenCV with GPU support is usable from Python in your conda environment, run the same commands as above in that environment. You should see that you have 1 CUDA-enabled device available:
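import cv2
print(cv2.__version__)                        # 4.5.1
print(cv2.cuda.getCudaEnabledDeviceCount())   # should now print 1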

Object detection with YOLOv4

Preparing the pre-trained model

Now comes the fun part: the object detection itself.

Create a folder anywhere on your machine; let's call it model. This folder will contain all the files required for the object detection.

First, download coco.names from the darknet GitHub page. This file contains the class names that YOLOv4 can detect. Copy this file into the model folder.
From the Pre-trained models section of the same GitHub repository, download the yolov4.cfg and yolov4.weights files and copy them into the model folder as well.
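Loading these files with OpenCV's dnn module looks roughly like this — a minimal sketch, assuming the files sit in the model folder; the net and labels variables are the ones passed to the detection script further below, and the two CUDA calls route inference to the GPU we built support for above:

import cv2

# Read the class labels YOLOv4 was trained on
with open("model/coco.names") as f:
    labels = f.read().strip().split("\n")

# Load the pre-trained YOLOv4 network from the config and weights files
net = cv2.dnn.readNetFromDarknet("model/yolov4.cfg", "model/yolov4.weights")

# Run inference on the GPU (requires the CUDA-enabled OpenCV build from above)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)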

Using a public webcam for detection

My original idea was to detect objects on live webcam streams, as it is quite spectacular to demonstrate detection on something that is happening right now, in real time.

All public web cameras publish their video in some kind of stream format. The webcams I was initially interested in used the M3U8 format, but other formats are available as well.

Actually, most of the webcams I found use the YouTube Live streaming capabilities, which require a slightly different approach. I will cover that in a subsequent article.

Luckily, OpenCV can easily deal with these formats.
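For example, opening an M3U8 stream is no different from opening a local video file. A quick sanity check, using the stream URL from the script below:

import cv2

# An HLS (M3U8) stream opens just like a local video file
cap = cv2.VideoCapture("https://cdn-004.whatsupcams.com/hls/hr_pula01.m3u8")
ret, frame = cap.read()   # grab a single frame to verify the stream is alive
print(ret, frame.shape if ret else None)
cap.release()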

If you got to this point, you may want to download the Python code for object detection from my GitHub repository and start playing with it.
To try it out without understanding the logic under the hood, all you need to do is provide four parameters:

  • the webcam stream url
  • an optional frame_width parameter in case the video resolution is too high for your display. Keep it as None if the original size fits your display.
  • the confidence threshold of the detection. Bounding boxes are only shown for objects detected with a confidence higher than this threshold.
  • the overlapping threshold, which controls how overlapping bounding boxes are suppressed
video_url = "https://cdn-004.whatsupcams.com/hls/hr_pula01.m3u8"
frame_width = 1200
confidence_threshold = 0.6
overlapping_threshold = 0.1

if __name__ == '__main__':
    get_yolo_preds(net, video_url, confidence_threshold, overlapping_threshold, labels, frame_width)
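If you are curious what happens under the hood, here is a minimal sketch of what a get_yolo_preds-style function typically does. It illustrates the usual OpenCV DNN plus non-maximum suppression flow, not the exact implementation in my repository:

import cv2
import numpy as np

def get_yolo_preds(net, video_url, conf_thresh, overlap_thresh, labels, frame_width=None):
    layer_names = net.getUnconnectedOutLayersNames()
    cap = cv2.VideoCapture(video_url)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if frame_width is not None:
            scale = frame_width / frame.shape[1]
            frame = cv2.resize(frame, (frame_width, int(frame.shape[0] * scale)))
        h, w = frame.shape[:2]
        # YOLOv4 expects a square blob with pixel values scaled to [0, 1]
        blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
        net.setInput(blob)
        outputs = net.forward(layer_names)
        boxes, confidences, class_ids = [], [], []
        for output in outputs:
            for det in output:
                scores = det[5:]
                class_id = int(np.argmax(scores))
                confidence = float(scores[class_id])
                if confidence > conf_thresh:
                    # Detections are relative to the frame size: centre x/y, width, height
                    cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                    boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                    confidences.append(confidence)
                    class_ids.append(class_id)
        # Non-maximum suppression drops boxes overlapping more than overlap_thresh
        idxs = cv2.dnn.NMSBoxes(boxes, confidences, conf_thresh, overlap_thresh)
        for i in np.array(idxs).flatten():
            x, y, bw, bh = boxes[i]
            cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
            cv2.putText(frame, f"{labels[class_ids[i]]}: {confidences[i]:.2f}",
                        (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        cv2.imshow("YOLOv4 detections", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()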

That's it! You can execute the Python script now and, if you followed the instructions above properly, a new window should appear, detecting objects on your favourite public webcam stream.
The one I was experimenting with is a public webcam in the Croatian city of Pula.

Detected objects by YOLOv4 on public webcam

As you can see, a couple of people were not detected by the model. If we want to refine the detection and make sure that most of the people on the scene are successfully detected, we need to customize the model and train it on our own data. This is what I will do in a follow-up article.

Conclusion

Thanks for reading my article. To summarize my experience with YOLOv4 so far: I found the pre-trained models quite inaccurate, with a lot of misclassified or undetected objects on the screen. These objects are very easy for humans to detect, but apparently not so easy for a general-purpose, pre-trained model.
To achieve higher accuracy we have to train our own model on our own object classes. This is exactly what I will do in my next article.

Daniel Schwalm
Analytics Vidhya

Software engineer. Machine Learning and Deep Learning enthusiast. My opinion here does not necessarily reflect my employer's.