Tutorial: Making Road Traffic Counting App based on Computer Vision and OpenCV

Today we will learn how to count road traffic using computer vision, without heavy deep learning algorithms.
For this tutorial, we will use only Python and OpenCV, relying on a fairly simple idea: motion detection with the help of a background subtraction algorithm.

You can find all the code here

Here is our plan:

  1. Understand the main idea of the background subtraction algorithms that are used for foreground detection.
  2. OpenCV image filters.
  3. Object detection by contours.
  4. Building a processing pipeline for further data manipulation.

And this is the result:

Background subtraction algorithms

There are many different algorithms for background subtraction, but the main idea behind them is very simple.
Let’s assume that you have a video of your room, and on some of the frames there are no humans or pets, so the scene is basically static. Let’s call such a frame background_layer. To get the objects that are moving in the video, we just need to compute:

foreground_objects = current_frame - background_layer

But in some cases we can’t get a static frame, because the lighting can change, some objects can be moved by someone, there may always be movement in the scene, and so on. In such cases we save a number of frames and try to figure out which pixels are the same in most of them; these pixels then become part of background_layer. The algorithms differ mainly in how they build this background_layer and in the additional filtering they use to make the selection more accurate.
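To make the idea concrete, here is a minimal sketch that builds background_layer as the per-pixel median of a few frames and then subtracts it from the current frame. It only illustrates the principle and is not what MOG does internally; the function names, the synthetic frames and the threshold of 30 are assumptions made for this example.

```python
import numpy as np


def estimate_background(frames):
    """Per-pixel median over a list of grayscale frames of identical shape."""
    stack = np.stack(frames).astype(np.int16)
    return np.median(stack, axis=0).astype(np.uint8)


def foreground_mask(current_frame, background_layer, threshold=30):
    """Mark pixels whose absolute difference from the background exceeds threshold."""
    diff = np.abs(current_frame.astype(np.int16) - background_layer.astype(np.int16))
    return (diff > threshold).astype(np.uint8) * 255


# Synthetic grayscale frames: a bright 2x2 "object" moves one pixel per frame,
# so every pixel shows the background in most of the frames.
frames = []
for i in range(5):
    frame = np.full((8, 8), 50, dtype=np.uint8)
    frame[3:5, i:i + 2] = 200
    frames.append(frame)

background_layer = estimate_background(frames)
mask = foreground_mask(frames[2], background_layer)  # 255 where the object is
```

The median works here because the moving object covers each pixel in fewer than half of the frames; a slow or parked object would leak into the background.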

In this lesson, we will use the MOG algorithm for background subtraction. After processing, the result looks like this:

Original frame on the left, subtracted foreground with MOG (with shadow detection) on the right.

As you can see, there is some noise on the foreground mask, which we will try to remove with some standard filtering techniques.

Right now our code looks like this:


For our case we will need these filters: Threshold, Erode, Dilate, Opening, Closing. Please follow the links and read about each of them to see how they work (so that I don’t have to copy/paste the descriptions here).

So now we will use them to remove some noise from the foreground mask.
First, we will use Closing to fill gaps inside areas, then Opening to remove 1–2 px points, and after that Dilation to make the objects bolder.

And our foreground will look like this:

Object detection by contours

For this purpose we will use the standard cv2.findContours method with these parameters:

cv2.RETR_EXTERNAL - get only the outer contours.
cv2.CHAIN_APPROX_TC89_L1 - use the Teh-Chin chain approximation algorithm (faster).

On the output, we add some filtering by height and width, and compute the centroid.
Pretty simple, yeah?

Building processing pipeline

You must understand that in ML and CV there is no single magic algorithm that does everything at once. Even if we imagine that such an algorithm exists, we still wouldn’t use it, because it would not be effective at scale. For example, a few years ago Netflix ran a competition with a prize of 1 million dollars for the best movie recommendation algorithm. One of the teams created such an algorithm, but the problem was that it couldn’t work at scale and thus was of little use to the company. But still, Netflix paid them the 1 million :)
So now we will build a simple processing pipeline. It is not built for scale, just for convenience, but the idea is the same.

As input, the constructor takes a list of processors that will be run in order. Each processor does its part of the job. So let’s create a contour detection processor.
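A minimal sketch of such a pipeline, assuming each processor is a callable that receives a shared context dictionary and returns it. The class names and the two toy processors are hypothetical and only show the mechanics.

```python
class PipelineProcessor:
    """Base class: a processor takes the context dict and returns it."""

    def __call__(self, context):
        raise NotImplementedError


class PipelineRunner:
    def __init__(self, processors=None):
        self.processors = list(processors or [])

    def run(self, frame, frame_number):
        # The context carries data from one processor to the next.
        context = {"frame": frame, "frame_number": frame_number}
        for processor in self.processors:
            context = processor(context)
        return context


class ToyDetector(PipelineProcessor):
    """Hypothetical stand-in for a detection step: keeps non-zero values."""

    def __call__(self, context):
        context["objects"] = [v for v in context["frame"] if v > 0]
        return context


class ToyCounter(PipelineProcessor):
    """Hypothetical stand-in for a counting step."""

    def __call__(self, context):
        context["count"] = len(context["objects"])
        return context


pipeline = PipelineRunner([ToyDetector(), ToyCounter()])
result = pipeline.run(frame=[0, 3, 0, 7], frame_number=1)
```

Because every processor reads from and writes to the same context, a step can be added, removed or reordered without touching the others.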

It just merges together our background subtraction, filtering and detection parts.
Now let’s create a processor that links detected objects across frames into paths, and also counts the vehicles that reach the exit zone.

This class is a bit complicated, so let’s walk through it part by part.

The green mask on the image is the exit zone, which is where we count our vehicles. For example, we will count only paths that are longer than 3 points (to remove some noise) and whose 4th point is in the green zone.
We use masks because for many operations they are more efficient and simpler than vector algorithms. We just use a “binary and” operation to check that a point is in the area, and that’s all. And here is how we set it:
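A minimal sketch of the mask idea, assuming rectangular exit zones so that plain NumPy slicing is enough. The frame size, the zone coordinates and the function names are made up for illustration; polygonal zones could be rasterized into the same kind of mask with cv2.fillPoly.

```python
import numpy as np

FRAME_SHAPE = (720, 1280)  # assumed frame height and width


def make_exit_mask(shape, zones):
    """Build a mask that is 255 inside each (top, bottom, left, right) zone."""
    mask = np.zeros(shape, dtype=np.uint8)
    for top, bottom, left, right in zones:
        mask[top:bottom, left:right] = 255
    return mask


def check_exit(point, exit_mask):
    """Binary AND of the mask value at the point with 255."""
    x, y = point
    return bool(exit_mask[y, x] & 255)


# One hypothetical exit zone along the bottom-left part of the frame.
exit_mask = make_exit_mask(FRAME_SHAPE, [(550, 720, 0, 640)])
inside = check_exit((100, 600), exit_mask)
outside = check_exit((900, 100), exit_mask)
```

Note that the mask is indexed as [y, x], while points are stored as (x, y).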

Now let’s link points into paths.

On the first frame, we just add all points as new paths.

Next, if len(path) == 1, then for each path in the cache we try to find the point (centroid) among the newly detected objects that has the smallest Euclidean distance to the last point of the path.

If len(path) > 1, then we use the last two points of the path to predict a new point on the same line, and find the minimal distance between that predicted point and the current point.

The point with the minimal distance is added to the end of the current path and removed from the list.

If some points are left after this, we add them as new paths.

We also limit the number of points in a path.
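The linking rules above can be sketched in plain Python. This is a simplified illustration under a few assumptions: points are (x, y) tuples, matching is greedy in path order, paths that find no point are kept unchanged, and there is no maximum distance check. The function names and the default path_size are made up.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def link_points(paths, points, path_size=10):
    """Attach each newly detected centroid to the closest existing path."""
    points = list(points)
    if not paths:
        # First frame: every point starts its own path.
        return [[p] for p in points]

    new_paths = []
    for path in paths:
        if not points:
            new_paths.append(path)  # nothing left to attach
            continue
        if len(path) == 1:
            target = path[-1]
        else:
            # Predict the next point on the line through the last two points.
            (x1, y1), (x2, y2) = path[-2], path[-1]
            target = (2 * x2 - x1, 2 * y2 - y1)
        nearest = min(points, key=lambda p: distance(p, target))
        points.remove(nearest)
        # Append the point and keep only the most recent path_size points.
        new_paths.append((path + [nearest])[-path_size:])

    # Points that were not attached to any path start new paths.
    new_paths.extend([p] for p in points)
    return new_paths


paths = link_points([], [(0, 0)])
paths = link_points(paths, [(10, 0)])
paths = link_points(paths, [(0, 0), (20, 1)])
```

In the last call the predicted point is (20, 0), so (20, 1) is appended to the existing path and (0, 0) starts a new one.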

Now we will try to count the vehicles that enter the exit zone. To do this, we take the last 2 points of the path and check that the last one is in the exit zone and the previous one is not, and also that len(path) is bigger than the limit.

The part after else prevents back-linking of new points to points in the exit zone.
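A minimal sketch of the counting rule. The exit test is passed in as a function so that the example runs without an image; the names and the minimum length of 4 points are assumptions. Dropping counted paths and paths that already touch the exit zone is one reading of the else branch described above, not necessarily the exact project logic.

```python
def count_exits(paths, in_exit_zone, min_path_len=4):
    """Return (vehicles counted, paths kept for linking on the next frame)."""
    counted = 0
    kept = []
    for path in paths:
        last_two = path[-2:]
        if (len(last_two) == 2
                and not in_exit_zone(last_two[0])
                and in_exit_zone(last_two[1])
                and len(path) >= min_path_len):
            # The path has just crossed into the exit zone: count it once.
            counted += 1
        elif not any(in_exit_zone(p) for p in path):
            # Keep only paths that have no point in the exit zone, so new
            # points cannot be linked back to vehicles that already left.
            kept.append(path)
    return counted, kept


def in_zone(point):
    """Hypothetical exit zone: everything at x >= 100."""
    return point[0] >= 100


paths = [
    [(70, 0), (80, 0), (90, 0), (100, 0)],  # long enough, just entered: counted
    [(90, 5), (100, 5)],                    # too short: not counted, dropped
    [(10, 9), (20, 9)],                     # still on the road: kept
]
counted, kept = count_exits(paths, in_zone)
```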

The last two processors are a CSV writer, which creates a CSV report file, and a visualizer for debugging and nice pictures.

The CSV writer saves data by time, because we need it for further analytics. So I use this formula to add additional frame timing to the Unix timestamp:

time = ((self.start_time + int(frame_number / self.fps)) * 100 
+ int(100.0 / self.fps) * (frame_number % self.fps))

So with start_time = 1 000 000 000 and fps = 10, I will get results like this:
frame 1 = 100 000 000 010
frame 2 = 100 000 000 020
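To check the arithmetic, here is the formula as a standalone function, evaluated exactly as written above. It assumes an integer fps and non-negative frame numbers; the function name is made up.

```python
def frame_time(start_time, fps, frame_number):
    """Unix timestamp scaled by 100, plus the frame offset within the second."""
    whole_seconds = start_time + frame_number // fps
    # Each frame within a second advances the time by int(100.0 / fps) units.
    return whole_seconds * 100 + int(100.0 / fps) * (frame_number % fps)


for n in (1, 2, 10, 11):
    print(n, frame_time(1_000_000_000, 10, n))
# 1 100000000010
# 2 100000000020
# 10 100000000100
# 11 100000000110
```

The last two digits are the position inside the current second, so the values keep increasing from frame to frame as long as fps is not greater than 100.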

Then, after you get the full CSV report, you can aggregate this data however you want.

Full code of this project


So as you can see, it was not as hard as many people think.

But if you run the script, you will see that this solution is not ideal: it has a problem with overlapping foreground objects, and it doesn’t classify vehicles by type (which you will definitely need for real analytics). But still, with a good camera position (above the road), it gives pretty good accuracy. And that tells us that even small and simple algorithms used in the right way can give good results.

So what can we do to fix the current issues?

One way is to add some additional filtering that tries to separate objects for better detection. Another is to use more complex algorithms like deep convolutional networks (which I will cover in the next article).

Next article