Machine Learning and Computer Vision. Vehicle Detection.

Drop us a message if you're interested in Blockchain and FinTech software development, or just say hi at Pharos Production Inc.
This article covers Project #5 of Udacity's Self-Driving Car Engineer program. The goal is to locate and track cars driving in neighboring lanes.
As with every ML project, we begin by defining a training dataset of car and non-car photos and use it to train a classifier. We have 8792 car photos and 8968 non-car photos (road surface, lane markings, roadside clutter, etc.). Both classes should have a similar number of samples to avoid biasing the prediction.
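A minimal loading sketch; the folder layout (`vehicles/` and `non-vehicles/`) is an assumption about how the dataset archives are unpacked:

```python
import glob

# Assumed layout: the dataset unpacked into vehicles/ and non-vehicles/
# folders of 64x64 PNG images (folder names are assumptions).
car_paths = glob.glob('vehicles/**/*.png', recursive=True)
notcar_paths = glob.glob('non-vehicles/**/*.png', recursive=True)

print(len(car_paths), len(notcar_paths))  # expected: 8792 and 8968
```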

We will use OpenCV together with ML to extract features from the images: color histograms, spatially binned colors, and a histogram of oriented gradients (HOG). Below is a HOG example for a car and a non-car image.
HOG takes several parameters, and we have tried several combinations in different colorspaces. Based on Udacity's recommendation we chose 8x8 px cells, 2x2 cells per block, and 8 gradient orientations. There is probably a better combination, but we will leave that for future research.
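Here is a sketch of the feature extractor, assuming 64x64 BGR inputs as read by `cv2.imread`; the YCrCb colorspace, the 32x32 spatial size, and the 32 histogram bins are assumptions, while the HOG parameters are the ones chosen above:

```python
import cv2
import numpy as np
from skimage.feature import hog

def extract_features(bgr, orientations=8, pix_per_cell=8, cell_per_block=2):
    """Concatenate spatial, color-histogram and HOG features for one 64x64 image."""
    img = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)  # colorspace is an assumption

    # Spatially binned colors: downsample and flatten.
    spatial = cv2.resize(img, (32, 32)).ravel()

    # Per-channel color histograms.
    hist = np.concatenate([
        np.histogram(img[:, :, ch], bins=32, range=(0, 256))[0]
        for ch in range(3)
    ])

    # HOG over each channel with the parameters chosen above.
    hog_feats = np.concatenate([
        hog(img[:, :, ch], orientations=orientations,
            pixels_per_cell=(pix_per_cell, pix_per_cell),
            cells_per_block=(cell_per_block, cell_per_block),
            feature_vector=True)
        for ch in range(3)
    ])
    return np.concatenate([spatial, hist, hog_feats])
```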

After processing all images with the feature extractor, we have 14208 feature vectors in the training set and 3552 in the test set.
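A sketch of how the sets could be assembled, assuming `car_features` and `notcar_features` are the lists produced by `extract_features` above; an 80/20 split matches the 14208/3552 counts:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stack per-image feature vectors and label cars 1, non-cars 0.
X = np.vstack([car_features, notcar_features]).astype(np.float64)
y = np.hstack([np.ones(len(car_features)), np.zeros(len(notcar_features))])

# Normalize each feature dimension to zero mean and unit variance;
# fitting the scaler on the full set before splitting is a simplification.
scaler = StandardScaler().fit(X)
X = scaler.transform(X)

# An 80/20 split yields 14208 training and 3552 test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```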

Next, let's train the classifier. We will use a Linear Support Vector Classifier with the standard SVM hinge loss. Test accuracy is 99%. Pretty good.
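A minimal training sketch with scikit-learn; note that `LinearSVC` defaults to squared hinge loss, so we request plain hinge explicitly:

```python
from sklearn.svm import LinearSVC

# LinearSVC uses squared hinge loss by default; request plain hinge
# to match the standard SVM loss mentioned above.
svc = LinearSVC(loss='hinge')
svc.fit(X_train, y_train)
print('Test accuracy: %.4f' % svc.score(X_test, y_test))
```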

Next, let's slice the image into overlapping rows of search boxes of different sizes and extract features from each box, just as we did for the dataset images above. Box sizes range from 32x32 px up to 128x128 px, and neighboring boxes overlap by 85%. The overlap was chosen through a series of experiments: less overlap is faster but less precise, while more overlap is computationally harder but more precise, so how far we can push it depends on the processing speed we can afford. Likewise, the more window sizes we use, the more precise the solution: small windows catch cars far ahead of the camera, and large windows catch cars close to it. As an optimization, we could restrict the small windows to the upper part of the cropped image, but that would probably fail on a steep slope, so for now we place windows of every size across the whole search area.
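A sketch of the window generator; the 1280x720 frame size, the exact set of scales, and the road-only search region are assumptions:

```python
def slide_windows(img_shape, size, overlap=0.85, y_range=None):
    """Yield (x1, y1, x2, y2) boxes of one size tiled over the search area."""
    h, w = img_shape[:2]
    y0, y1 = y_range if y_range else (0, h)
    step = max(1, int(size * (1 - overlap)))  # 85% overlap -> small stride
    for y in range(y0, y1 - size + 1, step):
        for x in range(0, w - size + 1, step):
            yield (x, y, x + size, y + size)

# Combine several scales; restricting the search to the lower half of
# the frame (the road) is an assumption, not part of the text above.
boxes = []
for size in (32, 64, 96, 128):
    boxes += list(slide_windows((720, 1280), size, y_range=(360, 720)))
```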

Using only 64x64 boxes, we get the following result.

We can improve this by adding the other box sizes described above.

Next, let's mark each positive box with a value of 1 and accumulate a "heat map". We also apply a threshold to keep only the regions where the number of overlapping boxes is greater than 10. To stabilize the result and get rid of spurious labels that flicker in for one or two frames anywhere in the image (the classifier can misfire too), we push the heat maps into a buffer covering 7 frames and compute the labels from the buffer's mean.
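A sketch of the buffered heat map, assuming `hot_boxes` holds the windows the classifier marked as cars in the current frame:

```python
from collections import deque

import numpy as np

HEAT_BUFFER = deque(maxlen=7)  # rolling window of the last 7 frames

def heat_map(frame_shape, hot_boxes, threshold=10):
    """Accumulate detections, average over the buffer and threshold."""
    heat = np.zeros(frame_shape[:2], dtype=np.float64)
    for x1, y1, x2, y2 in hot_boxes:       # boxes the classifier marked as "car"
        heat[y1:y2, x1:x2] += 1
    HEAT_BUFFER.append(heat)
    mean_heat = np.mean(HEAT_BUFFER, axis=0)
    mean_heat[mean_heat <= threshold] = 0  # keep only strongly overlapping areas
    return mean_heat
```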

The last step is to transform the heat map into labels.
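A sketch of the labeling step using `scipy.ndimage.label`, which groups connected hot pixels into separate cars so we can draw one bounding box per car:

```python
import cv2
import numpy as np
from scipy.ndimage import label

def draw_labeled_boxes(frame, heat):
    """Group connected hot regions with scipy's label() and box each one."""
    labels, count = label(heat)
    for car in range(1, count + 1):
        ys, xs = np.nonzero(labels == car)
        cv2.rectangle(frame,
                      (int(xs.min()), int(ys.min())),
                      (int(xs.max()), int(ys.max())),
                      color=(0, 0, 255), thickness=4)
    return frame
```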


We can improve the result by augmenting the dataset and by using a neural network instead of a simple classifier. It's also a good idea to add some image preprocessing, such as color correction or a convolutional blur. We will do this next time.
All source code is available in our GitHub repo.
Thanks for reading!

