You can find my code here. It might be more up to date than the article ;).
This is the fifth article in the self driving cars series. If you want to know why I’m sharing this and more about my journey, please read this.
For each image of the given video, we are going to identify cars and draw a bounding box around them.
Algorithm: Step by step
This algorithm is executed step by step on each frame:
1: HOG features extraction for learning
I first computed the HOG features of many car and non-car images.
HOG, or Histogram of Oriented Gradients, is a technique that maps the locations of the gradients (changes in color or luminosity) in a picture, as well as their orientations.
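As a sketch of this step, here is how HOG features can be extracted with scikit-image. The 64*64 image size and the random input are placeholders; the HOG parameters match the ones discussed later in the article (8 orientations, 8*8 pixels per cell, 2*2 cells per block):

```python
import numpy as np
from skimage.feature import hog

# Placeholder 64x64 grayscale patch; real patches come from the car dataset.
image = np.random.rand(64, 64)

# Parameters matching the settings mentioned later in the article.
features = hog(
    image,
    orientations=8,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    feature_vector=True,
)

# 64x64 image -> 7x7 block positions, each with 2x2 cells of 8 orientations.
print(features.shape)  # (7 * 7 * 2 * 2 * 8,) = (1568,)
```

In practice the HOG can be computed per color channel and the resulting vectors concatenated.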
2: Spatial binning
We can perform spatial binning on an image and still retain enough information to help in finding vehicles.
This allows us to filter out some of the noise and only retain the most important features of the car, while making the algorithm run faster as the resulting images are smaller.
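A minimal sketch of spatial binning, using simple stride-based downsampling for illustration (the project code most likely uses `cv2.resize` instead); the patch size and target size are assumptions:

```python
import numpy as np

def bin_spatial(img, size=(32, 32)):
    # Downsample the image (cv2.resize is the usual choice; striding is a
    # simple stand-in here), then flatten into a 1-D feature vector.
    step_y = img.shape[0] // size[0]
    step_x = img.shape[1] // size[1]
    return img[::step_y, ::step_x].ravel()

patch = np.zeros((64, 64, 3), dtype=np.uint8)  # placeholder training patch
features = bin_spatial(patch)
print(features.shape)  # (32 * 32 * 3,) = (3072,)
```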
3: Color channel histograms
This gives us the distribution of pixel intensities in each color channel, allowing us to use common car colors as a feature for recognition.
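This step can be sketched as follows; the number of bins (32) is an assumption:

```python
import numpy as np

def color_hist(img, nbins=32, bins_range=(0, 256)):
    # Histogram each color channel separately, then concatenate them
    # into a single feature vector.
    hists = [np.histogram(img[:, :, c], bins=nbins, range=bins_range)[0]
             for c in range(3)]
    return np.concatenate(hists)

patch = np.zeros((64, 64, 3), dtype=np.uint8)  # placeholder training patch
features = color_hist(patch)
print(features.shape)  # (3 * 32,) = (96,)
```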
4: Training a classifier
With these 3 features (HOG, spatial bins, color histograms) we train a classifier to separate cars from the rest.
I did the training using a pretty standard method:
- I first standardised the computed features using a standard scaler
- Then I split my data into training and test sets (80/20), randomising them in the process.
- I finally trained a standard SVC with default settings.
I also stored the results of the computed features, scaler and SVC to pickle files to avoid having to recompute them every time.
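The training steps above can be sketched with scikit-learn. The random feature matrices are placeholders for the real stacked feature vectors, and `LinearSVC` is an assumption for the "standard SVC with default settings":

```python
import pickle
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Placeholder feature matrices: one row per sample
# (concatenated HOG + spatial bins + color histograms).
car_features = np.random.rand(50, 100)
notcar_features = np.random.rand(50, 100)

X = np.vstack((car_features, notcar_features)).astype(np.float64)
y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))

# Standardise the features, then split 80/20 with shuffling.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, shuffle=True, random_state=42)

svc = LinearSVC()
svc.fit(X_train, y_train)
acc = svc.score(X_test, y_test)

# Cache the scaler and classifier to avoid recomputing them every time.
with open('model.p', 'wb') as f:
    pickle.dump({'scaler': scaler, 'svc': svc}, f)
```

Note that the same scaler must be applied to the features of every window at prediction time.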
5: Final choice of parameters
Using the accuracy of the classifier, I tweaked the parameters to obtain a better result.
For the HOG, changing the number of orientations, pixels per cell, and cells per block never gave significantly better results. So I mainly tweaked the color space and channels used, in particular RGB (red or all channels), HSV (every combination) and YCrCb (all channels).
YCrCb on all channels gave the best results, with the original HOG settings: 8 orientations, 8*8 pixels per cell and 2*2 cells per block.
6: Sliding window
I then used a sliding window approach to search for cars in the image.
I restricted the windows to the lower half of the image. I also added different window sizes, ranging from 64*64 to 128*128.
Based on the size of the window, I further restricted their locations. For example, 64*64 windows are only computed on the top half of the lower half of the image, since cars located in the lowest part of the image are close to the camera and thus appear bigger.
Finally, I made my window overlap a lot. We’ll see why in the next step.
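A sketch of the window generation described above; the 1280*720 frame size, the 75% overlap and the exact search regions are assumptions:

```python
def slide_window(x_range, y_range, window_size, overlap=0.75):
    """Generate (x1, y1, x2, y2) windows over a region with heavy overlap."""
    step = int(window_size * (1 - overlap))
    windows = []
    for y in range(y_range[0], y_range[1] - window_size + 1, step):
        for x in range(x_range[0], x_range[1] - window_size + 1, step):
            windows.append((x, y, x + window_size, y + window_size))
    return windows

# Assumed 1280x720 frame: small windows only near the horizon
# (top half of the lower half), larger windows over the whole lower half.
small = slide_window((0, 1280), (360, 540), 64)
large = slide_window((0, 1280), (360, 720), 128)
print(len(small), len(large))
```

Each window is then resized to the training patch size, its features extracted, scaled, and fed to the classifier.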
7: Handling false positives
Since the classifier is not 100% accurate, we need to handle false positives.
I used a simple heatmap to do that. The overlapping windows allowed me to set a higher threshold on my heatmap, which gave significantly better results: more true positives and fewer false positives.
By saving the detection results over the last five frames using a circular buffer and applying a threshold on the resulting heatmap, I get a result like this:
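The heatmap and five-frame buffer can be sketched like this; the frame size, box coordinates and threshold value are illustrative assumptions:

```python
from collections import deque
import numpy as np

HISTORY = deque(maxlen=5)  # circular buffer of detections over the last 5 frames

def update_heatmap(frame_shape, detections, threshold=3):
    """Accumulate detections over the buffered frames and threshold the heat."""
    heat = np.zeros(frame_shape[:2], dtype=np.float64)
    HISTORY.append(detections)
    for frame_detections in HISTORY:
        for (x1, y1, x2, y2) in frame_detections:
            heat[y1:y2, x1:x2] += 1  # each overlapping window adds heat
    heat[heat <= threshold] = 0  # reject areas with too few detections
    return heat

# Illustrative frame: three overlapping detections (a real car)
# plus one isolated window (a false positive).
boxes = [(100, 400, 164, 464), (110, 410, 174, 474),
         (105, 405, 169, 469), (900, 500, 964, 564)]
heat = update_heatmap((720, 1280), boxes, threshold=1)
```

The surviving hot regions can then be turned into bounding boxes, e.g. with `scipy.ndimage.label`.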
Here is the full video:
Obviously, what we did here is a naive implementation.
The pipeline is likely to fail with:
- Worse weather or lighting conditions.
- Less contrast between the cars and the road.
- Uphill or downhill roads.
We used classical computer vision techniques combined with an SVC classifier in this project. It’s really interesting to understand and get familiar with these concepts.
But again, I’d be very interested to see how deep learning would tackle this problem.
Next time, we’ll use Kalman filters to track a car in motion using radar and lidar data, all coded in C++ for real-time execution!
Thanks to Eric Sauvegrain for his feedback on this post.