Vehicle Detection & Lane Finding using OpenCV & LeNet-5 (1/2)

Lane Finding

Shahzad Raza
7 min read · Sep 7, 2017

In this post, I’d like to explore improvements to my lane finding and vehicle detection algorithms. Both these activities were based off of projects 4 and 5 in Term 1 of Udacity’s Self-Driving Car Nanodegree program.

Lane Finding Pipeline

The lane finding pipeline consists of a series of steps starting with camera calibration. Camera calibration uses 15–20 images of a known pattern (usually a chessboard) to determine the lens distortion coefficients, which account for radial and tangential distortion, and the camera matrix, which contains the focal length and optical center values. The image below shows a distorted and an undistorted image.

Comparison of a distorted and undistorted image
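For reference, the calibration step can be sketched roughly as below using OpenCV's chessboard utilities. The chessboard dimensions and file paths here are illustrative placeholders, not the exact values from the project repo.

```python
import glob
import cv2
import numpy as np

# Assumed 9x6 inner-corner chessboard and file pattern; adjust to match the calibration set.
nx, ny = 9, 6
objp = np.zeros((nx * ny, 3), np.float32)
objp[:, :2] = np.mgrid[0:nx, 0:ny].T.reshape(-1, 2)  # (x, y, 0) grid in chessboard units

obj_points, img_points = [], []
for fname in glob.glob('camera_cal/calibration*.jpg'):
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, (nx, ny))
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# The camera matrix (mtx) and distortion coefficients (dist) are reused for every frame.
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
undistorted = cv2.undistort(cv2.imread('test_images/test1.jpg'), mtx, dist)
```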

Once the distortion coefficients and camera matrix are known, the incoming frames can be processed as follows:

Lane Finding Pipeline

Threshold

My original lane finding pipeline used a logical combination of various binaries, including RGB white and yellow thresholds, Sobel x and y gradients, and thresholded H and S channels from the HSV colorspace. A much simpler approach was taken this time using only the LAB color space. The L-channel represents the luminosity of an image, the A-channel contains chromaticity information along the red-green axis, and the B-channel contains chromaticity information along the blue-yellow axis. As a result, by thresholding the L and B-channels, the white and yellow pixels can be extracted from the image very effectively. No additional gradient or color thresholds are required.
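As a rough sketch, this thresholding step can be written as a single function on the LAB image. The function name and defaults below are mine; the default values simply mirror the test-video thresholds listed further down.

```python
import cv2
import numpy as np

def lab_threshold(img_bgr, l_thresh=(215, 255), b_thresh=(150, 255)):
    """Return a binary image where white (L-channel) or yellow (B-channel) pixels are set."""
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    l_chan = lab[:, :, 0]   # luminosity -> white lane markings
    b_chan = lab[:, :, 2]   # blue-yellow axis -> yellow lane markings

    binary = np.zeros_like(l_chan)
    white = (l_chan >= l_thresh[0]) & (l_chan <= l_thresh[1])
    yellow = (b_chan >= b_thresh[0]) & (b_chan <= b_thresh[1])
    binary[white | yellow] = 1
    return binary
```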

In addition, the interact function of ipywidgets was used to tune the thresholds interactively, which greatly reduced the effort required for the thresholding task. The image below shows thresholded binaries for images from the project and challenge videos.

L, B & combined thresholded binaries for images
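Wiring this up with interact might look something like the sketch below, assuming the lab_threshold helper above and a loaded test image (the path is a placeholder).

```python
import cv2
import matplotlib.pyplot as plt
from ipywidgets import interact, IntRangeSlider

test_img = cv2.imread('test_images/test1.jpg')  # placeholder path

def show_binary(l_range=(215, 255), b_range=(150, 255)):
    binary = lab_threshold(test_img, l_range, b_range)
    plt.figure(figsize=(8, 4))
    plt.imshow(binary, cmap='gray')
    plt.show()

# Sliders update the binary immediately, which makes tuning the thresholds much quicker.
interact(show_binary,
         l_range=IntRangeSlider(value=(215, 255), min=0, max=255),
         b_range=IntRangeSlider(value=(150, 255), min=0, max=255))
```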

The following thresholds were used:

Test Video: L-channel = (215,255); B-channel = (150,255)

Challenge Video: L-channel = (205,255); B-channel = (135,255)

The L-channel threshold used for the challenge video is aggressive in order to eliminate the glare encountered when exiting from under the bridge in the initial seconds of the video.

Warp & Convolve

A region of interest is then identified on the image and a perspective transform is applied to obtain a plan view of the lane. This is shown below for an image from the test video for clarity. Note that in the pipeline this transform is applied to the thresholded binary.

Perspective Transform ROI
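The warp itself is a standard cv2.getPerspectiveTransform / cv2.warpPerspective pair. The source and destination points below are illustrative values for a 1280x720 frame rather than the exact ROI used in the project.

```python
import cv2
import numpy as np

# Illustrative ROI: a trapezoid around the lane (src) mapped to a rectangle (dst).
src = np.float32([[580, 460], [700, 460], [1060, 680], [250, 680]])
dst = np.float32([[300, 0], [980, 0], [980, 720], [300, 720]])

def warp_binary(binary):
    """Perspective-transform the thresholded binary to a plan (bird's-eye) view."""
    h, w = binary.shape[:2]
    M = cv2.getPerspectiveTransform(src, dst)
    Minv = cv2.getPerspectiveTransform(dst, src)  # kept to unwarp the drawn lane later
    warped = cv2.warpPerspective(binary, M, (w, h), flags=cv2.INTER_LINEAR)
    return warped, Minv
```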

Once the transform is applied to the thresholded binary, the image is split into n horizontal slices and each column of a slice is summed to create a 1-D vector with one value per x-coordinate in the image (i.e. a vector of length 1280 where the value at each position is the number of “on” pixels in the height of that slice). A rectangular window (a vector of 1s) is then convolved with this vector to create a histogram of pixel density. Each histogram peak is then appended to an array of centers corresponding to either the left or right lane line, depending on the x-coordinate of the peak. This process is repeated for each horizontal slice, starting at the base of the image. The algorithm is summarized in the steps below, and a condensed code sketch follows the list (the relevant code can be found in the find_centers function in the lane_utils.py module in the project repo):

  1. Initialize an empty list of centers for the left line and one for the right line. These lists will hold the (x, y) coordinates of the centers.
  2. At the lowest slice, perform a convolution across the entire length of the slice vector to identify the base points of the left and right lane lines. In practice, the left half of the vector is convolved with the window to find the center point of the left lane line, while the search for the right lane line starts at 5/8 of the image width (800 pixels) to remove the effect of any lane markings that may appear in the center of the lane (e.g. the HOV lane marking in the challenge video).
  3. Append the peak values (x-coordinates), along with the corresponding y-coordinate (the center of the slice), to the respective centers array for the left or right lane line, but only if the pixel density at the peak exceeds a threshold.
  4. Move to the next slice.
  5. If a peak was found in the previous slice, search within a margin around that peak’s x-coordinate; otherwise, search the entire slice.
  6. Repeat the process until all slices have been searched.
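A condensed, illustrative version of this search is sketched below. The actual implementation lives in find_centers in lane_utils.py and may differ in detail; the window width, margin, and pixel-density threshold here are placeholders.

```python
import numpy as np

def find_lane_centers(warped, n_slices=8, window_width=45, margin=80, min_density=50):
    """Locate left/right lane-line centers by convolving a rectangular window with each slice."""
    h, w = warped.shape
    window = np.ones(window_width)
    slice_h = h // n_slices
    left_centers, right_centers = [], []
    left_x, right_x = None, None

    for i in range(n_slices):                       # bottom slice first
        y_top = h - (i + 1) * slice_h
        col_sums = np.sum(warped[y_top:y_top + slice_h, :], axis=0)
        conv = np.convolve(window, col_sums, mode='same')  # pixel-density histogram
        y_center = y_top + slice_h // 2

        # Blind search on the first slice; afterwards search a margin around the last peak.
        l_lo, l_hi = (0, w // 2) if left_x is None else (max(0, left_x - margin), left_x + margin)
        r_lo, r_hi = (5 * w // 8, w) if right_x is None else (max(0, right_x - margin), right_x + margin)

        l_peak = l_lo + np.argmax(conv[l_lo:l_hi])
        if conv[l_peak] > min_density:
            left_x = l_peak
            left_centers.append((l_peak, y_center))

        r_peak = r_lo + np.argmax(conv[r_lo:r_hi])
        if conv[r_peak] > min_density:
            right_x = r_peak
            right_centers.append((r_peak, y_center))

    return left_centers, right_centers
```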

The image below shows an example of the histogram generated by convolving a rectangular window of width 45 pixels with the third slice from the bottom. In this case the image is divided into 8 horizontal slices, demarcated by the dashed green lines. The red and blue dashed lines show the resulting polynomial fits obtained from the convolution results for each slice.

Histogram generated after convolving the window with the third slice from the bottom

Fit & Check

Once the array of centers has been built, the coordinates are subject to two verification steps.

  1. A minimum of three coordinates must be found for both the left lane line and the right lane line. If this isn’t the case, the previous best fit is used (explained in the next section).
  2. The Z-score is then calculated for each x-coordinate in a line’s centers array to filter out any outliers. Coordinates with a Z-score higher than a threshold are discarded.

Once the outliers are removed, the remaining center points are used to fit a 2nd-degree polynomial provided that at least three points remain in the centers array; otherwise a straight line is fit. The resulting coefficients are then appended to a deque of length 10 holding the coefficients from recent frames.
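A sketch of this fit-and-check step is shown below. The Z-score threshold, and the padding of a linear fit so that every deque entry has three coefficients, are my own choices rather than details taken from the project.

```python
import numpy as np
from collections import deque

recent_coeffs = deque(maxlen=10)  # rolling history of fit coefficients over recent frames

def fit_line(centers, z_thresh=2.0):
    """Filter outlier centers by Z-score, then fit x = f(y) with a 2nd (or 1st) degree polynomial."""
    xs = np.array([c[0] for c in centers], dtype=float)
    ys = np.array([c[1] for c in centers], dtype=float)

    # Discard x-coordinates whose Z-score exceeds the threshold.
    std = xs.std()
    if std > 0:
        keep = np.abs((xs - xs.mean()) / std) <= z_thresh
        xs, ys = xs[keep], ys[keep]

    degree = 2 if len(xs) >= 3 else 1
    coeffs = np.polyfit(ys, xs, degree)
    if degree == 1:
        coeffs = np.concatenate(([0.0], coeffs))  # pad so every deque entry is quadratic-shaped
    recent_coeffs.append(coeffs)

    # The line is drawn with the mean of the recent coefficients rather than the latest fit.
    return np.mean(recent_coeffs, axis=0)
```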

Some additional sanity checks could be added to the pipeline to increase its robustness but weren’t considered since the performance was acceptable on the two videos. These include:

  • Verifying that the lane lines are equidistant at multiple points in the image (a sketch of this check follows the list)
  • Comparing the Radius of Curvature (ROC) of each lane line to its average ROC over the last few frames
  • Using the best fit lines from the previous frames to perform a targeted search for lane pixels in a new frame instead of doing a blind search at the base of the image
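As an illustration, the first of these checks might look something like the sketch below; it is not part of the project code, and the sampling points and tolerance are arbitrary.

```python
import numpy as np

def lanes_equidistant(left_coeffs, right_coeffs, img_height=720, tol=0.2):
    """Check that the horizontal gap between the two fitted lines is roughly constant."""
    ys = np.linspace(0, img_height - 1, num=5)          # sample a few heights in the image
    widths = np.polyval(right_coeffs, ys) - np.polyval(left_coeffs, ys)
    return bool(np.all(widths > 0)) and (widths.max() - widths.min()) / widths.mean() < tol
```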

Draw & Unwarp

Finally, the original undistorted image is warped and the lane lines are plotted using the mean coefficients in the deque. The bounding region between the lines is also plotted. The image is then unwarped back to its original perspective.

Output image with lane lines identified
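A sketch of this step is below, using the averaged coefficients and the inverse transform Minv from the warp step. Here the lane region is drawn on a blank warped canvas and blended back onto the undistorted frame, which is a common way of implementing the overlay; the exact drawing code in the project may differ.

```python
import cv2
import numpy as np

def draw_lane(undistorted, left_coeffs, right_coeffs, Minv):
    """Plot the averaged lane-line fits in warped space, then unwarp and overlay on the frame."""
    h, w = undistorted.shape[:2]
    ys = np.linspace(0, h - 1, h)
    left_x = np.polyval(left_coeffs, ys)
    right_x = np.polyval(right_coeffs, ys)

    # Fill the bounding region between the two fitted lines on a blank warped canvas.
    canvas = np.zeros_like(undistorted)
    left_pts = np.transpose(np.vstack([left_x, ys]))
    right_pts = np.flipud(np.transpose(np.vstack([right_x, ys])))
    pts = np.vstack([left_pts, right_pts]).astype(np.int32)
    cv2.fillPoly(canvas, [pts], (0, 255, 0))

    # Unwarp back to the original perspective and blend with the undistorted frame.
    unwarped = cv2.warpPerspective(canvas, Minv, (w, h))
    return cv2.addWeighted(undistorted, 1.0, unwarped, 0.3, 0)
```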

Processing Time

The lane finding algorithm was run on fairly light hardware with no GPU, therefore real-time detection was not expected. However, it was interesting to see the distribution of time for various aspects of the pipeline. For reference, the numbers below show the processing time for the test video (1261 frames) using a Dell Latitude 7450 with a 2.60 GHz dual-core i7–5600U CPU and 8GB RAM.

The average processing time is 0.10 sec/frame, 45% of which is spent undistorting the incoming frame. The next 26.9% of the time is consumed by creating and warping the binary image for processing. The algorithm that analyzes the binary to find lane lines takes 4.9% of the total time, and drawing the lines takes 21.9%. The remaining 1.3% of the time is spent on operations that were not timed as part of this analysis.
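For reference, a per-stage breakdown like this can be collected with simple wall-clock timers around each step of the pipeline. The sketch below is one way of doing it; the stage names are mine and the commented calls refer to the helpers sketched earlier in this post.

```python
import time
from collections import defaultdict

stage_totals = defaultdict(float)

def timed(stage, func, *args, **kwargs):
    """Run func and accumulate its wall-clock time under the given stage name."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    stage_totals[stage] += time.perf_counter() - start
    return result

# Per-frame usage (illustrative):
# undist = timed('undistort', cv2.undistort, frame, mtx, dist)
# binary = timed('threshold', lab_threshold, undist)
# warped, Minv = timed('warp', warp_binary, binary)
```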

Clearly, undistortion is the most expensive step in the pipeline, which can be traced back to how the cv2.undistort() function is implemented using cv2.initUndistortRectifyMap() and cv2.remap(). The joint undistortion and rectification maps are recomputed for each incoming image, mapping every pixel of the undistorted output back to a sampling position in the distorted input. Some preliminary research shows that this step can be optimized, but an improvement on this is currently outside the scope of this discussion and my knowledge. Suggestions on this are welcome!
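One common optimization along these lines is to build the rectification maps once with cv2.initUndistortRectifyMap() and then call only cv2.remap() per frame, since the maps depend solely on the calibration and the frame size. A sketch, assuming the mtx and dist values from the calibration step:

```python
import cv2

# Build the undistortion maps once; they depend only on the calibration and image size.
h, w = 720, 1280  # assumed frame size
map_x, map_y = cv2.initUndistortRectifyMap(mtx, dist, None, mtx, (w, h), cv2.CV_32FC1)

def undistort_fast(frame):
    """Per-frame undistortion using the precomputed maps."""
    return cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```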

The next part of this post will go over the vehicle detection portion of the pipeline using LeNet-5 and (very briefly) YOLOv2.

Part 2: Vehicle Detection
