Udacity SDCND

The goal of the project was to find and highlight the road lane on the video recorded from the car. Once the lane was found there is additional requirement to calculate the curvature of the lane and position of the car relative to the lane. The project is a part of first term of Udacity’s Self-Driving Car Nanodegree. The whole pipeline was implemented in Python with the help of OpenCV library.

Before jumping to finding the lane itself, there is a bit of prep-work to be done. That includes:

1. Camera calibration, which would help us undistort the images for obtaining better accuracy.
2. Finding projective transform, which will be used to obtain top-view of the road, which makes lane detection much easier.
3. Estimating resolution, which would help us transform pixels into meters or feet. To do that, the standardized minimal width of lane of 12 feet (or 3.6576 meters) will be used

After finishing the prep-work the lanes would be identified on:

1. Single images
2. Complete video

If you are not interested in the whole pipeline for finding lanes, you can jump straight to the end of the article, to see video results.

The code can be found at this Github repo.

### Camera Calibration

Before starting the implementation of the lane detection pipeline, the first thing that should be done is camera calibration. That would help us:

• undistort the images coming from camera, thus improving the quality of geometrical measurement
• estimate the spatial resolution of pixels per meter in both x and y directions

The calibration process is straightforward, the images of the chessboard are sequentially loaded, converted to grayscale and chessboard pattern is looked. When the pattern is found, the positions of the corners get refined further to sub-pixel accuracy. That improves the accuracy of the calibration. The the corners of the chessboard pattern from all loaded images are used to compute the distortion coefficients and camera matrix.

It is important to check if the result is satisfactory, since calibration is a nonlinear numerical procedure it might yield suboptimal results. To do so, the calibration images are read once again and undistortion is applied to them. The undistorted images are shown in order to visually check if the distortion has been corrected. One sample of the input image, image with chessboard corners and the undistorted image is shown:

Before we move further on, lets just reflect on what the camera matrix is. The camera matrix encompasses the pinhole camera model in it. It gives the relationship between the coordinates of the points relative to the camera in 3D space and position of that point on the image in pixels. If X, Y and Z are coordinates of the point in 3D space, its position on image (u and v) in pixels is calculated using:

where M is camera matrix and s is scalar different from zero. This equation will be used later on.

### Finding projective transformation

Next step is to find the projective transform so the original images can be warped so that it looks like the camera is placed directly above the road. One approach is to hand tune the source and destination points, which are required to compute the transformation matrix. On the other hand, the script that does that for us can be created based on linear perspective geometry. Let’s look at the perspective geometry on the renaissance painting “Architectural Veduta” by Italian painter Francesco di Giorgio Martini. It is easy to note that all lines meet at a single point called vanishing point. The second thing to note is that the square floor tiles centred horizontally in the image, appear as trapezoids with horizontal top and bottom edges and side edges radiating from the vanishing point.

Our goal is to achieve exactly opposite, to transform a trapezoidal patch of the road in front of the car to a rectangular image of the road. To do so, the trapezoid needs to be defined as previously noted, horizontal top and bottom centred with respect to a vanishing point, sides radiating from the vanishing point. Of course, to define that, the first task is to find the vanishing point.

The vanishing point is the place where all parallel lines meet, so to find it we will be using images with straight lines. First, the images are undistorted, the Canny filter is applied and most prominent lines are identified using probabilistic Hough lines. These images illustrate how the pipeline works:

The vanishing point is at the intersection of all detected lines. Unfortunately, when more than two lines are present, the unique intersecting point might not exist. To overcome that the vanishing point is selected as the point whose total squared distance from all the lines is minimal, thus optimization procedure will be employed. Each line found by the Hough lines can be represented by the point on it pi and unit normal to it ni. Coordinate of the vanishing point is vp. So the total squared distance and a cost function to be minimized is:

To find the minimum the cost function is differentiated with respect to the vp. After some derivation the following is obtained:

Once the vanishing point is found, the top and bottom are defined manually and the trapezoid edges can be calculated. The corners of the trapezoid are used as a source points, while destination points are four corners of the new image. After that, the matrix that defines the hompography matrix of the perspective transform can be calculated using functions provided by the OpenCV. Images that illustrate the procedure follow. Please note that bottom points of the trapezoid are outside of the image, what is the reason for black triangles shown in the warped image.

The selected range is quite big, but that had to be done in order to be able to find the lanes on the all provided videos. In one of those, the bends are much sharper than on the highway and might easily veer outside of the trapezoid causing the whole pipeline to fail.

Once again, lets just reflect on what is the calculated homography matrix. It tells how the perspective transformation is going to be performed and where the pixel from original image with the coordinates (u, v) is going to move after the transformation. The destination of that pixel on the warped image would be the point with the coordinates (uw, vw). The new position is calculated using:

where H is homography matrix and s is scalar different from zero.

### Estimating pixel resolution

Next important step is to estimate resolution in pixels per meter of the warped image. It also can be done by hand, but as previously we’ll create a script that does that for us. In the course materials, it was stated that width of the lane is no less than 12 feet. In order to estimate the resolution in pixels per meter, the images with the straight lines will be used. They will be unwarped and the distance between the lines will be measured. If the multiple images are provided, the lowest distance will be assumed to be 12 feet or 3.6576 meters.

To start, the images with the straight lines would be unwarped and color would be converted to HLS space. To find the lanes the threshold would be applied to the luminescence component. Also, only some region of interest is taken into account. Since lines were straight heading towards the vanishing point, after the warping they will be vertical. The centroids of the blobs on left and right images would be calculated using image moments. Since the lane lines are vertical the width of the lane in pixels is the difference between the x coordinates of two centroids. That allows for the calculation of the width in pixels and then resolution in x direction. The images that illustrate the procedure are shown below.

That is how to find resolution in x direction, but for finding resolution in y there is no such trick. Since nothing was estimated manually neither this will be. The camera matrix obtained by calibrations holds some relative information about resolutions in x and y direction. We can try to exploit that. To find resolution in y direction we have to do some more math.

Lets say, we have a coordinate frame attached to the road, as shown on image below. It is easy to see that transformation of the coordinates represented in the coordinate frame of the road to the coordinate frame of the warped image, consists of scaling and shifted. Scale in x direction corresponds to the number of pixel per meter in x direction. Same holds for y. In mathematical form that can be written as:

The same thing can be calculated from the other side. Lets say that position and of the road coordinate frame in camera coordinate frame is given with rotation matrix R=[r1 r2 r3] and translation vector t. One important property that is going to be exploited is that matrix R is orthogonal, meaning that each of the rows r1, r2, r3 has the length of 1. Now since we know that, the pixel on the image that corresponds to the point with coordinates Xr, Yr and Zr is calculated by:

Since the road is planar, the Zr=0. Now we apply the perspective transform and get:

Since it has to hold for every point we can conclude that:

Where h1, h2, h3 are columns of matrix inverse(HM). Since the length of vectors r1 and r2 is one, we can calculate scalar s and finally ry:

The rather simple equation is obtained at the end, but it took us a while to get there.

### Finding lane lines

For easier explanation first, the functionality used for the single images will be explained. The pipeline for the video is almost the same, while some additional filtering is included.

### Single images

The pipeline for single images goes through several steps. Let us see how initial image looks:

#### 1. Image undistortion, warping and color space conversion.

The image gets undistorted first, then the perspective transformation is applied to the undistorted image. After that, the images is converted into HLS and LAB color space. L channels of both HLS and LAB channels will be used to track the bright regions of the image, while B channel is used to track the yellow lines on the image. Also, some filtering is performed to reduce the noise, median blur is used since it maintains the edges. Also for the use of the harder challenge video, the greenish areas are unmasked. The undistorted and warped images are:

#### 2. Finding bright or yellow areas

To find the bright or yellow areas, the morphological TOP HAT operation is used. It isolates the areas brighter than their surroundings. This operation is used in order to make pipeline robust against the lighting changes. In selected case, the lightness of the road surface changes, but we’ll see that it does not affect the TOP HAT morphological operation. The edge detection was not used since they are extremely affected by the noise on the image, which makes them unsuitable for harder challenges. After applying TOP HAT operation, the image is thresholded using adaptive threshold which adds a bit more to the overall robustness. The resulting images are:

Once the masks are calculated, logical or is applied between them in order to obtain the total mask. Since the threshold is kept quite low, there will be a lot of noise. To avoid noise interfering with lane finding procedure, the mask is eroded which removes all the regions smaller than the structuring element. The results are:

#### 4. Finding the line in a mask and fitting polynomial

When the mask is found, the search for the line begins. The initial point to start a search is somewhere 1.82 meters (6 feet). Under the assumption that lane is 12feet wide and that the car is in its middle, we would be spot on. Since that might not hold, the search is performed in its surroundings. The window at the bottom of the image with the highest number of points included found. After that, we go one layer up and perform the same search, but right now, we start from the maximum from the layer below. The search is performed until the top of the image is reached, gradually eliminating points outside of the maximal region. After the lane points have been isolated, the polynomial fit is performed. Also, some statistics are calculated which help determine if the found line is good or not. The statistics include:

1. Number of points in the fit
2. Quality of the fit calculated using covariance
4. How far lane is from the center of the car

All of these have to be above some empirically defined threshold. The maximal regions and points used for fitting are shown in the image below (red is for left, blue is for right line):

#### 5. Draw lane on original image and calculate curvature

If lanes are found, the curvature and position of the center of the car is calculated. Since the two lines are present, for the coefficient of the polynomial the mean value is used. The y coordinate at which the polynomials are evaluated is the bottom of the image. After that, the lines are drawn on a warped image and then unwarped and added to the original image. The last thing is to print out the values of curvature and offset of the center of the car. Here is the result for discussed case:

### Videos

For the videos, the pipeline follows the basic pipeline applied to single images. Additionally, because of the temporal dimension, some additional filtering is applied. Here is what is done:

1. The polynomial coefficients are averaged over last 7 iterations. That helps make the lane lines smoother and procedures more robust.
2. The lane has to be in close proximity to its previous position. That helps us narrow the search and avoid looking for windows with a maximum number of points in it.
3. The right lane has to be approximately 12ft (3.6m) right from the left lane.
4. Left and right lane should have similar curvature and angle

The pipeline is run on all three provided videos. It works great for flawlessly for two of those. For the third and the most challenging one, it works as well but loses the lane a few times. The resulting videos are provided:

### Discussion

The biggest issue by far for me were sudden changes of light conditions. In those cases, the lines get either completely lost (going from bright to dark) or image gets filled with noise coming from the white spots. Although I have done the best I can to make pipeline robust against that kind of changes, they still can cause major problems, which is evident from the last. More advanced filtering and brightness equalization techniques have to be examined.

The averaging out of polynomial coefficients over the last couple of iterations is inappropriate. Effects of it can be in harder challenge video where lane computed by the code lags behind the lane on the image, especially in the case of sharp bends. Some better filtering technique has to be applied.