Note to My Past Self: Pro Tips for Term 1 of the Udacity Self-Driving Car Nanodegree

Daniel Wolf
10 min read · Apr 12, 2017


If I could send a note back in time to myself 6 months ago, I could probably find something more valuable to say than mentorship tips for Term 1 of the Udacity Self-Driving Car Nanodegree. That said, I would have wanted to know these points soon after being accepted into the selective inaugural cohort in October 2016. After finding some success in the SDC Nanodegree myself, I have mentored over 40 students, and this post collects the pointers that have been most relevant to my mentees. Let’s jump right in!

General Tips

  • If you are enrolled to get a job, don’t overlook the career-related resources such as the resume review and LinkedIn profile review. It will take networking and interview practice to land a job in the self-driving car industry, so start keeping up with industry news if you aren’t already. The list of registered autonomous vehicle testers in California is a relevant starting point to look for SDC companies that may have recent news and job postings.
  • All five projects provide opportunities to go above and beyond. Of course these are optional, but I do recommend additional challenges as a way to stand out relative to your peers. If you do have success going above and beyond, take the time to write a blog post about it so that you can more easily share your accomplishment.
  • If you are relatively new to programming with Python, this short Google course can help, especially for Projects 4 and 5, where it is helpful to implement Python Classes.

Project 1: Detect Lane Lines

In this project, you will detect highway lane lines in a video stream using OpenCV image analysis techniques such as Hough transforms and Canny edge detection. I love that this project lets you add to your portfolio less than 2 weeks after starting the program. It builds a sense of “momentum” and gives you a way to share the work with interested family and friends.

  • As an additional general tip, it is always a good idea to read the project assignment and rubric before reviewing the lessons. This helps keep your goal in mind as you learn the material.
  • If lane line extrapolation trips you up, take advantage of the multiple forum posts on this topic. One key aspect is filtering out Hough lines that have improbable slopes; a vertical or horizontal line in your image is highly unlikely to be a lane line! (See the sketch just after this list.)
  • As soon as you have a working algorithm, I recommend submitting and pushing on to the next project. There will be time to revisit if you prefer, and keep in mind that you will build on your lane detection knowledge in Project 4: Advanced Lane Detection.
  • If you do want to invest additional time and stand out with Project 1, this research paper has great ideas on how to detect curves and handle faded lines using an extended version of the Hough lanes algorithm.
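
To make the slope-filtering idea concrete, here is a minimal sketch. It assumes segments come from cv2.HoughLinesP, and the slope bounds are illustrative values for you to tune, not settings from the course:

```python
def filter_hough_lines(lines, min_slope=0.3, max_slope=3.0):
    """Keep only Hough segments whose slope is plausible for a lane line.

    `lines` is the output of cv2.HoughLinesP: an array of [x1, y1, x2, y2]
    segments. Near-horizontal and near-vertical segments are discarded.
    The slope bounds are illustrative; experiment against your own images.
    """
    kept = []
    for line in lines:
        x1, y1, x2, y2 = line[0]
        if x2 == x1:
            continue  # perfectly vertical segment: infinite slope, discard
        slope = (y2 - y1) / (x2 - x1)
        if min_slope < abs(slope) < max_slope:
            kept.append(line)
    return kept

# Usage: kept = filter_hough_lines(cv2.HoughLinesP(edges, 1, np.pi / 180,
#                                                  threshold=20, minLineLength=20,
#                                                  maxLineGap=100))
```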

Project 2: Traffic Sign Classification

In this project, you will implement and train a convolutional neural network with TensorFlow to accurately classify traffic signs. You will experiment with different network architectures and perform image pre-processing.

Note that a realistic target accuracy on this task is around 98%, usually achieved with a convolutional neural network combined with data augmentation. Here are five tips for improving your traffic sign classification accuracy:

  1. Augment the data through a combination of rotating, zooming, shearing, jittering, etc. This helps your model generalize to unseen images. (A sketch follows this list.)
  2. Experiment with different architectures (for example, try adding a max-pooling layer) or just change the dimensions of the existing layers. Keep in mind that arXiv.org is a great resource for current research, though its preprints are not peer-reviewed!
  3. Tune the hyperparameters (e.g. batch size, learning rate).
  4. Add dropout layers.
  5. Experiment with different color spaces.
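
As a concrete example of tip 1, here is a sketch using Keras’s ImageDataGenerator. The variable names, dataset shapes, and parameter ranges below are illustrative placeholders, not values from the course:

```python
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation ranges; tune them for 32x32 traffic-sign images.
# Avoid horizontal flips: many signs are not mirror-symmetric
# (a flipped "turn left" sign becomes a "turn right" sign).
datagen = ImageDataGenerator(
    rotation_range=15,       # small random rotations (degrees)
    zoom_range=0.1,          # random zoom in/out
    shear_range=0.1,         # shear transformation
    width_shift_range=0.1,   # horizontal jitter (fraction of width)
    height_shift_range=0.1,  # vertical jitter (fraction of height)
)

# Dummy stand-ins for your training set (names and shapes are placeholders).
X_train = np.random.rand(256, 32, 32, 3)
y_train = np.random.randint(0, 43, size=256)  # the GTSRB set has 43 classes

# flow() yields an endless stream of freshly augmented batches, e.g. for
# model.fit_generator(datagen.flow(X_train, y_train, batch_size=128), ...)
batch_images, batch_labels = next(datagen.flow(X_train, y_train, batch_size=128))
```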

You will probably notice that the classes are imbalanced. In practice, it is usually a good idea to balance your data relatively evenly across all classes. In this exercise, it is not as necessary, because the test data will have approximately the same ratio of labels as the training data. Also, note that normalization is less critical here since all of your image features are already on the same 0–255 scale, though zero-centering the data can still speed up training.

Deep Learning

If you are relatively new to deep learning, Project 2 will be very informative. So much has happened in quite a short time with deep learning advancements, so here are a few comments and resources that should help give you some background:

  • A layered network first “learns” low-level features, such as edges and basic shapes, and then recognizes more complicated patterns as it combines these features deeper in the network.
  • This YouTube video, published by Udacity, is a great “friendly introduction” to deep learning.
  • If you can explain the fundamentals of backpropagation, you will stand out in interviews. (A toy worked example follows this list.)
  • This article has explanations and links to some of the most important deep learning papers over the last several years, and this paper describes key advances in CNN architecture evolution.
  • This introduction to AI provides good perspective.
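
Since backpropagation comes up so often, here is a toy example of the chain rule at work on a single sigmoid neuron with a squared-error loss. All the numbers are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, target = 1.5, 1.0   # single input and desired output
w, b = 0.8, -0.2       # parameters to learn
lr = 0.5               # learning rate

for step in range(3):
    # Forward pass
    z = w * x + b
    a = sigmoid(z)
    loss = 0.5 * (a - target) ** 2

    # Backward pass -- backpropagation is just the chain rule:
    # dL/da = (a - target); da/dz = a * (1 - a); dz/dw = x; dz/db = 1
    dL_dz = (a - target) * a * (1 - a)
    dL_dw = dL_dz * x
    dL_db = dL_dz

    # Gradient descent update
    w -= lr * dL_dw
    b -= lr * dL_db
    print("step %d: loss=%.4f" % (step, loss))
```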

At this point I want to highlight two fundamental concepts a bit further. I have received questions about these from my mentees on multiple occasions:

Splitting into train/test/validation data sets
In deep learning, you should split your data into training, validation, and testing sets (for example, using a 60–20–20 ratio) as opposed to only a training and testing set. The validation data is evaluated after each epoch and can still influence training through your tuning decisions, while the test data is completely unseen by the model until it is fully trained. You can still overfit to the validation data but not to the test data, so accuracy on the test data is the best indicator of performance. Here is a StackExchange link on test vs. validation data (see the top-voted answer).
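
Here is a minimal sketch of a 60–20–20 split using scikit-learn; the array names and shapes are placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data standing in for your features and labels.
X = np.random.rand(1000, 32, 32, 3)
y = np.random.randint(0, 43, size=1000)

# First carve off 40%, then split that 40% in half: a 60-20-20 ratio.
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=0)

# Validation accuracy guides your tuning each epoch;
# test accuracy is checked once, after training is complete.
```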

Utilizing dropout
Dropout is key in neural networks to prevent overfitting: during training, it randomly deactivates a fraction of the units at that particular layer, which helps the model generalize. It’s good to start with a lower dropout rate such as 0.2 (20%) and work up to 0.5 if needed. Dropout has been shown to be successful when used before each layer in your model, so don’t worry about using it too often. The only potential harm is underfitting from dropping too much signal, which should not be a concern if you stay below 50% and have a large enough sample size in your project. Here is the link to the original paper introducing dropout.
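
As a quick illustration, here is a minimal Keras sketch with dropout between fully connected layers. The layer sizes, rates, and class count are illustrative, and Keras disables dropout automatically at inference time:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation='relu', input_shape=(512,)),
    Dropout(0.2),   # randomly zero 20% of this layer's activations per batch
    Dense(64, activation='relu'),
    Dropout(0.2),   # start near 0.2; increase toward 0.5 only if still overfitting
    Dense(43, activation='softmax'),  # e.g. 43 traffic-sign classes
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
```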

Ok, back to the project-specific feedback.

Project 3: Behavioral Cloning

[Image: the right turn]

In this project, you will architect and train a convolutional neural network to drive a car in a simulator, using Keras with a TensorFlow backend. This project took more time than the others for me personally. There is extensive trial and error, demonstrating that deep learning is not (yet) an exact science. Working through this project teaches an invaluable lesson: gathering and preprocessing good data is often the most important and challenging aspect of a real-world deep learning problem.

  • Leverage the dataset provided by Udacity. I recommend using the simulator to record your own data at the beginning so that you can get a feel for the track and how the simulator works; however, the Udacity data does a nice job of including plenty of “recovery” data that teaches your vehicle how to get back to the middle of the lane.
  • Regarding model architecture, many students have success using the NVIDIA model from their popular paper as a starting point. Some students have also used the open-source Comma AI model.
  • Use the left, right, and center images for training to increase your dataset size, applying small offsets to the steering angle for the left and right images so that they emulate center images.
  • Crop out the top part of the image in your preprocessing. The background scene such as trees and mountains may distract the training from the important features.
  • Experiment with changing the color space. YUV has given promising results. You can also generalize to different levels of brightness through adjustments in YUV and HSV color space.
  • I recommend utilizing the Keras ImageDataGenerator, or a custom Python generator, for two key reasons: a generator lets your model hold only a subset of images in memory at a time, and it facilitates image augmentation as images are fed into the model, which is key for model generalization. (A custom-generator sketch follows this list.)
  • Test vehicle performance with different numbers of epochs (generally between 5 and 20), and note that improved MSE does NOT necessarily mean better track performance on this project. MSE should be your loss metric, since predicting steering angles is a regression problem as opposed to the classification problem in Project 2.
  • You can produce a visualization of your architecture using the model.summary() method of Keras or TensorBoard, a tool for visualizing your models.
  • Be careful with using Amazon EC2 on this project, as there may be latency issues when running the server on the remote instance and the simulator on your local machine. During this project I upgraded my GPU to an NVIDIA GTX 1070 on my home machine (woo hoo!), and getting a fully autonomous simulated vehicle was more straightforward for me on local hardware than on Amazon EC2.
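
Here is a sketch combining the side-camera offset and custom-generator ideas. It assumes the Udacity simulator’s driving_log.csv column order (center, left, right, steering, …), and the 0.2 offset is a commonly tried starting value, not a prescribed one:

```python
import csv
import random
import cv2
import numpy as np

STEER_OFFSET = 0.2  # illustrative correction for the side cameras; tune it

def load_samples(log_path='driving_log.csv'):
    """Read (image_path, steering) pairs from the simulator log."""
    samples = []
    with open(log_path) as f:
        for row in csv.reader(f):
            center, left, right, steering = row[:4]
            try:
                angle = float(steering)
            except ValueError:
                continue  # skip the header row if your log has one
            samples.append((center.strip(), angle))
            samples.append((left.strip(), angle + STEER_OFFSET))   # left cam: steer right
            samples.append((right.strip(), angle - STEER_OFFSET))  # right cam: steer left
    return samples

def batch_generator(samples, batch_size=64):
    """Yield shuffled batches endlessly, so only one batch sits in memory."""
    while True:
        random.shuffle(samples)
        for i in range(0, len(samples), batch_size):
            batch = samples[i:i + batch_size]
            images = np.array([cv2.imread(path) for path, _ in batch])
            angles = np.array([angle for _, angle in batch])
            yield images, angles

# Usage with the Keras API of that era:
# model.fit_generator(batch_generator(load_samples()), samples_per_epoch=..., nb_epoch=5)
```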

If you have made it to Project 4 in the course, you are practically on the home stretch!

Project 4: Advanced Lane Detection

In this project, you will detect lane lines in a video stream in a variety of conditions, including changing road surfaces, curved roads, and variable lighting. You will use OpenCV image analysis techniques to implement camera calibration and perspective transforms, as well as color transforms, gradient thresholding, and polynomial fits.

  • While deep learning architecture in Projects 2 and 3 includes plenty of trial and error, Project 4 focuses more on logically crafting Python code. If you enjoy thinking through coding problems step by step, you might like this project!
  • Parameter selection on the gradient thresholding is one of the more challenging aspects. I recommend taking the time to experiment with different values until you get a good feel for how each parameter ultimately affects the result.
  • Use a Python Class to track your lane lines and average over multiple frames. This will make your lines much more consistent and professional-looking frame to frame. You can also use your Class to update your radius of curvature and lane position at certain intervals so that they are more legible in the video.
  • Implement “sanity checks” to throw out bogus lane lines! For example, I checked that the lines are 1) approximately parallel and 2) far enough apart in the perspective-transform image. If either check fails, you can throw out the results from that frame and default to the lines from the prior frame. (A minimal sketch follows this list.)
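
Here is a minimal sketch of a line-tracking Class with frame averaging, plus the sanity checks described above. The history length, pixel thresholds, and tolerance are illustrative values to tune, assuming second-order polynomial fits of the form x = Ay² + By + C:

```python
import numpy as np
from collections import deque

class Line:
    """Tracks one lane line's polynomial fits over recent video frames."""

    def __init__(self, history=5):
        self.recent_fits = deque(maxlen=history)  # last N polynomial fits

    def add_fit(self, fit):
        self.recent_fits.append(fit)

    @property
    def smoothed_fit(self):
        """Average recent fits so the drawn line is stable frame to frame."""
        return np.mean(list(self.recent_fits), axis=0)

def sanity_check(left_fit, right_fit, y_eval=720, min_gap=300, max_gap=900):
    """Reject implausible detections; thresholds are illustrative pixel values.

    Checks that the two lines are a plausible distance apart at the bottom
    of the perspective-transformed image and roughly parallel (similar
    leading curvature coefficients).
    """
    left_x = np.polyval(left_fit, y_eval)
    right_x = np.polyval(right_fit, y_eval)
    gap_ok = min_gap < (right_x - left_x) < max_gap
    parallel_ok = abs(left_fit[0] - right_fit[0]) < 1e-3  # illustrative tolerance
    return gap_ok and parallel_ok

# In the pipeline: if sanity_check fails, skip add_fit() for this frame and
# draw the lane from each Line's smoothed_fit instead.
```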

Project 5: Vehicle Tracking

In this project, you will track vehicles in camera images using a machine learning classifier with Histogram of Oriented Gradients (HOG) features.

  • PyImageSearch, which is a great resource for computer vision, has a couple of posts that are relevant to this project and are good supplements to the lesson material: HOG and Object Detection and Sliding Windows for Object Detection with Python and OpenCV.
  • Finding the right parameters for bounding-box thresholding is key to avoiding false positives. My approach first applied a threshold on the number of overlapping windows within a single frame, and then a second threshold spanning a certain number of frames (using a Python Class). (A heat-map sketch follows this list.)
  • Choose your sliding window space wisely. There is no rubric requirement for processing time, but processing the video will take longer than you want if, say, you need more than several seconds per frame. You can write your sliding window functions to search very specific regions for different window sizes, and this can get your processing time down to 1–2 seconds per frame. For example, you don’t need to search the sky for cars, and your smallest window size really only applies to cars that are far down the road.
  • Speaking of farther down the road, Haar cascade classifiers have been shown to be effective for cars that are off in the distance. For more information on this approach, you can check out this GitHub repository.
  • Speaking of processing time, many vehicle detection algorithms are written using classical computer vision techniques, but these will likely transition to deep learning due to its superior detection speed. Comparing a sliding-window and a deep learning approach is a great way to stand out on this project. You could experiment with the YOLO (You Only Look Once) neural network or the more recent SSD (Single-Shot Detector).
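
To make the multi-window thresholding concrete, here is a heat-map sketch in the spirit of the lesson material. The threshold values and bounding boxes below are dummy placeholders:

```python
import numpy as np
from scipy.ndimage import label

def add_heat(heatmap, bbox_list):
    """Add +1 inside every positive detection window for one frame."""
    for (x1, y1), (x2, y2) in bbox_list:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap

def threshold_heat(heatmap, threshold=3):
    """Zero out pixels with too few overlapping detections (false positives).
    The threshold value is illustrative; tune it against your video.
    """
    heatmap[heatmap <= threshold] = 0
    return heatmap

# Per frame: build a heatmap from this frame's windows, optionally sum the
# heatmaps of the last few frames (e.g. with a deque) for the second,
# cross-frame threshold, then call label() to turn surviving blobs into
# final bounding boxes.
heatmap = np.zeros((720, 1280), dtype=np.float32)
detections = [((800, 400), (950, 520)), ((820, 410), (960, 530))]  # dummy boxes
heatmap = threshold_heat(add_heat(heatmap, detections), threshold=1)
labeled_array, n_cars = label(heatmap)
```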

Wrap-up

That’s it for now! Of course, do not take these as black-and-white must-dos; they are simply based on what I have seen to be useful in my own coursework and with my mentees. So, again playing the role of mentor: figure out what works best for you, and best of luck in conquering your challenges.
