This is a project for the Udacity Self-Driving Car Nanodegree program. One of the projects there is to use machine learning to make a car mimic human driving without explicitly programming any of its actions, much like a new driver learns by watching an experienced one.
Udacity has provided a stunningly good simulator (similar to a video game). The simulated car is equipped with three front-facing cameras mounted on the left, center and right of the car. The simulator is also connected to a scripting backend, so we can retrieve critical values such as throttle, speed, steering angle and the camera images during a training session. In ‘autonomous’ mode, the scripting backend publishes a steering angle and throttle to the simulator for each image it receives, allowing the car to drive itself.
Now let’s jump in…
As with all machine learning problems, the predominant part of the problem turned out to be getting the right data. This project lives up to the phrase “Garbage In, Garbage Out”.
My own data
Collecting data on my own was torturously difficult, since my only choices for controlling the simulator were a keyboard or a Steam controller. I struggled fruitlessly for over a week to collect the data (although I have to admit that playing with the simulator was a lot of fun to begin with). But this gave me a chance to understand just how important the data collection process is. The typical problems with the collected data are:
- There are far too many entries with the angle set to 0. This is predominantly because of the keyboard input. Note that in an actual car this problem won’t be as acute (although it will still exist).
- Keyboard steering angles are highly quantized, giving us huge jumps whenever there is a big turn.
These are issues because if we feed a huge number of zeros to the ML system, it will tend to drive straight and won’t be able to make sharp turns.
Finally came the savior.
Udacity released their training data for Track 1 a couple of weeks back, and this really jump-started the project for me. As with all ML problems, the first step is to understand the data we are working with.
Angles over time
The first step was to see how the angles vary over time.
We can see that the data set collected some very sharp turns in the middle, but is otherwise pretty evenly distributed.
Next, the important thing is to find out how the angles are distributed. We know that steering models typically tend to be biased towards predicting zeros, because zeros are so abundant in the datasets.
And we can see that this holds for this dataset as well: a HUGE number of zeros and very, very few samples of any other angle. If we go ahead as is, this is definitely going to bias our model towards predicting 0s.
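To put a number on the zero bias, a minimal sketch like the one below counts the fraction of (near-)zero angles. It assumes the angles are already loaded into an array, for example from the steering column of Udacity’s CSV log; the `zero_fraction` helper, the tolerance and the sample values are hypothetical, for illustration only.

```python
import numpy as np

def zero_fraction(angles, tol=1e-6):
    """Fraction of steering angles whose magnitude is at most tol."""
    angles = np.asarray(angles, dtype=float)
    return float(np.mean(np.abs(angles) <= tol))

# Hypothetical sample; real values would come from the CSV's steering column
angles = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.05, 0.1, 0.0, 0.25])
print(zero_fraction(angles))  # 0.7
```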
Let’s also check whether there are big jumps in the angle over time (as there would be if the data were recorded with a keyboard). When recording from an actual steering wheel, we can expect the angle difference between subsequent samples to be quite uniform.
We can see that there are sometimes pretty large spikes in the difference. This indicates that the data is quite volatile, and it’s going to be difficult for the model to produce a smooth output from it.
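The jump check can be sketched with `np.diff`, which gives the difference between consecutive samples. The helper name and the 0.2 threshold are hypothetical choices for illustration, not values from this project.

```python
import numpy as np

def angle_jumps(angles, threshold=0.2):
    """Indices i where |angles[i + 1] - angles[i]| exceeds threshold."""
    diffs = np.diff(np.asarray(angles, dtype=float))
    return np.flatnonzero(np.abs(diffs) > threshold)

angles = [0.0, 0.05, 0.1, 0.6, 0.55, 0.0]
print(angle_jumps(angles))  # [2 4]
```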
Finally, let’s plot some sample images from the data for each angle bin. I’m splitting all the angles into 21 bins of width 0.1.
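One way to form 21 bins of width 0.1 is to round each angle to the nearest multiple of 0.1, so the bin centres run from -1.0 to 1.0 and bin 10 holds the zeros. This is a sketch under that assumption (the exact binning used for the plots isn’t specified); `angle_bins` and `one_sample_per_bin` are hypothetical helpers, and the second one picks a random sample index from each non-empty bin, as you would for a per-bin image grid.

```python
import numpy as np

BIN_WIDTH = 0.1
NUM_BINS = 21  # bin centres -1.0, -0.9, ..., 0.9, 1.0

def angle_bins(angles):
    """Map each angle to a bin index 0..20; bin 10 is centred on 0."""
    angles = np.asarray(angles, dtype=float)
    idx = np.rint(angles / BIN_WIDTH).astype(int) + NUM_BINS // 2
    return np.clip(idx, 0, NUM_BINS - 1)  # out-of-range angles go to the end bins

def one_sample_per_bin(angles, rng):
    """Pick one random sample index from every non-empty bin."""
    bins = angle_bins(angles)
    picks = {}
    for b in np.unique(bins):
        members = np.flatnonzero(bins == b)
        picks[int(b)] = int(rng.choice(members))
    return picks

angles = [0.0, 0.0, 0.02, 0.3, -0.7, 1.0]
print(np.bincount(angle_bins(angles), minlength=NUM_BINS))  # counts per bin
print(one_sample_per_bin(angles, np.random.default_rng(0)))
```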
We can see that some of the images don’t really make much sense. For example, the -0.8 and -0.7 images taken at the bridge look like the car is well centered, and yet they have a very large angle. So the dataset has some errors, and we need to think of a way to offset them.
As with the other ML projects, and following some excellent discussions with other students in the Slack channel, I came to the conclusion that data augmentation was the way to clean up the data. The following are the augmentations I apply to the dataset. All of them need to be as random as possible so that the model sees a good distribution of training inputs.
These techniques will be extremely helpful, and very much applicable, in real-life driving as well.
Disclaimer: Most of what’s described below has been adapted from Vivek’s excellent post.
Randomly choosing between left, center or right image
One predominant problem in steering prediction is that even if the model predicts really, really well, it is bound to make a non-zero error in its predictions. This means that over time the car will drift off center. Since we are only training the model to drive correctly and not how to recover (something we humans know instinctively), the model is handicapped.
To offset this problem, we randomly choose between the left, center and right images. When choosing the left / right image, we add / subtract a static offset to / from the angle. For a smoother drive, we could instead add / subtract an offset weighted by the magnitude of the angle, but this biases the model towards zeros again, so it is a design tradeoff. I chose the simple static offsets. This simulates the car drifting off center and trains the model to recover from its mistakes. It also gives us a lot more data to work with than the center images alone.
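# Pick the left (0), center (1) or right (2) camera image at random
img_choice = np.random.randint(3)
if img_choice == 0:
    # Left camera: correct by steering further to the right
    img_path = os.path.join(PATH, df.left.iloc[idx].strip())
    angle += OFF_CENTER_IMG
elif img_choice == 1:
    img_path = os.path.join(PATH, df.center.iloc[idx].strip())
else:
    # Right camera: correct by steering further to the left
    img_path = os.path.join(PATH, df.right.iloc[idx].strip())
    angle -= OFF_CENTER_IMG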
Flipping the image
One easy way to get more data is to mirror the image left to right (flipping it about the vertical axis) and flip the sign of the angle as well. We instantly get twice the data, and we don’t inadvertently bias the model towards either direction.
# Mirror the image left-right half of the time and negate the angle
if np.random.randint(2) == 0:
    img = np.fliplr(img)
    new_angle = -new_angle
Changing brightness
Changing the brightness makes the model robust to different lighting conditions, so we randomly change the brightness of each image.
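# Convert to HSV (input assumed BGR, as returned by cv2.imread) and use float math
temp = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
# Compute a random brightness value and apply it to the V (value) channel
brightness = BRIGHTNESS_RANGE + np.random.uniform()
# Clip so brightened pixels saturate at 255 instead of overflowing uint8
temp[:, :, 2] = np.clip(temp[:, :, 2] * brightness, 0, 255)
# Convert back to uint8, then to RGB, and return
return cv2.cvtColor(temp.astype(np.uint8), cv2.COLOR_HSV2RGB)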
Changing X and Y translation
Here we shift the image in the X and Y directions to generate fake images. While it’s great that the left and right camera images let us train recovery, that is still very little data. Shifting along the X axis and changing the angle correspondingly lets us train even better recovery.
# Compute a random X translation in [-TRANS_X_RANGE / 2, TRANS_X_RANGE / 2)
x_translation = (TRANS_X_RANGE * np.random.uniform()) - (TRANS_X_RANGE / 2)
# Adjust the steering angle in proportion to the X translation
new_angle = angle + ((x_translation / TRANS_X_RANGE) * 2) * TRANS_ANGLE
# Randomly compute a Y translation
y_translation = (TRANS_Y_RANGE * np.random.uniform()) - (TRANS_Y_RANGE / 2)
# Form the 2x3 affine translation matrix
translation_matrix = np.float32([[1, 0, x_translation], [0, 1, y_translation]])
# Translate the image; warpAffine takes dsize as (width, height)
return cv2.warpAffine(img, translation_matrix, (img.shape[1], img.shape[0]))
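The angle correction for the X shift is plain arithmetic: since `x_translation` is drawn from [-TRANS_X_RANGE/2, TRANS_X_RANGE/2), the added term `(x_translation / TRANS_X_RANGE) * 2 * TRANS_ANGLE` spans [-TRANS_ANGLE, TRANS_ANGLE), i.e. `2 * TRANS_ANGLE / TRANS_X_RANGE` per pixel of shift. The values of these constants aren’t given here, so the ones in the example (100 px, 0.3) are hypothetical.

```python
def translation_angle_shift(x_translation, trans_x_range, trans_angle):
    """Steering correction for a horizontal shift of x_translation pixels.

    Same formula as the translation snippet: the result lies in
    [-trans_angle, trans_angle) when x_translation is in
    [-trans_x_range / 2, trans_x_range / 2).
    """
    return (x_translation / trans_x_range) * 2 * trans_angle

# Hypothetical constants: TRANS_X_RANGE = 100 px, TRANS_ANGLE = 0.3
print(translation_angle_shift(50, 100, 0.3))   # 0.3
print(translation_angle_shift(-25, 100, 0.3))  # -0.15
```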
Biasing towards non-zero values
From the original dataset, we know that we need to keep the model from being biased towards predicting angles close to zero. To do this, we pass a bias term as an input to the data_generator and tweak it at the end of each epoch to decrease the probability that a small angle will be selected for training. It goes like this:
# Choose left / right / center image and compute new angle
# Do translation and modify the angle again
# Define a random threshold for each image taken
threshold = np.random.uniform()
# If the newly augmented angle + the bias falls below the threshold,
# then discard this angle / img combination and look again
if (abs(angle) + bias) < threshold:
    return None, None
As we decrease the bias from 1.0 to 0.0, low angles are dropped with increasing probability. Note that we do not set a hard threshold for low angles: since the threshold is uniform on [0, 1), a sample is dropped with probability max(0, 1 - |angle| - bias), which falls off linearly as the angle grows.
We also pre-process the data by removing the top 60 pixels (above the horizon) and the bottom 20 pixels (the hood of the car), and we resize the result to 64x64 (a square image makes the CNN’s job much easier).
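Because `threshold` is drawn uniformly from [0, 1), the rejection rule has a closed form: a sample survives with probability min(1, |angle| + bias). The sketch below (numpy only, hypothetical helper names) states that formula and checks it against a Monte Carlo run of the same rule.

```python
import numpy as np

def keep_probability(angle, bias):
    """Closed-form probability that a sample survives the bias filter.

    The filter drops a sample when |angle| + bias < threshold, with
    threshold ~ U[0, 1), so it keeps it with probability min(1, |angle| + bias).
    """
    return min(1.0, abs(angle) + bias)

def empirical_keep_rate(angle, bias, n=100_000, seed=0):
    """Monte Carlo estimate of the same rule, applied n times."""
    rng = np.random.default_rng(seed)
    thresholds = rng.uniform(size=n)
    kept = ~((abs(angle) + bias) < thresholds)
    return float(np.mean(kept))

for bias in (1.0, 0.5, 0.0):
    print(bias, keep_probability(0.1, bias), empirical_keep_rate(0.1, bias))
```

With bias 1.0 every sample is kept; with bias 0.0 a sample with angle 0.1 survives only about 10% of the time.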
# Remove the unwanted top scene (above the horizon) and the car hood; keep only the track
roi = img[60:140, :, :]
# Resize the image; note that cv2.resize takes dsize as (width, height)
resize = cv2.resize(roi, (IMG_COLS, IMG_ROWS), interpolation=cv2.INTER_AREA)
# Return the image as a 4D array (a batch of one)
return resize.reshape(1, IMG_ROWS, IMG_COLS, IMG_CH)
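The shape bookkeeping of this step can be checked without OpenCV. The sketch below is a numpy-only stand-in that swaps in nearest-neighbour sampling for `cv2.resize` with `INTER_AREA`, so pixel values will differ from the real pipeline; it assumes the simulator’s 160x320x3 frames, and `crop_and_resize_nn` is a hypothetical name.

```python
import numpy as np

IMG_ROWS, IMG_COLS = 64, 64  # target size used in this project

def crop_and_resize_nn(img, rows=IMG_ROWS, cols=IMG_COLS):
    """Crop rows 60..139 and resize with nearest-neighbour sampling.

    numpy-only stand-in for cv2.resize(..., interpolation=cv2.INTER_AREA).
    """
    roi = img[60:140, :, :]  # 160 - 60 - 20 = 80 rows remain
    r_idx = (np.arange(rows) * roi.shape[0]) // rows
    c_idx = (np.arange(cols) * roi.shape[1]) // cols
    small = roi[r_idx[:, None], c_idx[None, :], :]
    return small.reshape(1, rows, cols, roi.shape[2])

frame = np.zeros((160, 320, 3), dtype=np.uint8)  # simulator frame: 160 x 320 RGB
print(crop_and_resize_nn(frame).shape)  # (1, 64, 64, 3)
```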
Visualizing data after the augmentation
So let’s see what impact we’ve made on the data.
You can see that when we allow a bias of 1.0, the data is nicely Gaussian-shaped, but as we reduce the bias to 0.0, the 0-angle data is severely penalized, allowing our model to incrementally learn a more complex hypothesis. Compare this with the original histogram and we can see just how far we’ve already come.
We can also plot some of the images to see how they look after augmentation. We’ve got a variety of images, including shifted ones, spanning a much wider range of angles.
The model that I finally latched onto was a slightly simplified version of the VGG16 pre-trained model.
The resulting model has 17,118,541 parameters in total.
Things to note:
- I’ve added a lambda layer at the top, similar to the comma.ai model, to normalize the data on the fly (scaling pixel values from [0, 255] to [-1, 1])
model.add(Lambda(lambda x: x / 127.5 - 1.0,
                 input_shape=(IMG_ROWS, IMG_COLS, IMG_CH),
                 output_shape=(IMG_ROWS, IMG_COLS, IMG_CH)))
- I’ve added a color space conversion layer (credit to Vivek for the brilliant idea) as the first convolutional layer: three 1x1 filters, so the model can figure out the best color space for the hypothesis on its own
model.add(Convolution2D(3, 1, 1, border_mode='same', name='color_conv'))
- The remaining convolutional layers are directly taken from VGG16
- I’ve added 5 fully connected layers of decreasing size as the regression head, which outputs the steering angle
- I load the pre-trained VGG16 weights without the top (fully connected) layers.
- I ensure that the top 3 conv blocks from VGG16 are frozen and the other layers are trainable.
We take care of regularization in three ways:
- We augment the data heavily. This more than anything else allows the model to generalize.
- We add dropout to all the fully connected layers.
- We split the data into training and validation sets to control the number of training epochs. We hold out a small amount of data (one batch) for validation and use it to check whether we are severely overfitting.
Overall, my impression in this project has been that, given the noise in the data and the complexity of the hypothesis, the conventional approach of watching the validation loss is not very effective at curbing overfitting. Still, when testing on the simulator, I first try the weights from the 3 epochs with the lowest validation loss.
There is no need for a separate test set, since we can test directly by letting the car drive.
- Generator: training uses a generator that randomly samples the available images (including left and right) listed in the CSV file, randomly flips them, randomly changes their brightness, randomly shifts them in the X and Y directions (also adjusting the angle), pre-processes them and feeds them to the model. This is great because Keras lets the CPU generate data with the generator in parallel while the GPU is busy training the model.
- Optimizer: Adam with a learning rate of 1e-5, chosen empirically; a small rate suits transfer learning (fine-tuning an existing model). One thing I noticed was that a higher learning rate caused the model to regress to a single output value regardless of the input image.
- Bias: We start with a bias of 1.0 (allow all angles) and slowly reduce it as the epochs continue, thereby progressively dropping low angles.
bias = 1. / (num_runs + 1.)
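Putting the generator and the bias schedule together, a stripped-down sketch could look like this. It is hypothetical and numpy-only: indices stand in for images and the only augmentation kept is the random flip, but the rejection rule and `bias = 1. / (num_runs + 1.)` follow the description above. Note that with a bias of 0 and only zero angles in the data, the inner loop would never finish.

```python
import numpy as np

def bias_schedule(num_runs):
    """Bias used for epoch num_runs (0-based): 1.0, 0.5, 0.333..., ..."""
    return 1.0 / (num_runs + 1.0)

def batch_generator(angles, batch_size, bias, rng):
    """Yield (indices, angles) batches, dropping low angles via the bias rule.

    Hypothetical sketch: a real generator would load, augment and
    pre-process the image at each index; here only the flip is kept.
    """
    angles = np.asarray(angles, dtype=float)
    while True:
        idx_batch, ang_batch = [], []
        while len(idx_batch) < batch_size:
            i = int(rng.integers(len(angles)))  # random row from the CSV
            angle = angles[i]
            if rng.integers(2) == 0:            # random flip negates the angle
                angle = -angle
            if abs(angle) + bias < rng.uniform():
                continue                        # rejected: draw another sample
            idx_batch.append(i)
            ang_batch.append(angle)
        yield np.array(idx_batch), np.array(ang_batch)

rng = np.random.default_rng(0)
angles = [0.0, 0.0, 0.0, 0.05, 0.4, -0.6]
for num_runs in range(3):
    bias = bias_schedule(num_runs)
    _, batch = next(batch_generator(angles, 8, bias, rng))
    print(f"epoch {num_runs}: bias={bias:.2f}, mean |angle|={np.abs(batch).mean():.3f}")
```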
I trained for about 10 epochs, saving the weights after each epoch, and then test the saved models starting with the ones with the lowest validation loss. My best solution is generally among the top 3 models by validation loss. Each epoch takes about a minute to train on my GTX 1060. But arriving at this point took almost 3 weeks of hypotheses, trials and discussions.
While driving, I observed that the model (because of my intent to smooth the drive) does not take sharp curves well at good speeds. I had two choices: slow down during curves, or multiply the predicted angle by a constant. Both approaches are used here to obtain a clean drive through Track 2.
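The drive-time adjustments can be sketched as a small post-processing function applied to each prediction. The gain of 1.4 and base throttle of 0.3 are the Track 2 settings used in this project; the curve threshold and reduced throttle are hypothetical, since the exact slow-down rule isn’t spelled out. Clipping to [-1, 1] assumes the simulator’s normalized steering range.

```python
def drive_command(predicted_angle, gain=1.4, base_throttle=0.3,
                  curve_angle=0.3, curve_throttle=0.1):
    """Return (steering, throttle) for one frame.

    gain and base_throttle are the Track 2 values from the post;
    curve_angle and curve_throttle are hypothetical.
    """
    # Amplify the prediction so the car turns harder, then clip to [-1, 1]
    steering = max(-1.0, min(1.0, predicted_angle * gain))
    # Ease off the throttle when the scaled steering indicates a sharp curve
    throttle = curve_throttle if abs(steering) > curve_angle else base_throttle
    return steering, throttle

print(drive_command(0.1))   # gentle curve: base throttle
print(drive_command(-0.9))  # sharp curve: steering clipped, throttle reduced
```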
The model worked directly on track 1 with no changes.
Note that towards the end, where the shadows of two electrical wires cross the road, the car turns oddly. This suggests that the model is very afraid of shadows.
In light of the Track 1 results, it was impossible to run the model on Track 2 with shadows enabled. So, after decreasing the graphics quality to ‘fastest’, increasing the throttle to 0.3 and multiplying the predicted steering angle by 1.4, we get:
Considering the amount of turns, and steep inclines, the model does spectacularly well.
This project has been a phenomenal journey into the development methodology required for a successful deep learning project. The dependency on data collection, data augmentation, model parameters and various other factors forces the developer to follow a very streamlined development process, without which the work will easily fall apart. And it instills wonder and awe towards those working in this field. The complexities of transferring this learning (pun very much intended) to driving a car on the road are inspiring… Can’t wait for that day to come.
Future work
- Incorporate augmented shadows in training to make the model more robust (we need to make the model not afraid of the shadows under the bed)
- Look at the models used in Challenge-2, especially the optical-flow and LSTM-based ones (I’ve always loved the idea of RNNs)
- Incorporate advanced lane finding (P4) and object detection (P5) into this project for a complete steering solution