Cloning Driving Behavior with Keras and a Videogame-Like Simulator

One of the earlier projects of Udacity’s Self-Driving Car Engineer program was to build and train a convolutional neural network for end-to-end driving in a simulator. The trained network outputs a steering angle for an autonomous vehicle given a camera image as an input. This trained network was to be used and tested in a videogame-like driving simulator program.

1st Track



A trained network is only as good as the data we feed into it.

Udacity provided the simulator software that can be used to collect driving behavior data. The same software is also used to test how well the model is able to drive the car. Each data point consists of three images captured by three cameras mounted at different locations on the car [left, center, right], along with the steering measurement recorded when the images were taken.

Although I played with the simulator, I decided to use the data provided by Udacity instead, as I didn’t feel confident about my driving skills. This data set consists of 8,036 data points. I used 85% of the samples for training and the remaining 15% for validation. A histogram of the steering measurements shows that about 4,000 of these samples fall within the [-0.05, 0.05] range, so the data is biased towards driving straight. The data is also biased towards left turns because of the track used to record it.
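The 85/15 split can be sketched roughly as follows. This is an illustration rather than the project's actual code: the function name `train_validation_split`, the fixed seed, and the use of plain integers as stand-ins for samples are all assumptions.

```python
import random


def train_validation_split(samples, validation_fraction=0.15, seed=0):
    """Shuffle the samples and split them into (train, validation) lists.

    The function name and the seed are illustrative assumptions;
    only the 15% validation share comes from the text.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, leave the input untouched
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# 8,036 placeholder samples standing in for the rows of the driving log.
train, validation = train_validation_split(range(8036))
print(len(train), len(validation))
```

Shuffling before splitting matters here because consecutive rows of a driving log come from the same stretch of road, so an unshuffled split would validate on a section of track the network never saw.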

To combat these biases I have done the following:

  1. For each data point, I randomly chose one of the three camera positions [left, center, right] and applied a steering correction of 0.25 to the side cameras. The value of the steering correction was adjusted based on how the network behaved during training (to combat the lane-center steering bias)
  2. I also randomly flipped the image and changed the sign of the steering angle (to combat the left steering bias)
  3. Doing these two things increases the data set by a factor of six, as explained clearly by Navoshta
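The two augmentation steps above could look something like the sketch below. It is a simplified illustration, not the original code: images are stored as plain lists of pixel rows so that the example runs without any image library, the sample layout (a dict with `left`, `center`, `right` and `steering` keys) is hypothetical, and the sign convention for the correction (add for the left camera, subtract for the right) is an assumption.

```python
import random

STEERING_CORRECTION = 0.25  # value taken from the text

# Assumed sign convention: an image from the left camera should steer a bit
# more to the right (+), an image from the right camera a bit more to the left (-).
CAMERA_OFFSETS = {
    "left": STEERING_CORRECTION,
    "center": 0.0,
    "right": -STEERING_CORRECTION,
}


def flip_horizontal(image):
    """Mirror an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


def augment(sample, rng=random):
    """Pick a random camera, correct the steering, and flip half of the time.

    `sample` is a hypothetical dict with 'left', 'center', 'right' images
    and a 'steering' value.
    """
    camera = rng.choice(sorted(CAMERA_OFFSETS))
    image = sample[camera]
    steering = sample["steering"] + CAMERA_OFFSETS[camera]
    if rng.random() < 0.5:
        image = flip_horizontal(image)
        steering = -steering  # a mirrored road needs the opposite steering angle
    return image, steering


# Tiny 1x3 "images" so the effect of the flip is visible.
sample = {
    "left": [[1, 2, 3]],
    "center": [[4, 5, 6]],
    "right": [[7, 8, 9]],
    "steering": 0.1,
}
print(augment(sample, random.Random(42)))
```

Three camera choices times two flip states give six possible variants of every recorded data point, which is where the factor of six comes from.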


I used a batch_size of 64 and trained for 10 epochs. A larger batch size could make the server run out of memory. Training for more epochs did not seem to reduce the loss any further and would increase the risk of overfitting. A generator function was used to produce the samples for each batch fed to the network during training and validation.

I used a generator so that the whole data set does not have to be held in memory; only the batch being processed is loaded at any one time. Note that, for simplicity, my code skips the last batch when it is smaller than the chosen batch_size. This does not seem to be a problem.
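A minimal sketch of such a generator is shown below, under a few assumptions. The name `batch_generator` and its parameters are illustrative. The samples are treated as opaque items, whereas a real version would load the images and return arrays of inputs and steering angles. The generator loops forever, which is the usual pattern for generator-based training in Keras.

```python
import random


def batch_generator(samples, batch_size=64, shuffle=True, seed=None):
    """Yield full batches forever; a trailing partial batch is skipped.

    Illustrative sketch: each item in `samples` stands in for one
    (image, steering) pair that a real generator would load from disk.
    """
    samples = list(samples)
    rng = random.Random(seed)
    n_full_batches = len(samples) // batch_size  # integer division drops the remainder
    if n_full_batches == 0:
        raise ValueError("need at least batch_size samples")
    while True:  # one pass of the inner loop corresponds to one epoch
        if shuffle:
            rng.shuffle(samples)
        for i in range(n_full_batches):
            start = i * batch_size
            yield samples[start:start + batch_size]


# 200 placeholder samples give 3 full batches of 64; the last 8 are skipped.
gen = batch_generator(range(200), batch_size=64, shuffle=False)
first_epoch = [next(gen) for _ in range(3)]
print([len(batch) for batch in first_epoch])
```

With shuffling enabled, the samples left out of one epoch differ from those left out of the next, which is one reason dropping the partial batch is usually harmless.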


The table above summarizes the model architecture I employed, which is modeled after Comma.AI’s research. Before settling on this architecture, I tried the recommended NVIDIA model, but the AWS server I used (g2.2xlarge) ran into out-of-memory errors at run time.

As you can infer from the code above, the top 75 pixels and bottom 25 pixels of each image are cropped out, as these rows contain mostly distracting, less useful information: we are not interested in the sky or the hood of the car. I cropped within the model because the model runs in parallel on the GPU, which makes this step relatively fast. The data is normalized, as this is said to improve results experimentally. This is followed by three convolutional layers and two fully-connected layers. ELUs (Exponential Linear Units) are used as activations within the network; they are said to speed up learning and lead to higher accuracy. Subsampling was also used in each convolutional layer. The architecture also employs aggressive dropout probabilities (0.2 and 0.5) to reduce overfitting. The model used an Adam optimizer, so the learning rate was not tuned manually. The loss function used for optimization is the mean squared error.
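To make the cropping, normalization and loss concrete, here is a small library-free sketch of what those steps compute. It is not the model code: in the network itself the crop would be done by a layer such as Keras' `Cropping2D`. The 160-row image height and the exact normalization formula (scaling pixel values from [0, 255] to [-1, 1]) are assumptions, since the text only states that the data is normalized.

```python
TOP_CROP = 75     # rows removed from the top (sky), from the text
BOTTOM_CROP = 25  # rows removed from the bottom (hood of the car), from the text


def crop_rows(image):
    """Drop the top and bottom rows of an image given as a sequence of rows."""
    return image[TOP_CROP:len(image) - BOTTOM_CROP]


def normalize_pixel(value):
    """Assumed normalization: map a pixel value in [0, 255] to [-1, 1]."""
    return value / 127.5 - 1.0


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between target and predicted angles."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Assumed input size: 160 rows by 320 columns, all pixels black.
image = [[0] * 320 for _ in range(160)]
cropped = crop_rows(image)
print(len(cropped), len(cropped[0]))            # rows and columns left after the crop
print(normalize_pixel(0), normalize_pixel(255))
print(mean_squared_error([0.1, -0.2], [0.1, 0.0]))
```

Because the mean squared error squares each difference, a few large steering mistakes are penalized much more heavily than many small ones, which suits a task where a single bad turn can take the car off the road.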


For this project I have:

  • Used a simulator to collect data that could be used as samples of driving behavior
  • Built a network in Keras that predicts steering angles from images
  • Trained and validated the model on training and validation sets drawn from the data provided by Udacity
  • Tested that the model could successfully drive around one track without leaving the road

On the first track, no tire left the drivable portion of the track surface. The model did not generalize to the second track: at first it swerved aggressively from left to right, and eventually it crashed.

For further improvement, aside from collecting better data, we can apply more augmentation techniques to the existing data, such as adding shadows, randomly varying brightness and contrast, shearing, and shifting the image vertically and horizontally. Vertical shifting can simulate driving up and down slopes.

Here are links to nice discussions about data augmentation:


2nd Track