Behavioral Cloning — Self-Driving Car Simulation
In this project we attempt to predict steering angles for a self-driving car in a Unity-based simulator. To acquire data we run the simulator and steer the car manually (keyboard + mouse or controller). I used two different approaches. One data-set consisted of driving the car for four laps while trying to stay in the middle of the road the entire time. The other set is known as “recovery data”: the car started at the edge of the road and images were captured as it “recovered”, or steered itself, back to the middle of the road. The goal of this second set is to add error-correction data, so that if our car drifts off the side of the road it will know what to do. The straight-driving set has 8k images and the recovery set has 1.3k images.
Data Acquisition:
Our data is saved in this format:
data
├── driving_log.csv
└── IMG
    ├── center_2017_01_06_13_54_47_000.jpg
    ├── center_2017_01_06_13_54_47_110.jpg
    ├── left_2017_02_14_21_46_00_967.jpg
    ├── left_2017_02_14_21_46_01_041.jpg
    ├── right_2016_12_01_13_42_06_070.jpg
    └── right_2016_12_01_13_42_06_172.jpg
From one lap of video capture there are thousands of these images, recorded from three different cameras mounted on the car: a center, a left, and a right camera.
To make life easier, I used pandas DataFrames to sort and organize the image paths. I could not read every image file and hold them all in memory at once, so I worked with image paths instead (more on this later in the generator section).
import pandas as pd

def createDataFrame(data_path):
    """
    input: data_path: path to the driving_log.csv file
    return: pandas DataFrame with named columns
    """
    # driving_log.csv has no header row, so don't let read_csv eat the first sample
    data_frame = pd.read_csv(data_path, header=None)
    data_frame.columns = ['center', 'left', 'right', 'steering', 'throttle', 'brake', 'speed']
    return data_frame
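A quick usage example (the path assumes the directory layout shown above):

data_frame = createDataFrame('data/driving_log.csv')
print(data_frame[['center', 'steering']].head())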
The track I used to gather training data runs counter-clockwise, so left turns are far more abundant than right turns. In order to generalize to any given track I need to create additional data.
Most of the acquired data had 0-degree steering angles because for most of the track we are driving straight. Starting out, our histogram of steering angles looks like this:
Issue 1: As you can see, there are about 25,000 images with zero-degree angles. The straight-driving data is 5 times more abundant than left-turn or right-turn data, which means our model is likely to assume we are driving straight most of the time.
Issue 2: I only had about 8k samples from the straight-driving set and 1.3k samples from the recovery set. In total that is 9.3k samples (each with three camera images), which is not really enough to train my network.
Solution, part 1: I separated the data-set into three categories: center_turns (straight driving, near-zero angle), left_turns, and right_turns.
import os

def createTrainingDataPathsCLR(df, prefix_path):
    """
    Creates training data and training labels/measurements from a data frame.
    inputs:
        df: pandas DataFrame object
        prefix_path: path to the dataset
    output: (center_turns, left_turns, right_turns) list tuple
    """
    # Turn types
    center_turns = []
    left_turns = []
    right_turns = []
    abs_path_to_IMG = os.path.abspath(prefix_path)
    for idx, row in df.iterrows():
        center_image_cam = os.path.join(abs_path_to_IMG, row['center'].strip())
        left_image_cam = os.path.join(abs_path_to_IMG, row['left'].strip())
        right_image_cam = os.path.join(abs_path_to_IMG, row['right'].strip())
        steering_angle = row['steering']
        if steering_angle > 0.125:
            # right turn
            right_turns.append([center_image_cam, left_image_cam, right_image_cam, steering_angle])
        elif steering_angle < -0.125:
            # left turn
            left_turns.append([center_image_cam, left_image_cam, right_image_cam, steering_angle])
        else:
            # straight driving (near-zero angle)
            center_turns.append([center_image_cam, left_image_cam, right_image_cam, steering_angle])
    return (center_turns, left_turns, right_turns)
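Tying the two helpers together looks something like this (the 'data' prefix is from my layout):

center_turns, left_turns, right_turns = createTrainingDataPathsCLR(data_frame, 'data')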
Solution, part 2: After separating the data I created additional data for all three categories, but not equally. Straight-driving data was so much more abundant than left- or right-turning data that I gave an extra-large boost to the left-turning data and a large boost to the right-turning data. The goal is a balanced histogram.
def makeMore(dataArray, amount):
    """
    Duplicates entries for a specific turn type (center_turns, left_turns,
    or right_turns) so under-represented turn types can be boosted.
    input:
        dataArray: array of a specific turn type
        amount: number of extra copies to append per original entry
    output: dataArray grown to len(dataArray) * (amount + 1) entries
    """
    # range() is evaluated once, so appending while iterating is safe here
    for i in range(len(dataArray)):
        for j in range(amount):
            dataArray.append([dataArray[i][0], dataArray[i][1], dataArray[i][2], dataArray[i][3]])
    return dataArray

center_turns = makeMore(center_turns, 5)
left_turns = makeMore(left_turns, 18)
right_turns = makeMore(right_turns, 12)
Now my steering angle histogram looks like this:
After doing this to both data-sets, the combined histogram looks like this:
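If you want to reproduce these plots yourself, here is a minimal matplotlib sketch, assuming the three lists built above:

import matplotlib.pyplot as plt

# collect the steering angle (index 3) from every entry in all three lists
angles = [entry[3] for entry in center_turns + left_turns + right_turns]
plt.hist(angles, bins=50)
plt.xlabel('steering angle')
plt.ylabel('number of samples')
plt.show()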
Preprocessing
Our data comes in as 160 x 320 x 3 RGB images. I used several preprocessing techniques to augment, transform, and create more data to give my network a better chance at generalizing to different track features.
Brightness Augmentation: This track has good lighting, but I need to be able to drive under poor lighting conditions: shadows, lanes with darker lines, and so on. To do this I created this function:
import cv2
import numpy as np

def change_brightness(image):
    """
    Augments the brightness of the image by scaling the V (value) channel
    by a uniform random factor in [0.2, 1.2)
    input: image (RGB)
    output: image with brightness augmentation (RGB)
    """
    bright_factor = 0.2 + np.random.uniform()
    hsv_image = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
    # scale only the V (third) channel, clipping so values stay in [0, 255]
    hsv_image[:,:,2] = np.clip(hsv_image[:,:,2] * bright_factor, 0, 255)
    # change back to RGB
    image_rgb = cv2.cvtColor(hsv_image, cv2.COLOR_HSV2RGB)
    return image_rgb
Flipping: Our data was acquired on a counter-clockwise track. In order to train our model to drive on a clockwise track, I allowed a 50% chance to horizontally flip each image (np.fliplr mirrors it left-to-right) and invert the steering angle:

def flip_image(image, ang):
    # mirror the image left-to-right and negate the steering angle
    image = np.fliplr(image)
    ang = -1 * ang
    return (image, ang)
Cropping: We only care about the section of the road that has lane lines, so I cropped out the sky and the car's hood. Given a 160 x 320 x 3 pixel image I cropped the height from 56 px to 140 px. Something like this will do (note that slice indices must be integers):

image = image[int(image.shape[0] * 0.35) : int(image.shape[0] * 0.875), :, :]
Resizing: I used Nvidia's End to End Learning network architecture; my implementation takes 66 x 220 x 3 RGB images (the paper itself uses 66 x 200 x 3), so I resized my images to match:

img = cv2.resize(image, (220, 66), interpolation=cv2.INTER_AREA)  # cv2.resize takes (width, height)
Full Pre-processing Pipeline:
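The generator below calls preprocess_image_from_path, which isn't shown explicitly; here is a minimal sketch of how the pieces above could compose. The function names match those used later, but the exact order of operations is my reconstruction:

import cv2
import numpy as np

def preprocess_image(image):
    # crop out the sky (top) and the hood (bottom), then resize to the network input shape
    image = image[int(image.shape[0] * 0.35) : int(image.shape[0] * 0.875), :, :]
    image = cv2.resize(image, (220, 66), interpolation=cv2.INTER_AREA)
    return image

def preprocess_image_from_path(image_path, steering_angle):
    # cv2 loads BGR, so convert to RGB to match the rest of the pipeline
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    image = change_brightness(image)  # random brightness augmentation
    image = preprocess_image(image)
    # 50% chance to mirror the image and negate the steering angle
    if np.random.randint(2) == 0:
        image, steering_angle = flip_image(image, steering_angle)
    return image, steering_angle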
Generators:
So in this project we have over 80k images to work with after augmentation. I simply cannot store all of that in memory, so I used a generator to yield batches of images for my training pipeline. It looks like this:
from sklearn.utils import shuffle
import numpy as np

def generate_training_data(data, batch_size = 32):
    """
    Loops through the data, sending individual rows of the DataFrame
    to preprocess_image_from_path, which in turn calls preprocess_image.
    inputs:
        data: pandas DataFrame
        batch_size: base batch size (doubled internally)
    yields batches of (image_batch, label_batch)
    """
    image_batch = np.zeros((batch_size * 2, 66, 220, 3))  # nvidia input params
    label_batch = np.zeros((batch_size * 2))
    while True:
        for i in range(batch_size):
            idx = np.random.randint(len(data))
            row = data.iloc[[idx]].reset_index()
            x, y = preprocess_image_from_path(row['center'].values[0], row['steering'].values[0])
            # preprocess a second center image (the random augmentations differ)
            x2, y2 = preprocess_image_from_path(row['center'].values[0], row['steering'].values[0])
            choice = np.random.randint(3)
            if choice == 1:
                # 33% chance to overwrite the second image with the left image + correction factor
                x2, y2 = preprocess_image_from_path(row['left'].values[0], row['steering'].values[0] + 0.125)
            elif choice == 2:
                # 33% chance to overwrite the second image with the right image - correction factor
                x2, y2 = preprocess_image_from_path(row['right'].values[0], row['steering'].values[0] - 0.125)
            # each loop iteration fills two consecutive slots in the batch
            image_batch[2 * i] = x
            label_batch[2 * i] = y
            image_batch[2 * i + 1] = x2
            label_batch[2 * i + 1] = y2
        yield shuffle(image_batch, label_batch)
In this generator I create batches of images. I chose to use batches so I would have control over what happens within each batch. In this case I doubled the given batch size and allotted a 33% chance each to replace the second center image with the left or the right camera image. In our simulation we take in the center image and predict the steering angle, so I added a correction factor of 0.125 to the steering angles of left camera images and subtracted the same factor from right camera images, keeping them consistent with the center camera. This effectively creates 2x more training data on the fly.
I also created a validation generator that simply yields one image at a time. These images only go through the cropping and resizing pre-processing steps, the same as the test data in the simulation.
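The validation generator itself isn't shown in the post; here is a minimal sketch of what mine looked like, assuming the preprocess_image helper from the pipeline sketch above:

def generate_validation_data(data):
    while True:
        for idx in range(len(data)):
            row = data.iloc[[idx]].reset_index()
            # only crop and resize -- no augmentation, matching the simulator input
            image = cv2.cvtColor(cv2.imread(row['center'].values[0]), cv2.COLOR_BGR2RGB)
            image = preprocess_image(image)
            steering_angle = row['steering'].values[0]
            yield image.reshape(1, 66, 220, 3), np.array([steering_angle])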
Network Architecture: I chose Nvidia's End to End Learning convolutional neural network model for this project.
I chose to use ELUs (Exponential Linear Units) instead of ReLUs. ELUs avoid the vanishing gradient problem (as do ReLUs and leaky ReLUs), and they also speed up learning by avoiding a bias shift that ReLUs are predisposed to. ELUs are effective for very deep models with more than four layers. These activation functions performed well on the ImageNet challenge in fewer epochs than a ReLU-based network of the same architecture. See the ELU paper in the references for more.
I implemented a Dropout layer to prevent over-fitting. I only implemented one Dropout layer; in the future I would add more Dropout layers between the first three convolutions, with probabilities ranging from 0.9 to 0.7, and then include the 0.5 probability after the fourth or fifth convolution layer.
I normalized the RGB pixel input values to the range [-1, 1]. I am sure [0, 1] would have worked as well.
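The definition of nvidia_model() used below isn't reproduced in the post; here is a minimal sketch of what it could look like, written against the Keras 1.x API the training code uses (the 'elu' activation string needs Keras 1.2+). The layer sizes follow the Nvidia paper; the exact placement of the single Dropout layer and the normalization Lambda are my assumptions:

from keras.models import Sequential
from keras.layers import Lambda, Convolution2D, Flatten, Dense, Dropout

def nvidia_model():
    model = Sequential()
    # normalize RGB pixel values from [0, 255] to [-1, 1]
    model.add(Lambda(lambda x: x / 127.5 - 1.0, input_shape=(66, 220, 3)))
    # five convolutional layers, as in the Nvidia paper, with ELU activations
    model.add(Convolution2D(24, 5, 5, subsample=(2, 2), activation='elu'))
    model.add(Convolution2D(36, 5, 5, subsample=(2, 2), activation='elu'))
    model.add(Convolution2D(48, 5, 5, subsample=(2, 2), activation='elu'))
    model.add(Convolution2D(64, 3, 3, activation='elu'))
    model.add(Convolution2D(64, 3, 3, activation='elu'))
    model.add(Flatten())
    # single Dropout layer to fight over-fitting (placement here is an assumption)
    model.add(Dropout(0.5))
    # fully connected layers down to the single steering-angle output
    model.add(Dense(100, activation='elu'))
    model.add(Dense(50, activation='elu'))
    model.add(Dense(10, activation='elu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    return model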
Training:
model = nvidia_model()
# train_data / valid_data come from the shuffled train/validation split
train_size = len(train_data.index)
val_size = len(valid_data.index)
valid_generator = generate_validation_data(valid_data)
BATCH = 16

for i in range(3):
    train_generator = generate_training_data(train_data, BATCH)
    history = model.fit_generator(
        train_generator,
        samples_per_epoch = 20480,  # try putting the whole thing in here in the future
        nb_epoch = 6,
        validation_data = valid_generator,
        nb_val_samples = val_size)
    print(history)
    # save after each 6-epoch cycle so the best of the three runs can be kept
    model.save_weights('model-weights-F{}.h5'.format(i))
    model.save('model-F{}.h5'.format(i))
- Samples per epoch: 20480. I am working with about 80k images (I created about 4/5 of that through duplication and augmentation). That is a ton of data, so I don't sample every single image on each epoch; I only sample about a quarter of the 80k => 20k images. However, for each of those 20480 samples I am creating a batch of 32 images, so I am actually training my model on 20480 * 32 = 655k images each epoch. This works because I am randomly grabbing images from my dataset, with replacement, to fill the batches. The total number of images processed is 20480 * 32 * 18 = 11.8M, which is 11.8M * 66 * 220 * 3 = 513B pixels.
- Number of epochs: 18. I trained for 18 epochs, in 3 cycles of 6 (nb_epoch = 6). I used three cycles so I could train long enough sequentially, then evaluate the model after each cycle and save it if it was good. I then train another two models this way and choose the one with the lowest validation loss. When I tried setting the range to 4 I got a memory error, so 3 cycles was the highest I could go.
- Batch size: 16. I then multiply that batch size by 2 in my generator, so each batch actually contains 32 images. I tried using a larger batch size, but I ran out of memory very quickly because I was storing all of those preprocessed images in memory inside the generator function. Lowering the effective batch size to 32 worked well and sped up training too.
- nb_val_samples: the total length of the validation data. My validation generator yields one image at a time, so I simply use all the validation data as my validation sample size. The validation images are fed to the generator one by one in sequential order, which is fine because I shuffled the data before splitting it into training and validation sets.
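The shuffle-then-split step itself isn't shown in the post; a minimal sketch using scikit-learn (the names train_data and valid_data match the training code above, and the 80/20 ratio is my assumption):

from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

# shuffle rows before splitting so the sequential validation pass sees a random mix
data_frame = shuffle(data_frame)
train_data, valid_data = train_test_split(data_frame, test_size=0.2)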
References:
- ELUs: Clevert, Unterthiner, Hochreiter, "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)": https://arxiv.org/pdf/1511.07289v1.pdf
- Nvidia model: Bojarski et al., "End to End Learning for Self-Driving Cars": https://arxiv.org/pdf/1604.07316v1.pdf
- https://www.youtube.com/watch?v=rpxZ87YFg0M
- MIT 6.S094, Deep Learning for Self-Driving Cars: http://selfdrivingcars.mit.edu/
- Nvidia end-to-end deep learning PDF: http://images.nvidia.com/content/tegra/automotive/images/2016/solutions/pdf/end-to-end-dl-using-px.pdf
- Steering-angle visualizations: http://jacobgil.github.io/deeplearning/vehicle-steering-angle-visualizations
- Udacity, "Teaching a Machine to Steer a Car": http://medium.com/udacity/teaching-a-machine-to-steer-a-car-d73217f2492c
- "Using Augmentation to Mimic Human Driving": http://chatbotslife.com/using-augmentation-to-mimic-human-driving-496b569760a9
- Michael A. Nielsen, "Neural Networks and Deep Learning", Determination Press, 2015