Behavioral Cloning — Self-Driving Car Simulation

Jonathan Mitchell
Feb 25, 2017


In this project we attempt to predict steering angles for a self-driving car in a Unity-based simulator. To acquire data, we run the simulator and steer the car manually (keyboard + mouse or controller). I used two different approaches. One data-set consisted of driving the car for four laps while trying to stay in the middle of the road the entire time. The other set is known as “recovery data”: the car started at the edge of the road and images were captured as it “recovered”, or steered itself back to the middle of the road. The goal of this second set is to add error-correction data, so that if our car drifts toward the side of the road it will know what to do. The straight-driving set has 8k images and the recovery set has 1.3k images.

Car track in simulation

Data Acquisition:

Our data is saved in this format:

data
├── driving_log.csv
└── IMG
    ├── center_2017_01_06_13_54_47_000.jpg
    ├── center_2017_01_06_13_54_47_110.jpg
    ├── left_2017_02_14_21_46_00_967.jpg
    ├── left_2017_02_14_21_46_01_041.jpg
    ├── right_2016_12_01_13_42_06_070.jpg
    └── right_2016_12_01_13_42_06_172.jpg

From one lap of video capture there are thousands of these images. They are recorded from three different cameras on the car: center, left, and right.

In order to make life easier, I used pandas DataFrames to sort and organize the image paths. I could not read every image file and store them all in my program because I ran out of memory, so I worked with image paths instead (more on this later in the generator section).

import pandas as pd

def createDataFrame(data_path):
    """
    input: data_path: path to the driving_log.csv file
    return: pandas DataFrame
    """
    data_frame = pd.read_csv(data_path)
    data_frame.columns = ['center', 'left', 'right', 'steering', 'throttle', 'brake', 'speed']
    return data_frame
Pandas DataFrame for a data-set (last 10 rows). Note: Columns throttle, brake, and speed are cut off

The track that I used to gather training data runs counter-clockwise, so left turns are more abundant than right turns. In order to generalize to any given track I need to create additional data.

Most of the acquired data has 0-degree steering angles because we are driving straight for most of the track. Starting out, our histogram of steering angles looks like this:

Histogram of steering angles before data creation / augmentation
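For reference, a histogram like this can be generated straight from the DataFrame; the CSV path below is just a placeholder for wherever driving_log.csv lives:

import matplotlib.pyplot as plt

# placeholder path to the recorded driving log
data_frame = createDataFrame('./data/driving_log.csv')

plt.hist(data_frame['steering'], bins=50)
plt.xlabel('steering angle')
plt.ylabel('number of images')
plt.show()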

Issue 1: As you can see, there are about 25,000 images with zero-degree angles. The straight-driving data is roughly 5 times more abundant than left-turn or right-turn data, which means our model is likely to assume we are driving straight most of the time.

Issue 2: I only had about 8k images from the straight-driving set and 1.3k images from the recovery set. In total that is 9.3k images, which is not really enough to train my network.

Solution part 1: I separated the data-set into three categories: center_turns (roughly straight driving, steering angle within ±0.125), left_turns, and right_turns.

import os

def createTrainingDataPathsCLR(df, prefix_path):
    """
    Creates training data paths and labels/measurements from a data frame.
    inputs:
        df: pandas DataFrame object
        prefix_path: path to the dataset
    output: (center_turns, left_turns, right_turns) list tuple
    """
    # Turn types
    center_turns = []
    left_turns = []
    right_turns = []

    abs_path_to_IMG = os.path.abspath(prefix_path)
    for idx, row in df.iterrows():
        center_image_cam = os.path.join(abs_path_to_IMG, row['center'].strip())
        left_image_cam = os.path.join(abs_path_to_IMG, row['left'].strip())
        right_image_cam = os.path.join(abs_path_to_IMG, row['right'].strip())
        steering_angle = row['steering']

        # Right turn
        if steering_angle > 0.125:
            right_turns.append([center_image_cam, left_image_cam, right_image_cam, steering_angle])

        # Left turn
        elif steering_angle < -0.125:
            left_turns.append([center_image_cam, left_image_cam, right_image_cam, steering_angle])

        # Straight / center
        else:
            center_turns.append([center_image_cam, left_image_cam, right_image_cam, steering_angle])

    return (center_turns, left_turns, right_turns)
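
Calling this on the DataFrame splits the rows into the three buckets; the prefix path below is a placeholder that should match the data layout shown earlier:

# placeholder prefix path pointing at the recorded data directory
center_turns, left_turns, right_turns = createTrainingDataPathsCLR(data_frame, './data')
print(len(center_turns), len(left_turns), len(right_turns))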

Solution part 2: After separating the data I created additional data for all three categories, but not equally. Straight-driving data was so much more abundant than left- or right-turn data that I gave an extra-large boost to the left-turn data and a large boost to the right-turn data. The goal is a more balanced, roughly uniform histogram.

def makeMore(dataArray, amount):
    """
    Creates additional data for center, left, and right camera angles
    given a dataArray of a specific turn type (center_turns, left_turns, or right_turns).
    input:
        dataArray: array of a specific turn type
        amount: number of duplicates to append per original entry
    output: the same dataArray, grown to len(dataArray) * (amount + 1) entries
    """
    for i in range(len(dataArray)):  # range() is evaluated once, so the appended rows are not revisited
        for j in range(amount):
            dataArray.append([dataArray[i][0], dataArray[i][1], dataArray[i][2], dataArray[i][3]])
    return dataArray

center_turns = makeMore(center_turns, 5)
left_turns = makeMore(left_turns, 18)
right_turns = makeMore(right_turns, 12)

Now my steering angle histogram looks like this:

  • Center turns went from 6k to 30k for the straight-driving dataset
  • Left turns went from 444 to 8k for the straight-driving dataset
  • Right turns went from ~750 to ~9k for the straight-driving dataset

After doing this to both data-sets I created a histogram that looks like this:

As you can see, this looks much more balanced than when we started

Preprocessing

Our data comes in as 160 x 320 x 3 RGB images. I used several preprocessing techniques to augment, transform, and create more data to give my network a better chance at generalizing to different track features.

Brightness Augmentation: This track has good lighting, but I need to be able to drive under poor lighting conditions: shadows, darker lane markings, etc. To do this I created this function:

import cv2
import numpy as np

def change_brightness(image):
    """
    Augments the brightness of the image by multiplying the V (value) channel
    by a uniform random factor.
    input: image (RGB)
    output: image with brightness augmentation (RGB)
    """
    bright_factor = 0.2 + np.random.uniform()

    hsv_image = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
    # scale only the V channel, clipping so values above 255 don't wrap around in uint8
    hsv_image[:, :, 2] = np.clip(hsv_image[:, :, 2] * bright_factor, 0, 255)

    # change back to RGB
    image_rgb = cv2.cvtColor(hsv_image, cv2.COLOR_HSV2RGB)
    return image_rgb

Flipping: Our data was acquired on a counter-clockwise track. To help the model generalize to driving on a clockwise track, I gave each image a 50% chance of being flipped horizontally, inverting the steering angle along with it.

def flip_image(image, ang):
    # mirror the image left-to-right and negate the steering angle
    image = np.fliplr(image)
    ang = -1 * ang
    return (image, ang)

Cropping: We only care about the section of the road that has lane lines, so I cropped out the sky and the bottom of the frame. Given a 160 x 320 x 3 image I kept roughly rows 56 through 140 (35% to 87.5% of the height). Something like this will do:

# slice indices must be integers, so cast the crop boundaries
image = image[int(image.shape[0] * 0.35):int(image.shape[0] * 0.875), :, :]

Resizing: I used Nvidia’s End to End Learning network architecture, which takes 66 x 220 x 3 RGB images, so I resized my images to match.

# note: cv2.resize takes (width, height), so this produces a 66 x 220 x 3 image
img = cv2.resize(image, (220, 66), interpolation=cv2.INTER_AREA)

Full Pre-processing Pipeline:

From input image to normalization in CNN
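
The generator in the next section calls a helper named preprocess_image_from_path, which isn’t shown in full in this post. Here is a minimal sketch of how the steps above could be chained together; the exact ordering, the helper names other than preprocess_image_from_path, and the BGR-to-RGB conversion are assumptions:

import cv2
import numpy as np

def preprocess_image(image):
    # crop out the sky and the bottom of the frame, keeping the road section
    image = image[int(image.shape[0] * 0.35):int(image.shape[0] * 0.875), :, :]
    # resize to the network's 66 x 220 x 3 input
    image = cv2.resize(image, (220, 66), interpolation=cv2.INTER_AREA)
    return image

def preprocess_image_from_path(image_path, steering_angle):
    # OpenCV loads images as BGR, so convert to RGB first (assumption)
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    image = change_brightness(image)        # random brightness augmentation
    image = preprocess_image(image)         # crop + resize
    # 50% chance to flip horizontally and negate the steering angle
    if np.random.randint(2) == 0:
        image, steering_angle = flip_image(image, steering_angle)
    return image, steering_angle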

Generators:

So in this project we have over 80k images to work with. I simply cannot store all of that in memory, so I used a generator to yield batches of images for my training pipeline. It looks like this:

from sklearn.utils import shuffle

def generate_training_data(data, batch_size=32):
    """
    Loops through the data, sends individual rows of the DataFrame to
    preprocess_image_from_path (which in turn calls the preprocessing steps),
    and yields batches of (image_batch, label_batch).
    inputs:
        data: pandas DataFrame
        batch_size: number of rows sampled per batch (the yielded batch is 2x this size)
    """
    image_batch = np.zeros((batch_size * 2, 66, 220, 3))  # nvidia input params
    label_batch = np.zeros((batch_size * 2))
    while True:
        for i in range(batch_size):
            idx = np.random.randint(len(data))
            row = data.iloc[[idx]].reset_index()
            x, y = preprocess_image_from_path(row['center'].values[0], row['steering'].values[0])

            # preprocess another center image
            x2, y2 = preprocess_image_from_path(row['center'].values[0], row['steering'].values[0])

            if np.random.randint(3) == 1:
                # 33% chance to overwrite the second center image with the left image + correction factor
                x2, y2 = preprocess_image_from_path(row['left'].values[0], row['steering'].values[0] + 0.125)

            if np.random.randint(3) == 2:
                # 33% chance to overwrite the second center image with the right image - correction factor
                x2, y2 = preprocess_image_from_path(row['right'].values[0], row['steering'].values[0] - 0.125)

            # write each pair to its own pair of slots so the batch really is 2 * batch_size images
            image_batch[2 * i] = x
            label_batch[2 * i] = y

            image_batch[2 * i + 1] = x2
            label_batch[2 * i + 1] = y2

        yield shuffle(image_batch, label_batch)

In this generator I created batches of images. I chose to use batches so I would have control over what happens in each batch. In this case I doubled the given batch size and allotted a 33% chance to swap in the left or right camera image alongside each center image. In the simulation we take in the center image and predict the steering angle, so I added a correction factor of 0.125 to the steering angles of left camera images and subtracted that correction factor from right camera images to keep them consistent with the center camera. This effectively creates 2x more training data on the fly.
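
A quick sanity check before training is to pull a single batch from the generator and inspect its shapes (train_data here stands for the training DataFrame built in the Training section below):

gen = generate_training_data(train_data, batch_size=16)
image_batch, label_batch = next(gen)
print(image_batch.shape, label_batch.shape)  # expect (32, 66, 220, 3) and (32,)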

I also created a validation generator that simply yields one image at a time. These images only go through the cropping and resizing pre-processing steps, the same as the test data in the simulation.
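
The validation generator itself isn’t shown in the post; here is a minimal sketch of what it could look like, reusing the crop/resize helper sketched earlier and yielding one center image at a time (the function name and the exact yield shapes are assumptions):

def generate_validation_data(data):
    # loop over the validation DataFrame forever, one center image per yield
    while True:
        for idx in range(len(data)):
            row = data.iloc[[idx]].reset_index()
            image = cv2.cvtColor(cv2.imread(row['center'].values[0]), cv2.COLOR_BGR2RGB)
            image = preprocess_image(image)                  # crop + resize only
            angle = row['steering'].values[0]
            yield image[np.newaxis, ...], np.array([angle])  # shapes (1, 66, 220, 3) and (1,)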

Network Architecture: I chose to use Nvidia’s End to End Learning convolutional neural network for this project.

I chose to use ELUs (Exponential Linear Units) instead of ReLUs. ELUs avoid the vanishing gradient problem (as do ReLUs and leaky ReLUs), and they also speed up learning by avoiding the bias shift that ReLUs are prone to. ELUs are effective for very deep models with more than four layers. These activation functions performed well on the ImageNet challenge in fewer epochs than a ReLU-based network of the same architecture (see the ELU paper for more info).

I implemented a Dropout layer to prevent over-fitting. I only used one Dropout layer; in the future I would add more dropout layers between the first three convolutions with keep probabilities ranging from 0.9 to 0.7, and then a 0.5 probability after the fourth or fifth convolution layer.

I normalized the RGB pixel input values to the range [-1, 1]. I am sure [0, 1] would have worked as well.
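
For reference, here is a Keras 1.x sketch of what nvidia_model() could look like: the layer sizes follow Nvidia’s paper, a Lambda layer does the [-1, 1] normalization, and a single Dropout layer sits after the convolutions. The Adam optimizer, MSE loss, and exact Dropout placement are assumptions, not necessarily what the final trained model used:

from keras.models import Sequential
from keras.layers import Lambda, Flatten, Dense, Dropout
from keras.layers.convolutional import Convolution2D
from keras.layers.advanced_activations import ELU

def nvidia_model():
    model = Sequential()
    # normalize RGB pixel values to [-1, 1] inside the network
    model.add(Lambda(lambda x: x / 127.5 - 1.0, input_shape=(66, 220, 3)))
    # five convolutional layers, as in Nvidia's End to End paper
    model.add(Convolution2D(24, 5, 5, subsample=(2, 2)))
    model.add(ELU())
    model.add(Convolution2D(36, 5, 5, subsample=(2, 2)))
    model.add(ELU())
    model.add(Convolution2D(48, 5, 5, subsample=(2, 2)))
    model.add(ELU())
    model.add(Convolution2D(64, 3, 3))
    model.add(ELU())
    model.add(Convolution2D(64, 3, 3))
    model.add(ELU())
    model.add(Flatten())
    model.add(Dropout(0.5))  # single Dropout layer (placement is an assumption)
    # fully connected layers down to a single steering-angle output
    model.add(Dense(100))
    model.add(ELU())
    model.add(Dense(50))
    model.add(ELU())
    model.add(Dense(10))
    model.add(ELU())
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')  # optimizer/loss are assumptions
    return model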

Training:

# BATCH, train_data, valid_generator, and val_size are defined earlier in the training script
model = nvidia_model()
train_size = len(train_data.index)

for i in range(3):
    train_generator = generate_training_data(train_data, BATCH)
    history = model.fit_generator(
        train_generator,
        samples_per_epoch=20480,  # try putting the whole thing in here in the future
        nb_epoch=6,
        validation_data=valid_generator,
        nb_val_samples=val_size)
    print(history)

model.save_weights('model-weights-F1.h5')
model.save('model-F1.h5')
  • Samples per epoch: 20480. I am using about 80k images (I created about 4/5 of that), which is a ton of data, so I don’t sample every single image on each epoch; I only sample about 1/4 of the 80k => 20k images. However, for each of those 20k samples I am creating batches of 32 images, so I am actually training my model on 20480 * 32 = 655k images each epoch. This works because I randomly grab images from my dataset to fill the batches, with replacement. The total number of images processed is 20480 * 32 * 18 = 11.8M, which is 11.8M * 66 * 220 * 3 = 513B pixels.
  • Number of epochs: 18. I trained for 18 epochs, in 3 cycles of nb_epoch = 6. I did three cycles because I want to train long enough in each cycle and then be able to evaluate that model; if it is good I want to save it. Then I train another 2 models and choose the one with the lowest validation loss. When I tried setting the range to 4 I got a memory error, so 3 was the highest I could go.
  • Batch size: 16. I then multiply that batch size by 2 in my generator, so it becomes 32. I tried using a larger batch size but I ran out of memory very fast because I was trying to store all of those preprocessed images in memory inside the generator function. Lowering the effective batch size to 32 seemed to work well and it sped up my training as well.
  • nb_val_samples: total length of the validation data. My validation generator simply yields one image at a time, so I use all of the validation data as my validation sample size. This causes the validation data to load into the generator one by one, in a sequential fashion; the images would load in recording order if I had not shuffled the data before splitting it into training and validation sets (see the sketch below).
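
A shuffle-then-split along these lines works; how the three turn lists are recombined into one DataFrame and the 90/10 split ratio are assumptions:

from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

# recombine the duplicated turn lists into a single DataFrame (assumed layout)
full_data = pd.DataFrame(center_turns + left_turns + right_turns,
                         columns=['center', 'left', 'right', 'steering'])

# shuffle before splitting so validation rows are not in recording order
full_data = shuffle(full_data)
train_data, valid_data = train_test_split(full_data, test_size=0.1)

val_size = len(valid_data.index)
valid_generator = generate_validation_data(valid_data)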
As you can see, cutting the epochs to 18 seemed to work pretty well

References:

Bojarski et al., “End to End Learning for Self-Driving Cars,” arXiv:1604.07316, 2016.
Clevert, Unterthiner, and Hochreiter, “Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs),” arXiv:1511.07289, 2015.
