How I Taught My Computer To Drive A Car Using Machine Learning

Feb 18, 2019

The concept of a self-driving car has fascinated me for as long as I can remember.

For the longest time, this idea seemed crazy. Only the smartest minds in academia and the most powerful automotive corporations had access to the needed information to pursue this idea.

Turns out, that’s not exactly true.

It wasn’t until recently that I realized that I could build one myself, with only my laptop’s mediocre CPU.

That kinda looks like my laptop. Never thought it could learn to drive.

Now, since I can’t even drive a car (I’m too young), let alone build one, I had to go for the next best option: a computer simulation.

—

I know, it’s not exactly comparable to a real car, but the model essentially works the same way (albeit with fewer parameters).

I built the project in Udacity’s self-driving car simulator, with a model based on NVIDIA’s End to End Learning for Self-Driving Cars paper.

Might not be a real car but it can still drive on its own (I wouldn’t trust this particular one on the road yet)

The car is driven by a Convolutional Neural Network (CNN), which learned to drive through a machine learning technique called behavioural cloning.

How does it learn?

The CNN drives by taking in a frame (an image of the car’s surroundings) as input, using computer vision to read it, and predicting a steering angle as output. That angle tells the car in the simulator which direction to go.

I’ll talk about it more — keep reading.

These are the steps I took to complete the project:

Generated my own driving behaviour data using Udacity’s self-driving car simulator
Wrote a Convolutional Neural Network (CNN) in Keras that predicts a steering angle for each image collected when generating data
Tested whether the model can successfully drive using Udacity’s simulator

What is a Convolutional Neural Net?

Before I get into the technical parts of this model, it helps to have a basic understanding of how a Convolutional Neural Network (CNN for short) works.

CNNs are a type of neural net that can identify features in an image, and they do it really well.

Unlike normal neural nets, CNNs use filters that go through a small part of an image at a time to find patterns within the pixels.

At the first convolutional layer, the filter looks for small differences or edges.

If you stack multiple convolutional layers, the CNN can start finding features built from edges, such as wheels or doors, which in turn make up a larger feature, such as a car.

A CNN that finds edges, then wheels, and eventually recognizes what different car models look like, classifying the given input as an Audi A7.
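To make the filter idea concrete, here is a minimal sketch in plain Python of one filter sliding over a tiny image. The 3x3 vertical-edge kernel is hand-written for illustration; in a real CNN the filter values are learned during training, and the function name `convolve2d` is just a label chosen for this example.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the responses.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it is a cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# A 4x6 grayscale patch: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Hand-written vertical-edge filter (illustrative values, not learned ones).
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = convolve2d(image, vertical_edge)
for row in response:
    print(row)
```

The response is zero over the flat regions and large where the window straddles the dark-to-bright boundary, which is what "looking for edges" means in practice.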

If you don’t understand how CNNs work, that’s fine. Just remember that they are a way for computers to recognize features within an image, much like a human does.

The Process Of Teaching The Model

Generating Data (lots of it)

Since this is a behavioural cloning project, the first step I took was to produce data of my driving behaviour that my model could actually clone.

I had to drive the car around the simulator for this part. Udacity’s simulator comes with a built-in data recorder, which conveniently saved everything in one folder: three images for each frame, plus a .csv file with the steering angle for the corresponding images.

The following data was recorded:

Three images per frame, taken from three different angles at the front of the car:

A sample of the left, centre and right angle images of a frame used to train the model.

Along with the images, I also collected data on the steering wheel angle.

In the end, about 10,000 rows of data were recorded for each individual input.
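As a rough sketch of what reading that recording could look like, the snippet below parses a couple of made-up log rows with Python’s `csv` module. The column order (centre, left, right, steering, throttle, brake, speed), the file paths and the helper name `load_driving_log` are assumptions for illustration; check the .csv your simulator actually writes.

```python
import csv
import io

# Made-up rows in an assumed column order:
# center image, left image, right image, steering, throttle, brake, speed
SAMPLE_LOG = (
    "IMG/center_001.jpg,IMG/left_001.jpg,IMG/right_001.jpg,0.0,0.9,0.0,30.1\n"
    "IMG/center_002.jpg,IMG/left_002.jpg,IMG/right_002.jpg,-0.15,0.9,0.0,30.2\n"
)


def load_driving_log(csv_file):
    """Return one record per frame: three image paths plus the steering angle."""
    records = []
    for row in csv.reader(csv_file):
        records.append({
            "center": row[0].strip(),
            "left": row[1].strip(),
            "right": row[2].strip(),
            "steering": float(row[3]),
        })
    return records


records = load_driving_log(io.StringIO(SAMPLE_LOG))
for record in records:
    print(record["center"], record["steering"])
```

With a real recording you would pass an open file instead of the `io.StringIO` sample, for example `open(path, newline="")` with `path` pointing at your own log.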

To make the model more accurate in turns, I also recorded recovery data, in which I took intentional sharp turns right before curves to spread the steering angle data over a wider range of angles.
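One simple way to check how the steering angles are spread is to count them in bins. This is only a sketch: the angles below are invented, and the bin width of 0.1 and the function name `steering_histogram` are arbitrary choices for the example.

```python
from collections import Counter


def steering_histogram(angles, bin_width=0.1):
    """Count how many samples fall into each steering-angle bin."""
    counts = Counter()
    for angle in angles:
        counts[round(angle / bin_width)] += 1
    # Convert bin indices back to the angle at the centre of each bin.
    return {round(index * bin_width, 2): n for index, n in sorted(counts.items())}


# Invented sample: mostly straight driving, with two sharp recovery turns.
angles = [0.0, 0.0, 0.02, -0.01, 0.0, 0.3, -0.4, 0.0]
histogram = steering_histogram(angles)
for bin_centre, count in histogram.items():
    print(f"{bin_centre:+.1f}: {'#' * count}")
```

If nearly every sample lands in the bin around zero, the model mostly sees straight-line driving, which is the kind of imbalance that extra recovery recordings help with.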

Learning From The Data, By Copying Human Behaviour

After I’d recorded and stored the data, it was time for me to train the model using behavioural cloning, run on a CNN.

Behavioural Cloning: A method in which a computer records human subcognitive skills and tries to reproduce them, so that it learns to perform the task by itself.

Like computers, babies fail quite a lot. It’s a learning process.

Think of how a baby learns to walk. It sees how other humans walk and tries to reproduce the same skill itself, testing this skill over and over until it can take its first steps.

A computer essentially does the same thing in behavioural cloning. In this project, it looks at an image, comes up with a steering angle, and tunes itself so that its output comes as close as possible to the steering angle the human chose for that image.

How does a computer learn to drive?

An illustration of the network, to make your life easier

The way the car learns to drive is surprisingly simple. Let me explain it at a high level.

The green line is my steering angle, the red line is the steering angle computed by the CNN.

First, the images I recorded from driving around were fed into a CNN, which computed a steering angle for the specified image.

Then, this computed steering angle was subtracted from the actual desired steering angle. This number is called the error.

After that, backpropagation, a technique for reducing error in neural networks, did just that: the weights of the CNN were adjusted to bring its output closer to the desired steering angle.

Now that the CNN had trained itself on the recorded data, it could actually predict steering angles accurately enough to make the car drive!
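The predict, measure the error, adjust loop described above can be sketched with a toy model. This is not the CNN: it assumes a made-up single number per frame (say, how far the lane centre sits from the middle of the image) and a two-parameter linear model, trained with plain gradient descent rather than Adam, just to show the mechanics.

```python
# Toy stand-in for the CNN: steering = weight * feature + bias.
# The features and angles are invented; the real model learns from raw pixels.
frames = [-1.0, -0.5, 0.0, 0.5, 1.0]
human_angles = [-0.5, -0.25, 0.0, 0.25, 0.5]  # what the human driver did

weight, bias = 0.0, 0.0
learning_rate = 0.1


def mean_squared_error(weight, bias):
    """Average squared difference between human and predicted angles."""
    errors = [y - (weight * x + bias) for x, y in zip(frames, human_angles)]
    return sum(e * e for e in errors) / len(errors)


initial_loss = mean_squared_error(weight, bias)

for epoch in range(200):
    grad_w = grad_b = 0.0
    for x, y in zip(frames, human_angles):
        error = y - (weight * x + bias)  # desired angle minus computed angle
        # Gradient of the squared error with respect to each parameter.
        grad_w += -2 * error * x / len(frames)
        grad_b += -2 * error / len(frames)
    # Nudge the parameters in the direction that shrinks the error.
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b

final_loss = mean_squared_error(weight, bias)
print(f"loss: {initial_loss:.4f} -> {final_loss:.2e}, weight = {weight:.3f}")
```

In the real network, backpropagation computes the same kind of gradient for every weight in every layer, and the optimizer applies the update.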

The Network Architecture

So far, you know that the CNN has been trained to take in an image input and make an accurate output, but I haven’t really told you about what goes on inside this network.

Let’s explore it.

The CNN has 9 layers:

One normalization layer to even out the data
5 convolutional layers for detecting lanes
3 fully connected layers to compute a steering angle

The first layer is an image normalization layer, which performs feature scaling to avoid saturation and help training converge faster (in short, it cleans up the data).
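The article does not say which scaling was used, so the sketch below assumes one common choice: mapping 8-bit pixel values from [0, 255] to [-1, 1]. The function names are illustrative only.

```python
def normalize_pixel(value):
    """Map an 8-bit pixel value from [0, 255] to [-1.0, 1.0]."""
    return value / 127.5 - 1.0


def normalize_image(image):
    """Apply the scaling to every channel of every pixel (nested lists: rows x pixels x channels)."""
    return [[[normalize_pixel(channel) for channel in pixel] for pixel in row]
            for row in image]


# A 1x2 RGB image: one black pixel and one white pixel.
tiny_image = [[[0, 0, 0], [255, 255, 255]]]
print(normalize_image(tiny_image))
```

Keeping the inputs small and centred around zero is what stops the early layers from saturating.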

The next five layers are convolutional layers, which extract features from the images. The CNN was able to learn how to detect lanes from just the recorded human steering angle, even though I never explicitly labelled what a lane/road looks like.

The final 3 layers are fully connected layers with 100, 50 and 10 neurons, followed by a single output. This output is the computed steering angle. The network’s weights are optimized with the Adam algorithm and backpropagation (a method used to reduce error).
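To see how an image shrinks on its way through the five convolutional layers, the sketch below traces the shapes with simple arithmetic. The input size (66x200x3), filter counts, kernel sizes and strides are assumed from the NVIDIA paper’s architecture and are not stated in this article; the project’s actual code may differ.

```python
def conv_output_size(size, kernel, stride):
    """Output length along one axis for a convolution without padding."""
    return (size - kernel) // stride + 1


# (filters, kernel size, stride) per convolutional layer; assumed values.
CONV_LAYERS = [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]
# Fully connected sizes from the article: 100, 50, 10, then one output.
DENSE_LAYERS = [100, 50, 10, 1]


def trace_shapes(height=66, width=200, channels=3):
    """Return the (height, width, channels) shape after each conv layer and the flattened size."""
    shapes = [(height, width, channels)]
    for filters, kernel, stride in CONV_LAYERS:
        height = conv_output_size(height, kernel, stride)
        width = conv_output_size(width, kernel, stride)
        channels = filters
        shapes.append((height, width, channels))
    return shapes, height * width * channels


shapes, flattened = trace_shapes()
for shape in shapes:
    print(shape)
print("flattened:", flattened, "-> dense layers:", DENSE_LAYERS)
```

Under these assumptions the last convolutional layer produces a thin 1x18 strip of 64 feature maps, which is flattened and handed to the fully connected layers that turn it into one steering angle.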

Now It’s Time To Run

Ahh…running the model.

This was the fun part.

After I finally trained the model on my CPU (bad idea), which took 24 painful hours, the car could drive.

The car can drive without crashing! You probably don’t want to drive like that in real life though.

This was mind-blowing for me. I’d never thought I’d create a self-driving car (yes, I know it’s a simulation) before learning to drive myself (cut me some slack, I’m only 15).

What’s even more mind-blowing is that this whole network ran on only a couple of lines of code.

Takeaways

CNNs are a type of neural net that works really well at detecting features within an image. In this self-driving car model, a CNN learns to detect lanes on its own, without being given any external knowledge about what lanes look like.
A computer learns to drive through a process called behavioural cloning, in which it tries to reproduce the correct steering angle for each frame recorded.
The steering angle is computed through a Neural Network, which has hidden layers that first learn what lanes are and then predict a steering angle.

Thank you for reading this article! If you’re interested in the code, check out my GitHub repository.
