Waymo driving in the rain… a task usually handled poorly by autonomous cars

The future of self-driving cars

A crash course on learning-based driving and behavioral cloning

Ali J. Gangeh
Published in The Startup
7 min read · Aug 30, 2019

Imagine what could be done with 30 billion hours.

Let me rephrase that: what could be done with 1.2 billion days?

Even better: 3.4 million years?

That's the amount of time spent commuting per year in America alone.

This is just one of the many problems self-driving cars are out to solve. In the past few years, there have been enormous advances in AI, machine learning, and autonomous driving. Last year Waymo hit the 10-million-mile mark on public roads… and just last month (July 2019) Waymo hit the 10 BILLION mile mark in simulation. If that's not exponential, I don't know what is.

It's kind of ironic how a high schooler (who can't drive) taught a car to drive by itself.

There are two main schools of thought driving the autonomous car revolution (no pun intended): rule-based and learning-based driving.

Rule-Based Driving

A rule-based model tells the car what to do… Is that a stop sign? Stop. Is there a car ahead of you? Slow down. Are you in the middle of your lane? Good. You get the idea. Some subsystems, like the traffic sign classifier, use neural networks to "learn" what a stop sign is, but in the end the car drives based on rules it is given.

This is the method Waymo uses, and they currently have the best results. They are the only company to reach full autonomy (meaning no driver behind the wheel is needed).

Learning-Based Driving (aka Behavioural Cloning)

Rule-based approaches say that humans learn to drive by learning the rules of driving. This approach challenges that, saying humans learn to drive by watching other humans drive.

A learning-based model doesn’t explicitly tell the car to stop at a red light or drive through the middle of the lane. Instead, it teaches the car to replicate human driving which is called behavioral cloning. This is exactly what companies like Tesla and Comma.ai do.

This approach bypasses all the steps of a rule-based model. There are no separate systems for lane detection or traffic sign classification. Instead, there is one single system that takes raw inputs (footage of human driving) and gives outputs (steering wheel angle, acceleration, etc.). This makes learning-based models end-to-end systems.

What is an End-to-End System?

An end-to-end system refers to a solution that directly connects input data to the output prediction. It bypasses the intermediate steps that usually occur in a traditional approach (sound familiar?).

The simplicity of an end-to-end model makes it:

  1. Cheaper to build
  2. Easier to build

Cheaper to build

LiDAR is a sensor used by multiple self-driving car companies. It's similar to sonar, except it uses light instead of sound. Without it, Waymo very well might not have the full autonomy I mentioned. But just the short-range LiDAR units Waymo uses cost about $5,000 apiece!

Companies like Comma.ai do not use LiDAR, arguing that humans don't use LiDAR to drive, so neither should cars. This cuts a huge cost off the car, which allows them to sell a product that gives your car level 2–3 autonomy (meaning you still have to be alert and ready to take over) for under $1,000.

Easier to build

To build a rule-based car, tons of models need to be built and taught to interact with each other. It is very hard for a single person to make all these models and write all the rules required for a car to drive. A behavioral cloning model is comparatively much easier to build, needing less development and training data.

Teaching a Virtual Car to drive

My Need for Speed skills finally come in handy :)

Collecting Data

This is always the first step in behavioral cloning. For a computer to replicate human driving, it needs to know what human driving looks like. Though I can't legally drive, I've been roaming virtual roads since childhood. It had been a while, but I put my Need for Speed skills to use and started driving around a track. I collected 30 minutes of data, which consisted of two parts (I'll sketch how it gets loaded right after the list):

  1. Images taken 20 times per second from three different cameras
  2. The corresponding steering angle at that time

Left, center, and right camera angles, respectively
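
A minimal sketch of how that log might be loaded is below. I'm assuming a Udacity-style driving_log.csv that stores the three image paths and the recorded steering angle per frame; the ±0.2 correction applied to the side cameras is a common convention for making those frames usable, not a magic number.

```python
import pandas as pd

# Assumed layout: one row per frame, with paths to the three camera
# images plus the controls recorded at that moment.
columns = ["center", "left", "right", "steering", "throttle", "brake", "speed"]
log = pd.read_csv("driving_log.csv", names=columns)

# The side cameras view the road from an offset, so a small steering
# correction nudges their frames back toward the center of the lane.
CORRECTION = 0.2
samples = []
for _, row in log.iterrows():
    samples.append((row["center"], row["steering"]))
    samples.append((row["left"], row["steering"] + CORRECTION))
    samples.append((row["right"], row["steering"] - CORRECTION))
```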

Preprocessing

We now have the data, but before we feed it to the model we need to remove biases and optimize the images so the model can succeed.

First things first, the bias toward driving straight needs to be removed. Driving straight most of the time is normal, but it teaches the model to always drive straight. By deleting some of the frames where we are driving straight, we leave a slight bias but allow the model to learn to turn as well.
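
Concretely, that deletion can look something like the sketch below: bucket the recorded angles into bins and cap how many frames each bin keeps. The bin count and per-bin cap are arbitrary knobs picked for illustration.

```python
import numpy as np

def balance_straight_bias(samples, num_bins=25, max_per_bin=300):
    """Randomly drop frames from over-represented steering bins,
    which in practice means mostly the near-zero, straight-driving ones."""
    angles = np.array([angle for _, angle in samples])
    # Simulator steering angles live roughly in [-1, 1]
    bin_ids = np.digitize(angles, np.linspace(-1.0, 1.0, num_bins))
    kept = []
    for b in np.unique(bin_ids):
        in_bin = np.where(bin_ids == b)[0]
        np.random.shuffle(in_bin)
        kept.extend(in_bin[:max_per_bin].tolist())  # cap each bin
    return [samples[i] for i in kept]
```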

Second things second, we must generalize our model by adding random alterations to the images. A common obstacle when training a model is overfitting, meaning the model can only recognize the images it was trained on… it can't "generalize" to the real world. We can change that through panning, zooming, altering brightness, and flipping. That adds some punch and pizzazz to the data, letting the model adapt to similar situations in the real world instead of overfitting on the training set.
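
Here's a rough sketch of those alterations; the ranges are arbitrary choices, and the one important detail is that a horizontal flip must also flip the sign of the steering angle.

```python
import cv2
import numpy as np

def augment(image, angle):
    """Randomly pan, zoom, brighten/darken, and flip one training frame."""
    h, w = image.shape[:2]

    # Pan: shift the image a little horizontally and vertically
    tx, ty = np.random.uniform(-30, 30), np.random.uniform(-10, 10)
    shift = np.float32([[1, 0, tx], [0, 1, ty]])
    image = cv2.warpAffine(image, shift, (w, h))

    # Zoom: crop a slightly smaller centered region and resize it back up
    scale = np.random.uniform(1.0, 1.2)
    ch, cw = int(h / scale), int(w / scale)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    image = cv2.resize(image[y0:y0 + ch, x0:x0 + cw], (w, h))

    # Brightness: multiply all pixels by a random factor
    image = np.clip(image * np.random.uniform(0.6, 1.4), 0, 255).astype(np.uint8)

    # Flip: mirror the image, and mirror the steering angle with it
    if np.random.rand() < 0.5:
        image = cv2.flip(image, 1)
        angle = -angle

    return image, angle
```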

Third things third, the image must be processed so it can be fed to the model. This includes the following (the whole function is sketched after the list):

  1. Dividing the pixel values by 255; this doesn't change the image visually, but it scales the values from 0–255 down to 0–1, which makes the neural net's calculations better behaved.
  2. Blurring the image, which removes noise that might throw the model off.
  3. Converting the color space from RGB to YUV; Nvidia has shown this achieves better results.
  4. Cropping the image, which removes features irrelevant to driving like the trees, the sky, and the front portion of the car.
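
Putting the list together, the whole pipeline fits in one small function. The crop rows and the 200×66 target size below are assumptions based on my camera framing and the input size the Nvidia network expects.

```python
import cv2

def preprocess(image):
    """Crop, convert to YUV, blur, resize, and normalize one frame."""
    image = image[60:135, :, :]                     # crop out the sky and the hood
    image = cv2.cvtColor(image, cv2.COLOR_RGB2YUV)  # Nvidia reports YUV works better
    image = cv2.GaussianBlur(image, (3, 3), 0)      # light blur to suppress noise
    image = cv2.resize(image, (200, 66))            # the width x height the model expects
    return image / 255.0                            # scale pixel values to 0-1
```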

With all this said and done we are ready to get to the exciting stuff.

The Exciting Stuff (The Model)

Okay, I'm tired of saying "the model" every time, so from now on we are going to call it Steve. Steve was designed by the Nvidia drive team; check out their *awesome* paper for more information. He has two sections, the convolutional layers and the fully-connected layers (there's a rough code sketch right after this list):

  1. Convolutional layers: take in images and find important features like the curb; this is called feature extraction.
  2. Fully-connected layers: match those extracted features to a steering angle, which is the output.
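
For the curious, here is a rough Keras sketch of Steve, following the layer sizes from the Nvidia paper (five convolutional layers, then fully-connected layers narrowing down to a single steering output). The ELU activations are a common choice for this network rather than something the paper mandates.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten

def build_steve():
    """Convolutional feature extraction followed by fully-connected
    layers that map those features to one steering angle."""
    return Sequential([
        Conv2D(24, (5, 5), strides=(2, 2), activation="elu", input_shape=(66, 200, 3)),
        Conv2D(36, (5, 5), strides=(2, 2), activation="elu"),
        Conv2D(48, (5, 5), strides=(2, 2), activation="elu"),
        Conv2D(64, (3, 3), activation="elu"),
        Conv2D(64, (3, 3), activation="elu"),
        Flatten(),
        Dense(100, activation="elu"),
        Dense(50, activation="elu"),
        Dense(10, activation="elu"),
        Dense(1),  # the predicted steering angle
    ])
```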

Before Steve starts to drive, we need to train him. This is where our data comes in: the images are fed in and Steve guesses the steering angle (at first randomly). Then we calculate the error to see how far off Steve's guess was. Steve uses gradient descent to tweak his weights and biases. This repeats for each image, and by the end our model can guess the steering angle for any image given to it with minimal error.
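
In Keras, that whole loop collapses into a compile-and-fit call. Mean squared error is the natural way to measure how far off each guess was; X_train and y_train here stand in for the preprocessed images and recorded angles from the earlier steps.

```python
from tensorflow.keras.optimizers import Adam

model = build_steve()  # the sketch from the previous section
# MSE: the average squared difference between Steve's guessed angle
# and the angle the human actually steered.
model.compile(optimizer=Adam(learning_rate=1e-4), loss="mse")

model.fit(X_train, y_train,      # preprocessed images and recorded angles
          validation_split=0.2,  # hold back some frames to check generalization
          epochs=10,
          batch_size=64,
          shuffle=True)
model.save("steve.h5")
```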

Now Steve is ready to get 🚘.

…Several great libraries later, we connect Steve to a driving simulator (I used this one). Using said libraries, he takes in images from the simulator and uses his training to guess the steering angle. This angle is sent to the virtual car, allowing it to drive.
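
Stripped down, the bridge script looks roughly like this. I'm assuming a Udacity-style simulator interface here (a socketio "telemetry" event carrying a base64 camera frame, a "steer" event going back, port 4567) and reusing the preprocess() function from earlier; the exact event names and port depend on the simulator you use.

```python
import base64
from io import BytesIO

import eventlet
import eventlet.wsgi
import numpy as np
import socketio
from flask import Flask
from PIL import Image
from tensorflow.keras.models import load_model

sio = socketio.Server()
app = Flask(__name__)
model = load_model("steve.h5")  # the trained network from earlier

@sio.on("telemetry")
def telemetry(sid, data):
    # The simulator streams the current camera frame as a base64 string.
    frame = np.asarray(Image.open(BytesIO(base64.b64decode(data["image"]))))
    angle = float(model.predict(np.array([preprocess(frame)]))[0][0])
    # Send the guessed steering angle (and a gentle constant throttle) back.
    sio.emit("steer", data={"steering_angle": str(angle), "throttle": "0.2"})

if __name__ == "__main__":
    eventlet.wsgi.server(eventlet.listen(("", 4567)), socketio.WSGIApp(sio, app))
```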

And here he is driving all by himself, they grow up so fast.

Vroom Vroom

Takeaways

The self-driving industry is still in its early stages. Waymo has reached full autonomy, but we still have a ways to go before autonomous driving goes mainstream, with obstacles like pricing, legislation, and safety in the way. Even though these are big challenges, some of the world's smartest people are working on making autonomous driving a reality. The future is bright. Here are the key points of this article:

  • Autonomous cars are going to save billions of hours and millions of lives in the future
  • There are two main approaches to autonomous driving: learning-based and rule-based.
  • An end-to-end system is one that directly connects input data to an output prediction, bypassing traditional intermediate steps.
  • Behavioral cloning is a learning-based method which teaches a computer to replicate human driving.
  • Knowledge and tools are more accessible than ever. With enough passion, anybody can learn and contribute to an incredible future

And that's how I, a 16-year-old, taught a car to drive.

This is a very high-level explanation; for a more technical one, check out my GitHub repository, where I go deep into the coding aspect, or message me on LinkedIn if you want to learn more about this project.

-Ali Out


Ali J. Gangeh
The Startup

16-year-old innovator, interested in space and our future in it.