CarAI: A car that teaches itself how to drive

Joshua Levy
Published in Unreal Ai
Jun 16, 2018

Hey there! It’s been a while since our last post. Since our beginnings, we’ve started holding weekly meetings and are hoping to grow our community. The CarAI project has now come to a close, and we’ve set our sights on some new projects, namely GameOfLife. Before we get into what those other projects are, this blog post will discuss our first successful completion: CarAI.

The main motivation for this project came from this Youtube video:

I thought this video was really cool! Woah. The car, through generations of successive crashes, finally figured out how to traverse the constructed course. This was something I wanted to figure out how to implement.

So where was I going to get started? There were a few things that I was going to need to figure out.

  • The game engine; in the video above, the creator used the Unity engine to make their self-taught car
  • The environment: the car and the race course
  • Sensory inputs from the environment:
    These consisted of 3 rays that could each sense how close the walls were
  • Outputs that control the car’s behavior:
    One controls acceleration/deceleration, the other controls direction
  • Some mechanism/model to process the inputs and infer sensible driving decisions (outputs)
  • A way to evolve and shape this mechanism by rewarding it when the car travels farther

And that’s essentially all you need. The goal: Build a self-driving car that iteratively teaches itself how to travel farther along a track.

The video above gives you a good idea how to implement it, so I’ll reference it a lot, but I’m also going to show you my way, and tell a couple of history lessons along the way.

The Setup

I knew that the first thing I needed to do, and certainly the hardest, would be choosing a reasonable game engine that could handle the game’s physics and render in 3D at a reasonable rate. Because I’m primarily a Python programmer, I wanted the language to be accessible. While Unreal Engine and Unity do have Python plugins, I ended up going with Panda3D.

Panda3d? What the hell!?!?

Remember ToonTown? Pirates of the Caribbean Online? Me neither. Turns out the engine behind them, Panda3D, was developed by Disney before being open-sourced and handed off to Carnegie Mellon University.

Yes.. This Disney..
A good home for toons…

With that being said, the Panda3D environment is not half bad. It’s free, open-source, used for research, and the community still actively develops projects with it. I used pip to grab the latest development version of Panda3D and installed it in an Anaconda environment with Python 3.6.

I checked out a couple of resources to get familiar with the engine, and looked at some tutorials to help set up my first environment, essentially a car on a blue plane (Panda3d uses a node structure to construct its environment).

Panda3d Node Structure
Nodes are connected in a “scene graph”

Here’s the final code for this project: https://github.com/jlevy44/UnrealAI/tree/master/CarAI/joshua_work/game/src, check out the old helper scripts if you want to know how I got started. I also checked out the panda3d manual and scoured their forums extensively.

The Car

My first goal was to get the car to be controlled by user input, which I was able to do using one of the aforementioned helper scripts. The car was made from a model called yugoNP and followed the Panda3D rendering pipeline.

Model development for panda3d
Picture of my car traversing the course (see track below). It is operated using a neural network (see upper right) and a genetic algorithm evolves the neural network such that the scores increase every generation (lower right)

The Track

This was one of the more challenging things to make. Essentially, I followed this Blender tutorial to make my own track, found a Blender plugin that exports to either X format or Egg format, and made the necessary conversions using command-line tools supplied by Panda3D. Remember to set the object geometry so the car can collide with the track!

Track Made via Blender

The Sensory Inputs

The car below takes in three sensory inputs. Essentially I constructed 3 “rays” that shot out from the car in three directions, pictured below.

Early build of my car program… Driving was quite drunk to say the least… BAC > 0.08 BEWARE!

These rays collided with the walls of the track and reported their distances back to the car. This is known as “ray casting”: colliding a ray “node”/object with the track “node”. These distances served as the inputs to the “brain”.
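Under the hood this is just ray-geometry intersection. Here is a minimal 2D sketch of what those sensors compute; Panda3d’s CollisionRay does the equivalent in 3D against the track mesh, and the wall segment here is made up for illustration:

```python
import math

def ray_segment_distance(ox, oy, dx, dy, x1, y1, x2, y2):
    """Distance along a ray (origin (ox, oy), unit direction (dx, dy)) to a
    wall segment from (x1, y1) to (x2, y2), or None if the ray misses it."""
    rx, ry = x2 - x1, y2 - y1
    denom = dx * ry - dy * rx
    if abs(denom) < 1e-12:
        return None                                      # ray parallel to wall
    t = ((x1 - ox) * ry - (y1 - oy) * rx) / denom        # distance along ray
    u = ((x1 - ox) * dy - (y1 - oy) * dx) / denom        # position along wall
    if t >= 0 and 0 <= u <= 1:
        return t * math.hypot(dx, dy)
    return None

# Three sensor rays: 45 degrees left, straight ahead, 45 degrees right
heading = 0.0                                            # car facing +x
rays = [(math.cos(heading + a), math.sin(heading + a))
        for a in (math.radians(45), 0.0, math.radians(-45))]
wall = (5.0, -10.0, 5.0, 10.0)                           # vertical wall at x = 5
distances = [ray_segment_distance(0.0, 0.0, dx, dy, *wall) for dx, dy in rays]
# the forward ray sees the wall 5 units away; the angled rays see it farther
```

Each frame, the three resulting distances become the three inputs to the neural network described next.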

The Brain

The Brain (upper right of the picture below) essentially tells the car which actions to take. Much like our brain, it takes in sensory input information to make its choices. This is reflected by the three leftmost blue circles in the upper-right picture. The top circle represents the distance to the nearest wall on the left, the middle the distance straight ahead, and the bottom the distance on the right; the color of each circle reflects how far the wall is from the car along that ray.

Distance information found from ray casting is input to the car’s “brain”’s sensory inputs.
Brain: A Neural Network with: (yellow) input layer, a.k.a. sensing the environment; (purple) hidden layers a.k.a. processing the inputs deeply to come up with decisions, where most of the thinking is done; (red) output layers, where explicit decisions are made

The car’s brain is actually a neural network, which some of you may already be familiar with, while others may be lost. There’s a rigorous mathematical description for this, which I will set aside until I discuss evolving and teaching the brain/neural net. A neural network is broken into multiple layers (above). I’ll present high-level details of the three main types of layers for a deep artificial neural network:

  • input layer (yellow)- this hooks up directly with those ray-casting sensors in the environment; in this case it is what the car “sees”
  • hidden layers (purple)- these do the thinking for the car; our physical brains contain many neurons that make connections between what we observe and what we do. Likewise, these “neurons” make those connections and help the car process what it perceives at a deeper level, like looking at how far away the walls are and calculating which actions to take to avoid a crash. Note that each layer (column) within these hidden layers is comprised of a set of neurons
  • output layer (red)- this dictates the action taken by the car after processing in the hidden layers: slowing down, speeding up, or turning left or right. The first circle’s color reflects slowing down or speeding up, while the second circle dictates turning direction

Shape of the neural network:

  • The network above has the following shape/architecture: [3,5,4,5,2]
    3 input neurons (what it sees: left, straight, and right collision distances)
    5 neurons in the first hidden layer (initial processing of inputs)
    4 neurons in the second hidden layer (deeper processing of inputs)
    5 neurons in the final hidden layer (deepest processing of inputs, preparation for decisions/outputs)
    2 output neurons (the main decisions made: speed and direction)
  • Note that as the complexity of the network increases (more hidden layers, each with more neurons), the greater the potential for a “correct” decision; but such a brain takes longer to train, and of course, decisions may be processed a bit more slowly
Check out the car’s inputs and outputs… Note the colors changing in the left three nodes, indicating distance detection, while the right two nodes indicate the actions being taken… The “hidden” nodes seem to be patterned in their response… firing according to certain outputs.
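A quick way to get a feel for the search problem is to count the numbers that define one of these brains. A small sketch (plain Python, illustrative, not code from the repo):

```python
def parameter_count(shape):
    """Total number of weights and biases in a fully connected network."""
    return sum(n_in * n_out + n_out            # weight matrix + bias vector
               for n_in, n_out in zip(shape, shape[1:]))

print(parameter_count([3, 5, 4, 5, 2]))  # 81 numbers define one car's brain
```

Those 81 weights and biases are exactly what the evolution process later in this post has to tune.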

The Outputs

The outputs control how the car moves. Going back to the neural network diagram: if the top node/neuron of the output layer lights up, the car accelerates; if it goes dark, it decelerates. Likewise for the lower node: the lighter it is, the more the car turns left (up to 38 degrees, capped by yours truly), and as it darkens the car begins to turn right.
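Here is a rough sketch of that mapping in Python. The 38-degree cap is from above; the linear scaling and sign conventions are illustrative assumptions, not the repo’s exact code:

```python
MAX_STEER_DEG = 38.0  # the steering cap mentioned above

def interpret_outputs(speed_out, steer_out):
    """Map the two tanh outputs, each in [-1, 1], to car controls:
    speed_out > 0 accelerates, speed_out < 0 decelerates;
    steer_out scales linearly to a steering angle, +38 deg (full left)
    down to -38 deg (full right)."""
    accelerating = speed_out > 0
    steering_angle = steer_out * MAX_STEER_DEG
    return accelerating, steering_angle
```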

Collision Detection & Some Rules About Dying

No one wants to die but that’s why I placed this car in the world. To die again and again until it does not want to die anymore. Then it truly transcends.

What? Okay. How does a car die?

  • When it hits a wall. I created a collision box around the car (the thin yellow/white-looking thing); if that collides with the wall, the car is dead. Collision boxes are useful.. Der.
  • If it doesn’t move. A car that does not move is really annoying (and carries no significance except to waste our time); I have a timer, and if the car has not moved a certain distance by the time it expires, then boom..
  • If it travels 10000m+. The track’s length is ~600m, so if it can do some 17 laps, then in my book it has learned the course, so there’s no need to keep it propagating.
Death is but a constant in life… No matter how good you are.

The Math

Quick discussion on neural network weights:

Quick NN tutorial

A neural network is essentially a high-complexity non-linear model that can be modified to meet your needs… Bleh. Not a great description. How about a mathematical one.

Consider two layers of a neural network model (maybe the input and hidden layer):

Basic artificial neural network model

For those of you who know some math out there (linear algebra, etc.), this can get a little complex..

f(x) = tanh(x)
y_1 = f(W(1)x + b_1)
y_output = f(W(2)y_1 + b_2)
Mathematical description of the above network

Suppose the vector x is our input (three nodes), the three distances. Our first goal is to calculate hidden layer 1, the vector y_1 (the next 4 neurons). To do this, we first construct a linear model W(1)x + b_1 in vectorized form, much akin to the line-of-best-fit equation mx + b, where W(1) is a matrix of weights standing for the 12 arrows in the above diagram, while b_1 is the bias vector (it introduces some constant offset to the model). Then, to finalize y_1, each element of the four-dimensional vector W(1)x + b_1 passes through an activation function, which is akin to whether, and how strongly, a neuron fires given a stimulus. There are many types, but in this case the hyperbolic tangent is applied to each element of W(1)x + b_1, yielding y_1. The same process is repeated to generate our final outputs, the vector y_output, this time with y_1 as the input and a new weight matrix W(2) (shaped appropriately) and bias vector b_2 (a two-dimensional vector).

Simplified model of what just happened!

Essentially, we have constructed a model, call it M, that depends on the weights W(1) and W(2), which weight how much we trust the input of the previous layer when making a calculation, and the biases b_1 and b_2 (not pictured), which make the necessary adjustments, all with nonlinear (hyperbolic tangent) intermediaries. So we have a neural network M(W(1), W(2), b_1, b_2). W(1), W(2), b_1, and b_2 are responsible for the computations and how the brain is configured.
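This two-layer computation can be sketched in a few lines of numpy. The weights below are random placeholders, not trained values:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """y_1 = tanh(W(1)x + b_1); y_output = tanh(W(2)y_1 + b_2)."""
    y1 = np.tanh(W1 @ x + b1)
    return np.tanh(W2 @ y1 + b2)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # the 12 arrows + 4 biases
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = np.array([0.2, 0.9, 0.4])        # three ray distances (normalized)
y = forward(x, W1, b1, W2, b2)       # two outputs, each squashed into (-1, 1)
```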

Let’s call the set {W(1), W(2), b_1, b_2} the genotype of the car. At an abstract level, this is the “DNA” of the car: the genetic code that, when expressed, gives the car the ability to make different decisions. As we vary W(1), W(2), b_1, b_2, the car’s brain changes structure and emphasis, and makes different decisions in different circumstances. It evolves! (more on this later)

On a given run of the car, these values are fixed. Our goal is to vary the genotype (the weights and biases of the neural network; of course, there are different ways to do this) until the genetic code the car carries lets it exhibit superior performance in navigating the course!

The Algorithm

So on a given run (set weights; one selected brain), my algorithm goes something like this:

  1. Generate the brain of the set vehicle (we’ll call it an individual) by applying the weights to form a neural network model.
  2. Add the track to the environment.
  3. Add the car to the environment.
  4. Take in distances via sensors.
  5. Process distances via the car’s brain.
  6. Take action according to the neural network, which outputs two control mechanisms.
  7. If the car doesn’t move, it dies: terminate the run and save the total distance traveled.
  8. If it reaches the max distance, 10000m, it dies.
  9. If it crashes into the wall, it dies and the total distance traveled is saved.
  10. As the car moves, repeat steps 4–9.

We see here that the run depends on what kind of brain the car has. The run also outputs the distance the car travels (a level of fitness, demonstrating how good that brain is). So, between runs, the car’s brain must evolve so that it can travel farther.

Here’s the code behind it:
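The real implementation lives in the repo linked above; here is a condensed sketch of the run loop, with a hypothetical `car` object standing in for the Panda3d nodes and collision handlers:

```python
MAX_DISTANCE = 10000.0    # roughly 17 laps: the car has learned the course
STALL_TIME = 5.0          # seconds allowed without progress (illustrative value)

def run_individual(brain, car, track, dt=1.0 / 60):
    """Run one individual (one set of neural network weights) until it dies;
    return the total distance traveled as its fitness."""
    distance = stall_timer = last_distance = 0.0
    while True:
        inputs = car.sense(track)                 # step 4: three ray distances
        speed_out, steer_out = brain(inputs)      # step 5: forward pass
        car.apply_controls(speed_out, steer_out)  # step 6: act
        distance += car.step(dt)                  # advance the physics
        if distance - last_distance < 1e-3:       # step 7: stalled?
            stall_timer += dt
            if stall_timer > STALL_TIME:
                return distance
        else:
            stall_timer, last_distance = 0.0, distance
        if distance >= MAX_DISTANCE:              # step 8: course learned
            return distance
        if car.collides_with(track):              # step 9: crashed
            return distance
```

The `sense`, `apply_controls`, `step`, and `collides_with` methods are assumed names for what the engine actually provides.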

And there goes the code… bye bye!

Evolving the Brain via Machine Learning (Genetic Algorithm)

So now it’s time to put it all together. There are many ways to evolve a brain, evolve a model, or search a high-dimensional parameter/hyperparameter space, whatever you want to call it. Our goal is to evolve the decision-making skills of the car. We are going to demonstrate one of many ways to do so (yes, Q-learning and reinforcement learning would have been nice): a genetic algorithm (GA; mini tutorial here).

What the hell is a genetic algorithm?

Well, genes.. algorithm… brain blast! We want to evolve a set of genes much like nature selects via natural selection. So let’s say that one single car, an individual, has a set of genes (call it a chromosome). These genes are the set of weights and biases that feed into our neural network model. So an individual represents one neural network, and you get another neural network model if you change W and b, the weights and biases. So, two individuals:

M1

M2

Two models… two different ways of thinking about the world and the environment.

Now throw in more models, M1 M2 M3 M4 M5 M6 … each one will perform differently.

If you applied each of them to a car at a particular run, they would each return different distances traveled based on how each model gets the car to react.

So based on the car’s genes, it will travel a different distance. M1–6 will return distances d1–6, called fitness scores. The best brain has the highest fitness. The problem is that the genes, W and b (standing for W(1), W(2), b_1, b_2), can take on an infinite number of values and combinations. The combinatorial space is expansive. So how do we choose W and b such that it does not take until the end of time to find the best brain?

Genes are {W,b}, where W and b take on different values, and represent some high dimensional space. If our distance/fitness d depends on the brain model M, then it depends on W and b.

d(M) = d(W,b) = d(W(1),W(2),b1,b2) = d(….) = d(gene set)

We need to traverse this space to find the highest score/fitness.

To do this, the genetic algorithm first populates generation 1 with some number of random individuals (gene sets {W, b}), 60+ in my case (to your specification). For each individual, a different neural network model M, it runs the simulator (see the algorithm above) and returns a distance d.

Then, it evaluates generation 1. The simplified explanation: it selects the top-performing individuals (selection) while the bad ones die off; it crosses over some of them (crossover: interchanging some genes between individuals, recombining genes between mom and pop for the new kids) at a certain rate (the crossover rate); and it mutates others (randomly changing some of W and b to new values) at a certain rate (the mutation rate). The next generation is populated with the selected individuals, some mutated individuals, some crossed-over individuals, and sometimes some altogether new individuals. Repeat this process every generation until you converge on a solution. In this case our fitness function is the distance traveled, so run the GA until most of the remaining individuals in a generation are hitting 10000m traveled.
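Here is a minimal, self-contained sketch of one such GA over flat gene lists. The rates, the elitism count, and the toy fitness function are illustrative, not the project’s exact settings:

```python
import random

def evolve(population, fitness_fn, generations=50,
           crossover_rate=0.85, mutation_rate=0.15, elite=2):
    """Minimal genetic algorithm over flat gene lists (weights and biases)."""
    for _ in range(generations):
        scored = sorted(population, key=fitness_fn, reverse=True)
        next_gen = scored[:elite]                   # selection: keep the best
        while len(next_gen) < len(population):
            mom, dad = random.sample(scored[:len(scored) // 2], 2)
            child = list(mom)
            if random.random() < crossover_rate:    # crossover: splice parents
                cut = random.randrange(1, len(child))
                child[cut:] = dad[cut:]
            if random.random() < mutation_rate:     # mutation: perturb one gene
                i = random.randrange(len(child))
                child[i] += random.gauss(0.0, 0.5)
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness_fn)

# Toy fitness: steer genes toward a target vector (stand-in for "drive farther")
random.seed(0)
target = [1.0, -2.0, 3.0]

def fitness(genes):
    return -sum((g - t) ** 2 for g, t in zip(genes, target))

population = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(30)]
best = evolve(population, fitness, generations=40)
```

With elitism, the best fitness in the population can never decrease between generations, which is why the sketch keeps the top individuals verbatim.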

Genetic algorithm flow chart
Another flow diagram

There is a tradeoff here between exploration and exploitation. Exploration (via mutation) means trying new gene sets in the hope that something new leads to promising results, especially if you have been trapped in the same place for a while (like failing to get around one turn for the longest time; a serendipitous discovery may be the kick that gets you over the hump). Exploitation (via crossover) means taking what you have learned and combining the best of it to inform newer decision-making processes, your foundational knowledge. Typically in GAs, the mutation rate is 10–20% while the crossover rate is 80–90%. The reason? Crossover helps you converge on a solution; every time you mutate, it takes longer to reach a solution (but mutation can help you find a better one).

In my algorithm, I used sklearn-deap to run the GA that evolves the neural network weights.

The final algorithm was this:

  1. Populate a generation with individuals carrying different neural network models.
  2. Run the simulation on the cars with these brains.
  3. Record the distances as fitness scores.
  4. Evaluate the fitness of the generation and perform selection, mutation, and crossover operations on the neural network models.
  5. Repeat steps 1–4 until fitness is maximized/converges on the maximum.

GAs are not perfect, but this one got the job done. As you can see from the lower-right panel in the image below, each individual/brain is represented by a data point, each generation by a different color (later generations are on the right), and over time the maximum fitness (y-axis) of 10000m is reached by more individuals per generation, making the genetic algorithm a smashing success!

Voila! It works!

Putting it all Together

Our goal was simple. Build a car that could teach itself how to drive. And we set about accomplishing that. We built a car and a track using Blender and Panda3d and equipped the car with sensors using ray casting. A neural network was used as the brain to process sensory inputs into tangible outputs/decisions (acceleration and turning). And these neural networks were evolved over time using genetic algorithms to find the car brain model that could make the best decisions and allow the car to travel the longest distance.

This project took me about one and a half months and a whole lot of headaches, but at the end of the day, it was all worth it. If you see a project that interests you, I say go for it, because something seemingly complex (a car teaching itself how to drive) can be broken down into smaller tasks, and ultimately become attainable.

The final product! Mission accomplished! For now?

I hope you liked this blog post. There is much more I could say about the implementation and how I got here, but I’m glad you decided to strain your head and take a read this afternoon, morning, or late night. Never hesitate to reach out to me with additional questions, or comment below.

My name is Joshua. I’m a part of a group called UnrealAI which tries to make all sorts of cool projects. We want you to join us. We’re open to all skill sets, so reach out, comment, and come have fun with us at our hackathons as we put together cool AI projects! Thanks for listening!

And come take a drive with us. A self-taught drive…
