Designing a Self-Driving Car Simulation Using Python

Rushikanjaria · Published in Analytics Vidhya · 6 min read · Jun 28, 2021

A self-driving car, also known as an autonomous vehicle, a driverless car, or a robotic car, is a vehicle capable of perceiving its surroundings and operating without human intervention.
The self-driving car is one of the most attention-grabbing applications of artificial intelligence and a classic example of Reinforcement Learning.
Reinforcement learning is an area of machine learning concerned with taking the actions that maximize cumulative reward in a given situation. Various applications and robots use it to determine the best possible action or path in a given state.

A very simple way to visualize how a self-driving car works is Q-learning. Q-learning is an off-policy reinforcement learning algorithm that attempts to determine the optimal action given the current state. Q-learning aims to discover a policy that maximizes total reward. It is termed off-policy because the Q-function learns from actions outside the current policy, such as random actions, so a fixed policy isn't required. The 'Q' in Q-learning refers to 'quality': the quality of the action taken by the agent in terms of future rewards.

Q-learning equation:
Q(s,a) ← Q(s,a) + α * [r + γ * max_a' Q(s',a') − Q(s,a)]

  • Q(s,a)= Q value of [state, action]
  • Q(s’,a’) = Q value of [new state, all the possible actions]
  • r = Reward
  • α = Learning Rate
  • γ = Discount Rate

This problem is a very good example of episodic reinforcement learning. Episodic tasks are tasks that have a terminal state (an end). In RL, an episode is the agent-environment interaction from the initial state to the final state. In our problem, the car starts from the initial state and explores the environment until it reaches the final state; that is one episode. Once the car reaches the final state, the next episode starts and the car begins again from an initial state.

To visualize how Q-learning works, I have used the Taxi-v3 environment, a 2-D grid-world environment from the OpenAI Gym library. Taxi-v3 is a simple example of a self-driving car, where I applied reinforcement learning to train the taxi to take optimal actions and gain future rewards.

Taxi-v3

Taxi-v3 has four designated locations in the grid world indicated by R(red), G(green), Y(yellow), and B(blue). When the episode starts, the taxi starts off at a random square and the passenger is at a random location. The taxi drives to the passenger’s location, picks up the passenger, drives to the passenger’s destination (another one of the four specified locations), and then drops off the passenger. Once the passenger is dropped off, the episode ends.
There are 500 discrete states since there are 25 taxi positions, 5 possible locations of the passenger (including the case when the passenger is in the taxi), and 4 destination locations.
There is a reward of -1 for each action and an additional reward of +20 for delivering the passenger. There is a reward of -10 for executing actions “pickup” and “dropoff” illegally.

To begin coding the self-driving car, you can use Google Colaboratory or Jupyter Notebook. Create a Python notebook and you are ready to start coding.

First, install the gym library on your device or in your Google Colab notebook. In a terminal or Jupyter environment, use pip directly; in a Google Colab notebook, prefix the command with an exclamation mark.
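The install commands look like this (the original screenshots are not shown here; these are the standard pip invocations):

```shell
# In a terminal / Jupyter environment:
pip install gym

# In a Google Colab notebook cell, prefix with '!':
# !pip install gym
```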

The next step is to import all the required libraries: numpy, random, and gym.
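The imports are straightforward (the original code cell is not shown; this is the standard form):

```python
import numpy as np   # q-table and numeric operations
import random        # epsilon-greedy action sampling
import gym           # the Taxi-v3 environment
```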

To create our environment we will use the gym.make("Taxi-v3") method. It returns the environment we need to train our self-driving car.
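A minimal sketch of creating the environment and checking its size (the printed counts match the state and reward description below):

```python
import gym

# Create the Taxi-v3 environment from OpenAI Gym.
env = gym.make("Taxi-v3")

print("States:", env.observation_space.n)   # 500 discrete states
print("Actions:", env.action_space.n)       # 6 discrete actions
```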

This is our environment; the yellow entity in the rendered grid is our car (taxi), which we will train using reinforcement learning.

The next step is one of the most important: initializing the hyperparameters.
In this step, we initialize the number of training and testing episodes, the maximum number of steps the car can take in a single episode, the learning rate, the discount rate, and the exploration parameters.

What are exploration parameters?
These are the parameters that control the exploration and exploitation rates of our model. At first our agent (the car) will explore the environment thoroughly; it will gain some positive rewards as well as negative rewards and will learn the environment, determining the optimal paths. After exploring, the car will start exploiting the environment, meaning it will put what it learned during exploration to use.

I have used 50,000 training episodes, 100 testing episodes, a maximum of 99 steps per episode, a learning rate of 0.7, and a discount rate of 0.6.
epsilon is our exploration rate, which starts at 1.0; after each episode the epsilon value is reduced, so that at some point our model starts to exploit the environment.
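These hyperparameters can be set up as below. The exact decay schedule from the original code cell is not shown, so the exponential decay here (with hypothetical min_epsilon and decay_rate values) is a common choice, not necessarily the author's:

```python
import numpy as np

train_episodes = 50000   # training episodes
test_episodes = 100      # testing episodes
max_steps = 99           # max steps per episode

learning_rate = 0.7      # alpha in the Q-learning equation
gamma = 0.6              # discount rate

# Exploration parameters (min_epsilon and decay_rate are assumed values)
max_epsilon = 1.0        # start fully exploratory
min_epsilon = 0.01       # never stop exploring entirely
decay_rate = 0.01        # per-episode exponential decay

def decayed_epsilon(episode):
    """Epsilon shrinks toward min_epsilon as episodes progress."""
    return min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
```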

Before jumping into Q-learning, we have to create a q-table. A q-table is a matrix with the shape [states, actions], with all values initialized to zero. We then update and store our q-values as episodes run. The q-table becomes a reference table for our agent to select the best action based on its q-value.
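For Taxi-v3 this gives a 500 × 6 table of zeros, one row per state and one column per action:

```python
import gym
import numpy as np

env = gym.make("Taxi-v3")

# One row per state (500), one column per action (6), all zeros to start.
q_table = np.zeros((env.observation_space.n, env.action_space.n))
print(q_table.shape)  # (500, 6)
```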

Now we will implement Q-learning on our environment. Breaking it down into steps, we get

  • Start exploring actions: For each state, select any one among all possible actions for the current state (s).
  • Travel to the next state (s’) as a result of that action (a).
  • For all possible actions from the state (s’) select the one with the highest Q-value.
  • Update Q-table values using the equation.
  • Set the next state as the current state.
  • If the goal state is reached, end the episode and repeat the process from a new initial state.
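The steps above can be sketched as the following training loop. The original code cell is not shown, so this is a standard epsilon-greedy Q-learning implementation using the hyperparameters from the article; the small reset/step helpers are there only because gym changed its return signatures in version 0.26:

```python
import random
import numpy as np
import gym

env = gym.make("Taxi-v3")

train_episodes, max_steps = 50000, 99
alpha, gamma = 0.7, 0.6
epsilon, max_epsilon, min_epsilon, decay_rate = 1.0, 1.0, 0.01, 0.01

q_table = np.zeros((env.observation_space.n, env.action_space.n))

def reset_env(env):
    """env.reset() returns obs (old gym) or (obs, info) (gym >= 0.26)."""
    out = env.reset()
    return out[0] if isinstance(out, tuple) else out

def step_env(env, action):
    """env.step() returns a 4-tuple (old gym) or 5-tuple (gym >= 0.26)."""
    out = env.step(action)
    if len(out) == 5:
        obs, reward, terminated, truncated, _ = out
        return obs, reward, terminated or truncated
    obs, reward, done, _ = out
    return obs, reward, done

for episode in range(train_episodes):
    state = reset_env(env)
    for _ in range(max_steps):
        # Explore with probability epsilon, otherwise exploit the q-table.
        if random.uniform(0, 1) < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        new_state, reward, done = step_env(env, action)
        # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        q_table[state, action] += alpha * (
            reward + gamma * np.max(q_table[new_state]) - q_table[state, action]
        )
        state = new_state
        if done:
            break
    # Decay exploration after every episode.
    epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
```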

After 50,000 episodes, our car is well trained and has learned to drive itself in this environment. With the help of the updated q-table, the car can take the best actions to earn the maximum reward. We can now visualize the self-driving car at work.
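Testing runs greedily: the car always picks the action with the highest q-value and never explores. A sketch of the evaluation loop follows; it repeats a compact training phase first only so the snippet runs on its own (in the article's notebook, the q-table from the previous step would already be in scope):

```python
import random
import numpy as np
import gym

env = gym.make("Taxi-v3")
q_table = np.zeros((env.observation_space.n, env.action_space.n))

def reset_env(env):
    out = env.reset()
    return out[0] if isinstance(out, tuple) else out

def step_env(env, action):
    out = env.step(action)
    if len(out) == 5:
        return out[0], out[1], out[2] or out[3]
    return out[0], out[1], out[2]

# Compact training phase so this snippet is standalone (see previous step).
epsilon = 1.0
for episode in range(20000):
    state = reset_env(env)
    for _ in range(99):
        if random.uniform(0, 1) < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        new_state, reward, done = step_env(env, action)
        q_table[state, action] += 0.7 * (
            reward + 0.6 * np.max(q_table[new_state]) - q_table[state, action]
        )
        state = new_state
        if done:
            break
    epsilon = 0.01 + 0.99 * np.exp(-0.01 * episode)

# Greedy evaluation: always take the best action, no exploration.
rewards = []
for _ in range(100):
    state = reset_env(env)
    total = 0
    for _ in range(99):
        action = int(np.argmax(q_table[state]))
        state, reward, done = step_env(env, action)
        total += reward
        if done:
            break
    rewards.append(total)

print("Average reward over 100 test episodes:", np.mean(rewards))
```

A well-trained taxi averages a positive reward per episode, since each trip ends with the +20 drop-off bonus after a short sequence of −1 steps.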

Rendering of this visualization is like:

  • blue: passenger
  • magenta: destination
  • yellow: empty taxi
  • green: full taxi
  • other letters (R, G, Y and B): locations for passengers and destinations

As you can see, using Reinforcement Learning we can train a car to drive itself in an environment. This tutorial was a basic, very simple example of a self-driving car and reinforcement learning.
This tutorial is just a simulation of a self-driving car in a small, deterministic environment. In the real world, the environment is stochastic and huge.

You can get the entire code for this tutorial from my GitHub repository.

Thank you. Hope you enjoyed reading this and learned something new.
