Implementing an Intelligent & Autonomous Car-Driving Agent using Deep n-step Actor-Critic Algorithm
This is a condensed, quick-start version of Chapter 8: Implementing an Intelligent & Autonomous Car-Driving Agent using Deep n-step Actor-Critic Algorithm, discussed in the Hands-On Intelligent Agents with OpenAI Gym book. The book chapter teaches you the fundamentals of Policy Gradient based reinforcement learning algorithms and helps you intuitively understand the deep n-step advantage actor-critic algorithm. The chapter then continues with a guide to implementing a super-intelligent agent that can drive a car autonomously in the Carla driving simulator, using both the synchronous and asynchronous implementations of the deep n-step advantage actor-critic algorithm. Being a quick-start guide, this post first lists the concepts covered and then dives straight into the code structure, elaborating on how you can train deep n-step advantage actor-critic agents in the Carla driving environment. The implementation is in PyTorch, and all the necessary code, including trained agent brains, is available in the book’s code repository. This post explains the code structure file by file, with references to the Python scripts in the code repository, so that it is easy to follow. The outline of this post is as follows:
- Deep n-step Advantage Actor-Critic
  - Training
  - Testing
- Asynchronous Deep n-step Advantage Actor-Critic
  - Training
  - Testing

A sample screen capture showing 9 agents training asynchronously, launched using the async_a2c_agent.py script with the num_agents parameter in async_a2c_parameters.json set to 9. (Refer to the Async A2C Training section for the command used to launch the training.)
1. Brief chapter summary and outline of topics covered
This chapter teaches you the fundamentals of Policy Gradient based reinforcement learning algorithms and helps you intuitively understand the deep n-step advantage actor-critic algorithm. You will then learn to implement a super-intelligent agent that can drive a car autonomously in the Carla simulator, using both the synchronous and asynchronous implementations of the deep n-step advantage actor-critic algorithm.
Following is a high-level outline of the topics covered in this chapter:
- Deep n-step Advantage Actor-Critic algorithm
  - Policy Gradients
    - The likelihood ratio trick
    - The policy gradient theorem
  - Actor-Critic algorithms
  - Advantage Actor-Critic algorithm
  - n-step Advantage Actor-Critic algorithm
    - n-step returns
    - Implementing the n-step return calculation (see the sketches following this outline)
  - Implementing the deep n-step Advantage Actor-Critic algorithm
- Training an intelligent and autonomous driving agent
  - Training the agent to drive a car in the CARLA driving simulator
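As a taste of the n-step return calculation covered in the chapter, here is a minimal, illustrative sketch in plain Python. The function name and the assumption that rewards are collected as a simple list are mine, not the chapter's exact implementation:

```python
def calc_n_step_returns(rewards, final_state_value, gamma=0.99):
    """Compute the n-step return targets for one rollout of n transitions.

    rewards:           [r_t, r_{t+1}, ..., r_{t+n-1}] collected over n steps
    final_state_value: the critic's estimate V(s_{t+n}) used to bootstrap
                       (use 0.0 if the episode terminated at step t+n)
    Works backwards through the rollout: G_k = r_k + gamma * G_{k+1},
    starting from G_{t+n} = V(s_{t+n}).
    """
    g = final_state_value
    n_step_returns = []
    for r in reversed(rewards):
        g = r + gamma * g
        n_step_returns.insert(0, g)
    return n_step_returns


# Example: a 5-step rollout, bootstrapping from V(s_{t+5}) = 1.2
returns = calc_n_step_returns([0.1, 0.0, -0.2, 0.5, 1.0], final_state_value=1.2)
# The advantage for each step is then A(s_k, a_k) = G_k - V(s_k)
```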
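Building on that, the deep n-step advantage actor-critic update combines a policy (actor) loss weighted by the advantages with a value-function (critic) regression loss toward the n-step return targets. Again, this is only a hedged PyTorch sketch with illustrative names, not the chapter's exact code:

```python
import torch

def a2c_loss(log_probs, value_estimates, n_step_returns, value_loss_coef=0.5):
    """log_probs:       list of log pi(a_k | s_k) tensors for the rollout
    value_estimates: list of critic outputs V(s_k) for the same steps
    n_step_returns:  list of n-step return targets G_k (plain floats)
    """
    returns = torch.tensor(n_step_returns)
    values = torch.stack(value_estimates).squeeze(-1)
    advantages = returns - values
    # Actor: policy-gradient loss; the advantage is treated as a constant here
    actor_loss = -(torch.stack(log_probs) * advantages.detach()).mean()
    # Critic: regress V(s_k) toward the n-step return target G_k
    critic_loss = advantages.pow(2).mean()
    return actor_loss + value_loss_coef * critic_loss
```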
2. Code structure
Below is a brief description of what each script contains; hopefully this makes the code easy to follow. For the complete, nicely rendered code-structure tree, please refer to https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/tree/master/ch8#2-code-structure
- a2c_agent.py → Main script to launch the deep n-step Advantage Actor-Critic (A2C) agent
- a2c_parameters.json → Configuration parameters for the a2c_agent.py script and the environment
- async_a2c_agent.py → Main script to launch the deep n-step Asynchronous Advantage Actor-Critic (A3C) agent
- async_a2c_parameters.json → Configuration parameters for the async_a2c_agent.py script and the environment
- batched_a2c_agent.py → Example script showing how agents can be run in parallel with batches of environments
- environment → Module containing environment implementations, wrappers and interfaces
  - atari.py → Wrappers and env pre-processing functions for the Atari Gym environments
  - carla_gym → OpenAI Gym compatible Carla driving environment module (see Chapter 7 for implementation details)
    - envs → the Carla Gym environment
      - carla → Refer to Chapter 7 for implementation details
      - carla_env.py → Carla driving environment implementation
      - scenarios.json → Carla environment configuration parameters to change the driving scenarios: map/city, weather conditions, route, etc.
  - utils.py → Utilities to vectorize and run environment instances in parallel as separate processes
- function_approximator → Module with neural network implementations (a minimal sketch of such a network follows this list)
  - deep.py → Deep neural network implementations in PyTorch for policy and value function approximation
  - shallow.py → Shallow neural network implementations in PyTorch for policy and value function approximation
- logs → Folder containing the Tensorboard log files for each run (or experiment)
  - ENVIRONMENT_NAME_RUN_TIMESTAMP* → Folder created for each run based on the environment name and the run timestamp
    - agent_params.json → The parameters used by the agent for this run/experiment
    - env_params.json → The environment configuration parameters used in this run/experiment
    - events.out.tfevents.* → Tensorboard event log files
- trained_models → Folder containing trained models/“brains” for the agents
  - README.md → Description of the trained agent “brains”/models and the naming conventions
- utils → Module containing utility functions to train/test the agent
  - params_manager.py → A simple class to manage the agent’s and environment’s parameters
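To make the function_approximator module’s role concrete, here is a minimal PyTorch sketch of a deep actor-critic network with a shared convolutional trunk and separate policy (actor) and value (critic) heads. The class name, layer sizes and the discrete-action policy head are illustrative assumptions, not the exact architecture implemented in deep.py:

```python
import torch
import torch.nn as nn

class DeepActorCritic(nn.Module):
    """Illustrative actor-critic network: shared convolutional features,
    a policy head producing action logits and a value head producing V(s)."""

    def __init__(self, input_shape=(3, 84, 84), num_actions=9):
        super().__init__()
        c, h, w = input_shape  # e.g. a resized RGB camera frame
        self.features = nn.Sequential(
            nn.Conv2d(c, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened feature size once
            feat_dim = self.features(torch.zeros(1, c, h, w)).shape[1]
        self.policy_head = nn.Linear(feat_dim, num_actions)  # actor: action logits
        self.value_head = nn.Linear(feat_dim, 1)              # critic: V(s)

    def forward(self, obs):
        feats = self.features(obs)
        return self.policy_head(feats), self.value_head(feats)
```

For continuous control (steering/throttle), the policy head would instead output the mean and log standard deviation of a Gaussian policy; the idea of a shared trunk with two heads stays the same.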
3. Running the code
Deep n-step Advantage Actor-Critic Agent:
- a2c_agent.py is the main script that takes care of training and testing the deep n-step advantage Actor-Critic agent. The table below summarizes the arguments that the script supports and what they mean. Note that most of the agent- and environment-related configuration parameters are in the a2c_parameters.json file, and only the few parameters that are most useful when launching the training/testing scripts are made available through the command-line interface.
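As a rough illustration of those command-line arguments, the sketch below wires up the flags that appear in the commands later in this post (--env, --gpu-id, --test, --render) using argparse. The defaults and help strings are assumptions, not the script's exact definitions:

```python
import argparse

parser = argparse.ArgumentParser(description="Deep n-step Advantage Actor-Critic agent")
parser.add_argument("--env", default="Carla-v0",
                    help="Gym environment ID to train/test the agent in")
parser.add_argument("--gpu-id", type=int, default=0,
                    help="GPU device ID to use (if a CUDA-capable GPU is available)")
parser.add_argument("--test", action="store_true",
                    help="Run in test mode: disable learning and evaluate a saved brain")
parser.add_argument("--render", action="store_true",
                    help="Render the environment while the agent acts in it")
args = parser.parse_args()
```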

Deep n-step Advantage Actor-Critic Training
- Make sure the `rl_gym_book` conda environment with the necessary packages installed is activated. Assuming that you cloned the code as per the instructions to `~/HOIAWOG/`, you can launch the agent training script from the `~/HOIAWOG/ch8` directory using the following command: `python a2c_agent.py --env Carla-v0 --gpu-id 0`
- If a saved agent “brain” (trained model) is available for the chosen environment, the training script will load that brain into the agent and continue training it to improve further.
- The log files are written to the directory pointed to by the `summary_file_path_prefix` parameter (the default is `logs/A2C_`). While the training script is running, you can monitor the agent's learning progress visually using Tensorboard. From the `~/HOIAWOG/ch8` directory, you can launch Tensorboard with the following command: `tensorboard --logdir=./logs/`. You can then visit the web URL printed on the console (the default is http://localhost:6006) to monitor the progress.
- You can train the agent in any Gym-compatible environment by providing the Gym env ID for the `--env` argument. Listed below is a short list of environments that you can train the agent in:

Deep n-step Advantage Actor-Critic Testing
- Make sure the `rl_gym_book` conda environment with the necessary packages installed is activated. Assuming that you cloned the code as per the instructions to `~/HOIAWOG/`, you can launch the agent testing script from the `~/HOIAWOG/ch8` directory using the following command: `python a2c_agent.py --env Carla-v0 --test --render`
- The above command will launch the agent in testing mode, loading the saved brain state (if available) for this environment into the agent. The `--test` argument disables learning and simply evaluates the agent's performance in the chosen environment.
- You can test the agent in any OpenAI Gym interface compatible learning environment, just as with the training procedure. Listed below are some example environments from the list of environments for which trained brains/models are made available in this repository:

Asynchronous n-step Advantage Actor-Critic Agent:
- async_a2c_agent.py is the main script that takes care of training and testing the asynchronous deep n-step advantage Actor-Critic agent. The table below summarizes the arguments that the script supports and what they mean. Note that most of the agent- and environment-related configuration parameters are in the async_a2c_parameters.json file, and only the few parameters that are most useful when launching the training/testing scripts are made available through the command-line interface.
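To give a feel for what “asynchronous” means here, below is a heavily simplified sketch of the A3C-style pattern: several worker processes each hold a local copy of a model kept in shared memory, compute gradients locally, and apply them to the shared parameters. A tiny linear model and a dummy loss stand in for the actor-critic network and the rollout losses; this is a sketch of the pattern only, not the implementation in async_a2c_agent.py:

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp

def worker(worker_id, shared_model):
    """One asynchronous learner: sync with the shared weights, compute a loss
    and gradients locally, then apply those gradients to the shared model."""
    local_model = nn.Linear(4, 2)  # same architecture as the shared model
    optimizer = torch.optim.Adam(shared_model.parameters(), lr=1e-3)
    for _ in range(100):  # stand-in for the rollout-and-update loop
        local_model.load_state_dict(shared_model.state_dict())  # pull latest weights
        batch = torch.randn(8, 4)                # stand-in for a batch of observations
        loss = local_model(batch).pow(2).mean()  # stand-in for the actor + critic losses
        optimizer.zero_grad()
        loss.backward()
        # Copy the locally computed gradients onto the shared parameters, then step
        for local_p, shared_p in zip(local_model.parameters(), shared_model.parameters()):
            shared_p._grad = local_p.grad
        optimizer.step()

if __name__ == "__main__":
    shared_model = nn.Linear(4, 2)  # stand-in for the deep actor-critic network
    shared_model.share_memory()     # place the weights in shared memory for all workers
    procs = [mp.Process(target=worker, args=(i, shared_model)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

In the full agent, each worker also owns its own environment instance, which is what lets num_agents independent environments explore in parallel while updating a single shared policy.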

Async n-step Advantage Actor-Critic Agent Training
- NOTE: Because this agent training script will spawn multiple agent and environment instances, make sure you set the `num_agents` parameter in the async_a2c_parameters.json file to a sensible value based on the hardware of the machine on which you are running this script. If you are using the `Carla-v0` environment to train the agent in the Carla driving environment, be aware that the Carla server instance itself needs some GPU resources to run, on top of the agent's resource needs.
- Make sure the `rl_gym_book` conda environment with the necessary packages installed is activated. Assuming that you cloned the code as per the instructions to `~/HOIAWOG/`, you can launch the agent training script from the `~/HOIAWOG/ch8` directory using the following command: `python async_a2c_agent.py --env Carla-v0 --gpu-id 0`
- The screen-capture animation (GIF) at the top of this page was captured by launching the above command with `num_agents` in async_a2c_parameters.json set to `9`.
- If a saved agent “brain” (trained model) is available for the chosen environment, the training script will load that brain into the agent and continue training it to improve further.
- The log files are written to the directory pointed to by the `summary_file_path_prefix` parameter (the default is `logs/A2C_`). While the training script is running, you can monitor the agent's learning progress visually using Tensorboard. From the `~/HOIAWOG/ch8` directory, you can launch Tensorboard with the following command: `tensorboard --logdir=./logs/`. You can then visit the web URL printed on the console (the default is http://localhost:6006) to monitor the progress.
- You can train the agent in any Gym-compatible environment by providing the Gym env ID for the `--env` argument. Listed below is a short list of environments that you can train the agent in:

Async n-step Advantage Actor-Critic Agent Testing
- Make sure the `rl_gym_book` conda environment with the necessary packages installed is activated. Assuming that you cloned the code as per the instructions to `~/HOIAWOG/`, you can launch the agent testing script from the `~/HOIAWOG/ch8` directory using the following command: `python async_a2c_agent.py --env Carla-v0 --test`
- The above command will launch the agent in testing mode, loading the saved brain state (if available) for this environment into the agent. The `--test` argument disables learning and simply evaluates the agent's performance in the chosen environment.
- You can test the agent in any OpenAI Gym interface compatible learning environment, just as with the training procedure. Listed below are some example environments from the list of environments for which trained brains/models are made available in this repository:

You can refer to the complete Chapter 8 code at https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/tree/master/ch8 for more information.
That concludes the post! If you have any questions or need clarification or help with a step, feel free to reach out using the comments section below.
Originally published at praveenp.com on September 8th, 2018

