Implementing an Intelligent & Autonomous, Car-Driving Agent using Deep n-step Actor-Critic Algorithm

Praveen Palanisamy
Sep 8, 2018 · 8 min read

This is a condensed quick-start version of Chapter 8: Implementing an Intelligent & Autonomous, Car-Driving Agent using Deep n-step Actor-Critic Algorithm from the Hands-on Intelligent agents with OpenAI Gym book. The book chapter teaches you the fundamentals of Policy Gradient based reinforcement learning algorithms and helps you intuitively understand the deep n-step advantage actor-critic algorithm. The chapter then walks you through implementing a super-intelligent agent that can drive a car autonomously in the Carla driving simulator, using both the synchronous and the asynchronous implementations of the deep n-step advantage actor-critic algorithm. Being a quick-start guide, this post first lists the concepts covered and then dives straight into the code structure, explaining how you can train deep n-step advantage actor-critic agents in the Carla driving environment. The implementation is in PyTorch, and all the necessary code, and even trained agent brains, are available in the book’s code repository. This post explains the code structure file by file, with references to the Python scripts in the code repository, so that it is easy to follow. The outline of this post is as follows:

  1. Brief chapter summary and outline of topics covered
  2. Code Structure
  3. Running the code

A sample screen capture showing 9 agents training asynchronously, launched using the async_a2c_agent.py script with the num_agents parameter in async_a2c_parameters.json set to 9. (Refer to the Async n-step Advantage Actor-Critic Agent Training section for the command used to launch the training.)

1. Brief chapter summary and outline of topics covered

This chapter teaches you the fundamentals of Policy Gradient based reinforcement learning algorithms and helps you intuitively understand the deep n-step advantage actor-critic algorithm. You will then learn to implement a super-intelligent agent that can drive a car autonomously in the Carla simulator, using both the synchronous and the asynchronous implementations of the deep n-step advantage actor-critic algorithm.

Following is a high-level outline of the topics covered in this chapter:

  • Deep n-step Advantage Actor-Critic algorithm
  • Policy Gradients
  • The likelihood ratio trick
  • The policy gradient theorem
  • Actor-Critic algorithms
  • Advantage Actor-Critic algorithm
  • n-step Advantage Actor-Critic algorithm
  • n-step returns
  • Implementing the n-step return calculation (a minimal sketch follows this outline)
  • Implementing deep n-step Advantage Actor-Critic algorithm
  • Training an intelligent and autonomous driving agent
  • Training the agent to drive a car in the CARLA driving simulator
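
For example, the n-step return that the critic's training targets are built from is simply the n collected rewards plus a bootstrapped value estimate of the state reached after n steps. Below is a minimal sketch of that calculation (my own illustrative code, not the book's exact implementation; calculate_n_step_returns and its arguments are hypothetical names):

    import torch

    # Illustrative n-step return calculation:
    # R_t = r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1} + gamma^n * V(s_{t+n})
    def calculate_n_step_returns(rewards, final_state_value, gamma=0.99):
        """rewards: the n rewards collected along the rollout (floats).
        final_state_value: the critic's estimate V(s_{t+n}) used to bootstrap
        (use 0.0 if the episode terminated at the end of the rollout)."""
        g = final_state_value
        returns = []
        for r in reversed(rewards):      # work backwards from the bootstrap value
            g = r + gamma * g
            returns.insert(0, g)         # returns[k] is the return target for step t+k
        return torch.tensor(returns)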

2. Code structure

Below is a brief description of what each script contains. Hopefully this makes it easy to follow the code.

For the code structure rendered with full directory nesting and descriptions, you can also refer to https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/tree/master/ch8#2-code-structure

  • a2c_agent.py → Main script to launch the deep n-step Advantage Actor-Critic (A2C) agent
  • a2c_parameters.json → Configuration parameters for a2c_agent.py and the environment
  • async_a2c_agent.py → Main script to launch the deep n-step Asynchronous Advantage Actor-Critic (A3C) agent
  • async_a2c_parameters.json → Configuration parameters for async_a2c_agent.py and the environment
  • batched_a2c_agent.py → Example script showing how agents can be run in parallel with batches of environments
  • environment → Module containing environment implementations, wrappers and interfaces
      • atari.py → Wrappers and environment pre-processing functions for the Atari Gym environments
      • carla_gym → OpenAI Gym compatible Carla driving environment module (see Chapter 7 for implementation details)
          • envs → The Carla Gym environment
              • carla → Refer to Chapter 7 for implementation details
              • carla_env.py → Carla driving environment implementation
              • scenarios.json → Carla environment configuration parameters to change the driving scenarios: map/city, weather conditions, route, etc.
      • utils.py → Utilities to vectorize and run environment instances in parallel as separate processes
  • function_approximator → Module with neural network implementations
      • deep.py → Deep neural network implementations in PyTorch for policy and value function approximation (a minimal illustrative sketch follows this list)
      • shallow.py → Shallow neural network implementations in PyTorch for policy and value function approximation
  • logs → Folder containing the Tensorboard log files for each run (or experiment)
      • ENVIRONMENT_NAME_RUN_TIMESTAMP* → Folder created for each run based on the environment name and the run timestamp
          • agent_params.json → The agent parameters used in this run/experiment
          • env_params.json → The environment configuration parameters used in this run/experiment
          • events.out.tfevents.* → Tensorboard event log files
  • trained_models → Folder containing trained models/”brains” for the agents
      • README.md → Description of the trained agent “brains”/models and their naming conventions
  • utils → Module containing utility functions to train/test the agent
      • params_manager.py → A simple class to manage the agent’s and environment’s parameters
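
To give a feel for what the policy and value function approximators in function_approximator/deep.py look like, here is a minimal, simplified sketch of a network with an actor (policy) head and a critic (value) head. This is illustrative only: it assumes a small fully-connected body and a discrete action space, whereas the actual deep.py uses its own architectures (for example, convolutional layers for image observations such as Carla's camera frames).

    import torch
    import torch.nn as nn

    class ActorCritic(nn.Module):
        """Illustrative two-headed network: a shared body, a policy head (actor)
        and a state-value head (critic)."""
        def __init__(self, obs_dim, n_actions):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
            self.policy_head = nn.Linear(128, n_actions)  # actor: action logits
            self.value_head = nn.Linear(128, 1)           # critic: estimate of V(s)

        def forward(self, obs):
            features = self.body(obs)
            return self.policy_head(features), self.value_head(features)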

3. Running the code

  • Deep n-step Advantage Actor-Critic Agent:
  • The a2c_agent.py is the main script that takes care of both the training and the testing of the deep n-step Advantage Actor-Critic agent. The table below summarizes the arguments that the script supports and what they mean. Note that most of the agent- and environment-related configuration parameters are in the a2c_parameters.json file, and only the few parameters that are most useful when launching the training/testing scripts are exposed through the command-line interface.
Command-line arguments supported by the a2c_agent.py script

Deep n-step Advantage Actor-Critic Training

  • Make sure the rl_gym_book conda environment with the necessary packages installed is activated. Assuming that you cloned
    the code as per the instructions to ~/HOIAWOG/, you can launch the Agent training script from the ~/HOIAWOG/ch8 directory using the following command:
  • python a2c_agent.py --env Carla-v0 --gpu-id 0
  • If a saved agent “brain” (trained model) is available for the chosen environment, the training script will load that brain into the agent and continue training it to improve further.
  • The log files are written to the directory specified by the summary_file_path_prefix parameter (the default is logs/A2C_). While the training script is running, you can monitor the agent's learning progress visually using Tensorboard. From the ~/HOIAWOG/ch8 directory, you can launch Tensorboard with the following command: tensorboard --logdir=./logs/. You can then visit the web URL printed on the console (the default is http://localhost:6006) to monitor the progress.
  • You can train the agent in any Gym compatible environment by providing the Gym env ID for the --env argument.
    Listed below is a short list of environments that you can train the agent in:
A sample list of learning environments and the associated command to train the a2c agent
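
The table referenced above is embedded as an image in the original post. As an illustration only (the environment IDs below are standard Gym environments and not necessarily the exact set from the book's table), switching environments changes only the --env argument:

  • python a2c_agent.py --env CartPole-v0 --gpu-id 0
  • python a2c_agent.py --env Pendulum-v0 --gpu-id 0
  • python a2c_agent.py --env Carla-v0 --gpu-id 0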

Deep n-step Advantage Actor-Critic Testing

  • Make sure the rl_gym_book conda environment with the necessary packages installed is activated. Assuming that you cloned
    the code as per the instructions to ~/HOIAWOG/, you can launch the Agent testing script from the ~/HOIAWOG/ch8 directory using the following command:
  • python a2c_agent.py --env Carla-v0 --test --render
  • The above command will launch the agent in testing mode and load the saved brain state (if available) for this environment into the agent. The --test argument disables learning and simply evaluates the agent's performance in the chosen environment.
  • Just as with training, you can test the agent in any OpenAI Gym interface compatible learning environment. Listed below are some example environments from the list of environments for which trained brains/models are made available in this repository:
A sample list of learning environments and the associated command to test the a2c agent
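
Again, the table above is an image in the original post; as an illustration (assuming a trained brain is available for the chosen environment), testing in another environment changes only the --env argument, for example:

  • python a2c_agent.py --env CartPole-v0 --test --render
  • python a2c_agent.py --env Pendulum-v0 --test --render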

Asynchronous n-step Advantage Actor-Critic Agent:

  • The async_a2c_agent.py is the main script that takes care of both the training and the testing of the asynchronous deep n-step Advantage Actor-Critic agent. The table below summarizes the arguments that the script supports and what they mean. Note that most of the agent- and environment-related configuration parameters are in the async_a2c_parameters.json file, and only the few parameters that are most useful when launching the training/testing scripts are exposed through the command-line interface.
Command-line arguments supported by the deep asynchronous n-step advantage actor-critic agent and a brief description of the argument

Async n-step Advantage Actor-Critic Agent Training

  • NOTE: Because this agent training script will spawn multiple agent and environment instances, make sure you set the num_agents parameter in the async_a2c_parameters.json file to a sensible value based on the hardware of the machine on which you are running this script. If you are using the Carla-v0 environment to train the agent in the Carla driving simulator, be aware that the Carla server instance itself needs GPU resources to run, in addition to the agent's resource needs. (A minimal sketch of the multi-process training setup follows this section.)
  • Make sure the rl_gym_book conda environment with the necessary packages installed is activated. Assuming that you cloned
    the code as per the instructions to ~/HOIAWOG/, you can launch the Agent training script from the ~/HOIAWOG/ch8 directory using the following command:
  • python async_a2c_agent.py --env Carla-v0 --gpu-id 0
  • The screencapture animation (GIF) at the top of this page was captured by launching the above command with num_agents in async_a2c_parameters.json set to 9.
  • If a saved agent “brain” (trained model) is available for the chosen environment, the training script will load that brain into the agent and continue training it to improve further.
  • The log files are written to the directory specified by the summary_file_path_prefix parameter (the default is logs/A2C_). While the training script is running, you can monitor the agent's learning progress visually using Tensorboard. From the ~/HOIAWOG/ch8 directory, you can launch Tensorboard with the following command: tensorboard --logdir=./logs/. You can then visit the web URL printed on the console (the default is http://localhost:6006) to monitor the progress.
  • You can train the agent in any Gym compatible environment by providing the Gym env ID for the --env argument. Listed below is a short list of environments that you can train the agent in:
Sample list of learning environments and the associated commands to train the deep Asynchronous n-step Advantage Actor-Critic Agent
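
To make the asynchronous setup concrete, below is a minimal, simplified sketch (my own illustration, not the repository's async_a2c_agent.py) of how num_agents worker processes can each run their own environment while sharing one global model through torch.multiprocessing. The real agent additionally collects n-step rollouts in each worker and applies the actor-critic gradient updates to the shared parameters.

    import torch.multiprocessing as mp
    import torch.nn as nn

    def worker(worker_id, shared_model):
        # In the real agent, each worker creates its own environment instance
        # (e.g., Carla-v0), collects n-step rollouts, computes the actor and
        # critic losses, and pushes gradients into the shared model.
        local_model = nn.Linear(4, 2)                            # placeholder network
        local_model.load_state_dict(shared_model.state_dict())   # sync with the global model
        print("worker", worker_id, "synced with the shared model")

    if __name__ == "__main__":
        num_agents = 4                    # analogous to num_agents in async_a2c_parameters.json
        global_model = nn.Linear(4, 2)    # placeholder for the shared actor-critic network
        global_model.share_memory()       # put parameters in shared memory, visible to all workers
        workers = [mp.Process(target=worker, args=(i, global_model)) for i in range(num_agents)]
        for p in workers:
            p.start()
        for p in workers:
            p.join()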

Async n-step Advantage Actor-Critic Agent Testing

  • Make sure the rl_gym_book conda environment with the necessary packages installed is activated. Assuming that you cloned
    the code as per the instructions to ~/HOIAWOG/, you can launch the Agent testing script from the ~/HOIAWOG/ch8 directory using the following command:
  • python async_a2c_agent.py --env Carla-v0 --test
  • The above command will launch the agent in testing mode and load the saved brain state (if available) for this environment into the agent. The --test argument disables learning and simply evaluates the agent's performance in the chosen environment.
  • Just as with training, you can test the agent in any OpenAI Gym interface compatible learning environment. Listed below are some example environments from the list of environments for which trained brains/models are made available in this repository:
Sample list of learning environments and the associated commands to test the deep asynchronous n-step advantage actor-critic agent

You can refer to the complete Chapter 8 code in the book's code repository for more information: https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/tree/master/ch8

That concludes the post! If you have any questions or need clarification or help with a step, feel free to reach out using the comments section below.

Originally published at praveenp.com on September 8th, 2018
