How does an A3C work?

How to go beyond DQN-based reinforcement learning by using multiple workers.

Reinforcement learning gained a lot of popularity after the historic victory of AlphaGo against the (human) Go champion, and more recently after DeepMind announced their StarCraft 2 research environment in cooperation with Blizzard.

Most of these recent achievements were made possible by an architecture named the "Deep Q-Network" (DQN). It achieves superhuman performance on Atari games by collecting memories (records of what the agent saw, what it did, and what reward it received while trying out different scenarios) and then using these memories to train a neural network.
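To make this concrete, here is a minimal sketch of such a memory store, usually called an experience-replay buffer. The class and method names are my own illustration, not DeepMind's actual code:

```python
import random
from collections import deque

# Illustrative sketch of DQN-style experience replay (names are made up).
class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.memory = deque(maxlen=capacity)  # oldest memories are dropped once full

    def push(self, state, action, reward, next_state, done):
        # Record one interaction with the environment ("a memory").
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Draw a random batch to train the Q-network on; sampling randomly
        # breaks the correlation between consecutive frames.
        return random.sample(self.memory, batch_size)

buffer = ReplayBuffer()
buffer.push(state=[0.0, 1.0], action=1, reward=0.5, next_state=[0.1, 0.9], done=False)
```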

Asynchronous Advantage Actor-Critic

But what comes after? The same company that is responsible for the DQN, DeepMind, more recently introduced the A3C architecture. It is supposed to be faster, simpler, and more robust than the DQN, and also able to achieve better results. But how?

[A3C architecture diagram. Credit: Arthur Juliani]

You can figure out the biggest difference by looking at the name of this mysterious architecture: Asynchronous Advantage Actor-Critic. In DQN, a single agent (a so-called worker) interacts with a single environment, generating training data. A3C instead launches several workers asynchronously (as many as your CPU can handle) and lets each of them interact with its own instance of the environment. Each worker also trains its own copy of the network and shares what it has learned with a global network.
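Here is a rough, illustrative sketch of that loop in plain Python. The network and environment are stubbed out; a real implementation would use a deep network and, say, an Atari emulator:

```python
import threading
import random

# Illustrative sketch of the A3C training loop: several workers, each with its
# own environment and its own copy of the parameters, asynchronously updating
# one shared (global) set of parameters.

global_params = [0.0, 0.0, 0.0]   # the shared "network weights" (stub)
lock = threading.Lock()           # guards updates to the global parameters

def worker(worker_id, steps=100):
    for _ in range(steps):
        local_params = list(global_params)      # pull the latest global weights
        # ... act in this worker's private environment with local_params,
        #     collect rewards, compute gradients (stubbed with random noise) ...
        gradients = [random.uniform(-0.01, 0.01) for _ in local_params]
        with lock:                              # push the update asynchronously
            for i, g in enumerate(gradients):
                global_params[i] += g

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("trained global parameters:", global_params)
```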

The Advantage

So why exactly is this better than a traditional DQN? There are multiple reasons. First, because several workers collect experience in parallel, you gather much more training data in the same amount of time, which makes data collection faster.

Since every single worker also has its own environment, you are going to get more diverse data, which is known to make the network more robust and to produce better results!

If you are interested in seeing an implementation of A3C, together with a much more in-depth explanation of the two remaining A’s of A3C, I’d strongly suggest taking a look at this Medium post by Arthur Juliani!
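As a tiny teaser for one of those A's: the "Advantage" simply measures how much better an action turned out than the critic expected it to. A toy illustration with made-up numbers:

```python
# Toy illustration of the "Advantage" (hypothetical numbers, not a full A3C loss).
def advantage(discounted_return, state_value_estimate):
    # positive -> the action did better than the critic expected, reinforce it;
    # negative -> it did worse, discourage it
    return discounted_return - state_value_estimate

print(advantage(discounted_return=1.2, state_value_estimate=0.8))  # 0.4
```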

What comes next?

In a paper earlier this year, NVIDIA introduced the next generation of A3C, the GA3C, which also makes use of the GPU. Instead of copying the whole network to every single worker, a single global network stays on the GPU.

Because of the delay introduced by copying data from the CPU to the GPU and back, smaller networks only run about 6x faster this way. For larger networks, however, the method achieves the same results about 45x faster, according to the paper.
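Here is a rough sketch of that idea, with the GPU network stubbed out. The queue-based structure below is my own illustration of the concept, not NVIDIA's actual code: workers hold no copy of the network, they send their states to one central predictor, which answers them with a single batched call to the global network:

```python
import queue
import threading

# Illustrative sketch of the GA3C idea: workers queue their states, and a single
# predictor thread batches those states into one call to the global network
# living on the GPU (stubbed out here).

prediction_queue = queue.Queue()

def predictor(batch_size=2):
    # Collect a batch of requests and answer them with one (stubbed) GPU call.
    while True:
        requests = [prediction_queue.get() for _ in range(batch_size)]
        # actions = gpu_network.predict([state for state, _ in requests])  # one batched call
        actions = [0] * len(requests)                     # placeholder output
        for (state, reply_queue), action in zip(requests, actions):
            reply_queue.put(action)                       # hand each worker its action

def worker(worker_id):
    reply_queue = queue.Queue(maxsize=1)
    state = [float(worker_id)]                            # from this worker's own environment
    prediction_queue.put((state, reply_queue))            # ask the central predictor
    action = reply_queue.get()                            # block until the GPU answers
    print(f"worker {worker_id} got action {action}")

threading.Thread(target=predictor, daemon=True).start()
workers = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```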

Thanks for reading! Now you know what an A3C is! You also learned about its advantages over DQN and its next evolution, the GA3C. Follow me on Medium for more high-level explanations of your favorite nets and algorithms!