Simple Reinforcement Learning with Tensorflow Part 8: Asynchronous Actor-Critic Agents (A3C)
Arthur Juliani

It is an amazing post, thank you! I have one question. In the AC_Network class, when you build the loss function, I see that you combine the value loss and the policy loss as 0.5 * value_loss + policy_loss. Is there a reason you weight them that way? I am also confused about how to update the parameters of the policy and value networks when they share parameters.

— Zhiang Zhang
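For later readers of this thread, here is a minimal sketch of the idea the question is about: when the policy head and the value head share a trunk, the two losses are summed into one scalar so that a single gradient step updates all shared parameters at once. This is a TF2-style illustration, not the tutorial's TF1 code; the class name `ACNetwork`, the layer sizes, and the toy data below are assumptions chosen for brevity (the tutorial's actual combined loss also includes an entropy term).

```python
import numpy as np
import tensorflow as tf

N_ACTIONS = 4  # hypothetical action-space size, for illustration only

# A shared trunk feeds both heads, so the policy and value "networks"
# are really one set of parameters with two outputs.
class ACNetwork(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.shared = tf.keras.layers.Dense(128, activation="relu")
        self.policy_head = tf.keras.layers.Dense(N_ACTIONS)  # action logits
        self.value_head = tf.keras.layers.Dense(1)           # state value V(s)

    def call(self, states):
        h = self.shared(states)
        return self.policy_head(h), self.value_head(h)

net = ACNetwork()
optimizer = tf.keras.optimizers.Adam(1e-4)

def train_step(states, actions, returns, advantages):
    with tf.GradientTape() as tape:
        logits, values = net(states)
        log_probs = tf.nn.log_softmax(logits)
        taken = tf.reduce_sum(log_probs * tf.one_hot(actions, N_ACTIONS), axis=1)
        policy_loss = -tf.reduce_sum(taken * advantages)
        value_loss = tf.reduce_sum(tf.square(returns - tf.squeeze(values, axis=1)))
        # One combined scalar: the 0.5 merely down-weights the value
        # gradient relative to the policy gradient before both flow
        # back through the shared layer in the same update.
        loss = 0.5 * value_loss + policy_loss
    grads = tape.gradient(loss, net.trainable_variables)
    optimizer.apply_gradients(zip(grads, net.trainable_variables))
    return loss

# Toy call with random data, just to show the shapes involved.
s = np.random.rand(8, 16).astype(np.float32)   # batch of states
a = np.random.randint(0, N_ACTIONS, size=8)    # actions taken
r = np.random.rand(8).astype(np.float32)       # discounted returns
adv = np.random.rand(8).astype(np.float32)     # advantages
train_step(s, a, r, adv)
```

With fully separate networks you would instead compute the two losses independently and apply separate updates; with shared parameters, a weighted sum of the losses is the usual approach, and the coefficient on the value loss is a tunable weight that keeps the value gradient from dominating the shared layers.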