This is an amazing post, thank you! I have one question. In the AC_network class, when you build the loss function, you combine the value loss and the policy loss as 0.5*value_loss + policy_loss. Is there a particular reason for weighting them this way? I am also quite confused about how the parameters of the policy and value networks are updated when the two networks share parameters.
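To make the question concrete, here is a toy sketch of how I currently picture it. It is not the code from the post. The network is shrunk to one hypothetical shared weight `w` (the "trunk"), a value-head weight `a` and a policy-head weight `b`, and all function names are my own. The point is to check that the gradient of the combined loss with respect to the shared weight is the policy gradient plus 0.5 times the value gradient:

```python
import math

# Hypothetical toy actor-critic: shared weight w, value head a, policy head b.
# All names are illustrative assumptions, not the post's AC_network code.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def total_loss(w, a, b, x, ret, advantage, value_coef=0.5):
    h = w * x                              # shared feature used by both heads
    v = a * h                              # value head output
    p = sigmoid(b * h)                     # policy head: prob. of the taken action
    value_loss = (ret - v) ** 2
    policy_loss = -math.log(p) * advantage  # advantage treated as a constant
    return policy_loss + value_coef * value_loss

def shared_grad_parts(w, a, b, x, ret, advantage):
    """Analytic d(loss)/dw, split into policy and value contributions."""
    h = w * x
    v = a * h
    p = sigmoid(b * h)
    d_policy_dw = -advantage * (1.0 - p) * b * x  # from -log p * advantage
    d_value_dw = -2.0 * (ret - v) * a * x         # from (ret - v)^2
    return d_policy_dw, d_value_dw

def numeric_grad_w(w, a, b, x, ret, advantage, value_coef=0.5, eps=1e-6):
    """Central finite difference of the combined loss w.r.t. the shared weight."""
    up = total_loss(w + eps, a, b, x, ret, advantage, value_coef)
    down = total_loss(w - eps, a, b, x, ret, advantage, value_coef)
    return (up - down) / (2.0 * eps)

def sgd_step_shared(w, a, b, x, ret, advantage, value_coef=0.5, lr=0.1):
    """One gradient-descent update of the shared weight on the combined loss."""
    d_policy, d_value = shared_grad_parts(w, a, b, x, ret, advantage)
    return w - lr * (d_policy + value_coef * d_value)

pol, val = shared_grad_parts(0.3, 1.5, -0.7, 2.0, 1.0, 0.4)
print(pol + 0.5 * val, numeric_grad_w(0.3, 1.5, -0.7, 2.0, 1.0, 0.4))
```

If I read this right, a single optimizer step on the summed loss updates each head only through its own term, while the shared weight receives both gradients, so the 0.5 just sets how strongly the value error pulls on the shared layers relative to the policy gradient. What I still don't understand is why 0.5 in particular.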