
And with that we have a fully-functional reinforcement learning agent. Our agent is still far from the state of the art though. While we are using a neural network for the policy, the network still isn’t as deep or complex as …
…account, the form of Policy Gradient we used in the previous tutorials will need a few adjustments. The first of which is that we now need to update our agent with more than one experience at a time. To accomplish this, we will collect experiences in a buffer, and then occasionally use them to upda…