Train your dog using TF-Agents II: Revenge of Kiko

Kshitiz Rimal
Deep Learning Journal
7 min read · Jun 3, 2020


[Feature image: dog, cat, bone, and robot icons alongside a game screenshot]

In my last post, I went through how we can create a custom maze-adventure-like environment using TF-Agents and how we can train, evaluate, and visualize the performance of a DQN agent. In this post, I have added a few more actions that our agent can perform, and we will see how to include them in the existing environment. If you are new to Reinforcement Learning and this post, I suggest first reading my previous blog on the same topic, and make sure you also read the articles listed in the References section to get a better idea of Reinforcement Learning and Deep Reinforcement Learning.

We know that in our last game, Kiko, our player, was able to sneak past some robots, collect some bones, and quickly get out of the park. After getting out of that park, he shared his experience with one of his friends, Kay, who is a cat. After listening to his experience, Kay got furious and told him that he should not be afraid of any robot: he would teach Kiko some sneak attack moves, and to support him further, he would go along with him to the same park, help him sneak past the robots, use those sneak attack moves, and even collect some bones for him. Now, to make this plan work, let's make Kiko and Kay capable of executing those actions.

New Response

First, let's add a new action response for this sneak attack move in our ActionResult class.
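The gist for this change is embedded in the original post; below is a minimal sketch of what it could look like, assuming ActionResult is a simple Enum. The existing member names are placeholders, and the name of the new sneak attack response is also an assumption.

```python
from enum import Enum

class ActionResult(Enum):
    # Responses carried over from the previous post (names are illustrative)
    ILLEGAL_MOVE = 0
    LEGAL_MOVE = 1
    FOUND_BONE = 2
    FOUND_ROBOT = 3
    GAME_COMPLETED = 4
    # New response returned when a sneak attack move is executed
    SNEAK_ATTACK = 5
```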

Now, let's modify our game logic so that it can address our two players, Kiko and Kay, and add some new conditions so that they can execute sneak attack moves on those robots.

Adding the new player to the game

Let's modify the init and reset methods of our game logic class and add two players instead of one. Here, we will represent Kay, our new player, with the integer value 4, and he will start right next to Kiko, at position 1. The __is_spot_last method remains the same as before.
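The original gist is not reproduced here; the sketch below shows roughly what the modified init and reset could look like. The cell encodings other than 4 for Kay, Kiko's starting cell, and the attribute names are all assumptions.

```python
import numpy as np

# Integer encodings of the park cells; only 4 (Kay) is stated in the post,
# the other values are illustrative placeholders.
EMPTY, BONE, ROBOT, KIKO, KAY = 0, 1, 2, 3, 4

class GameLogic:
    def __init__(self):
        self.reset()

    def reset(self):
        # 6 x 6 park flattened into 36 cells
        self.grid = np.zeros((36,), dtype=np.int32)
        self.kiko_position = 0            # Kiko's starting cell (assumed)
        self.kay_position = 1             # Kay starts right next to Kiko
        self.grid[self.kiko_position] = KIKO
        self.grid[self.kay_position] = KAY
        # ... bones, robots and the final 'X' mark are placed as in the previous post
```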

Adding attack moves and making sure it's for the right player

Now, let's modify the move_dog method. In this method, we will add 2 new arguments: a boolean argument that tells whether the selected move is a sneak attack move, and an integer argument that tells which player (Kiko or Kay) is the current one. What we need to understand here is that, in every action step, the agent can perform only one action, so it has to select both which player to move and what that move will be.

In this move_dog method, we will first check whether the selected new position is the last one and whether that position was selected for Kay instead of Kiko; if so, we return the illegal-move response, because to complete the game it should be Kiko who reaches the 'X' mark, not Kay, as Kay is there just to help him. Next, as before, we will check whether the selected position is outside the boundary of the park (less than 0 or larger than the 35th index), and we will also make sure that neither Kiko nor Kay is already present on the newly selected position.

The sneak attack move allows Kiko and Kay to move diagonally, and it should only be used on robots (neither for movement nor to collect bones). Using this move, they can move diagonally to the left or right, in both the upward and downward directions. The game_state and game_ended methods remain the same as before.
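Continuing the GameLogic sketch above, here is roughly how the modified move_dog method could look; the constants and the way a defeated robot is handled are assumptions, and only the checks described in the text are taken from the post.

```python
    def move_dog(self, new_position, is_attack, player):
        # Only Kiko may finish the game on the 'X' mark; Kay is just a helper.
        if self.__is_spot_last(new_position) and player == KAY:
            return ActionResult.ILLEGAL_MOVE

        # Stay inside the park (cells 0..35) and never step onto the other player.
        if new_position < 0 or new_position > 35:
            return ActionResult.ILLEGAL_MOVE
        if self.grid[new_position] in (KIKO, KAY):
            return ActionResult.ILLEGAL_MOVE

        if is_attack:
            # The sneak attack moves diagonally and is only valid against a robot;
            # it can be used neither for plain movement nor to collect bones.
            if self.grid[new_position] == ROBOT:
                self.grid[new_position] = EMPTY  # the attacked robot is removed (assumed)
                return ActionResult.SNEAK_ATTACK
            return ActionResult.ILLEGAL_MOVE

        # ... normal movement, bone collection and game-completion checks
        #     follow the same logic as in the previous post
```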

The new Observation and Action Spec

Now, let's modify our TF-Agents environment to include these new actions and to account for the new player in the observations.

We begin by modifying the init method. There, we make sure that in the observation spec the allowed values are between 0 and 4, as 4 now denotes Kay, and the action spec is now much larger than in the earlier version, as it needs to include 4 new attack actions for Kiko and 8 new movement and attack actions for the new player. The action spec is therefore between 0 and 15: 0 to 7 denote actions for Kiko and 8 to 15 for Kay, where 0 to 3 are movement and 4 to 7 are attack moves for Kiko, and similarly 8 to 11 are movement and 12 to 15 are attack moves for Kay. The action_spec, observation_spec, and reset methods remain the same as before.
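A sketch of the modified constructor is shown below using the actual tf_agents spec classes; the class name KikoEnvironment and the flat 36-cell observation shape are assumptions carried over from the sketches above.

```python
import numpy as np
from tf_agents.environments import py_environment
from tf_agents.specs import array_spec

class KikoEnvironment(py_environment.PyEnvironment):
    def __init__(self):
        super().__init__()
        self._game = GameLogic()
        # 36 park cells whose values now range from 0 to 4 (4 denotes Kay)
        self._observation_spec = array_spec.BoundedArraySpec(
            shape=(36,), dtype=np.int32, minimum=0, maximum=4, name='observation')
        # 16 actions: 0-3 move / 4-7 attack for Kiko, 8-11 move / 12-15 attack for Kay
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=15, name='action')
        self._episode_ended = False

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec
```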

Figuring out the right move and right player

Now, let's look at the step method. In the step method, we modify a few things: first, we figure out whether the selected action is an attack action or just a normal movement action, and whether it is for Kiko or for Kay. Then we pass these values to the move_dog method and get the response from the game logic. Compared to the previous step method, I have also modified the reward values. They are modified so that the agent can properly learn the importance of each of its actions. The game-completion reward remains the same as before, with the highest value of 10, as we need our main player to complete the game more than anything. Next, the sneak attack move has a slightly higher reward than finding a bone, as we also want our agents to learn to sneak attack properly, and I have slightly increased the penalty for illegal moves compared to the normal movement penalty, because we want to discourage our agents from making such moves. We also don't want our players to place themselves on the position of any robot in the game, so that penalty is increased as well.
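Below is a sketch of how the _step method could decode the action and map the game response to a reward, continuing the KikoEnvironment class above. The helper _target_cell, the discount, and the individual reward values are assumptions; only their ordering (game completion highest at 10, sneak attack above bone, harsher penalties for illegal moves and robots) follows the post.

```python
    # Continuing the KikoEnvironment class; `ts` is tf_agents.trajectories.time_step.
    def _step(self, action):
        if self._episode_ended:
            return self.reset()

        action = int(action)
        player = KIKO if action < 8 else KAY      # 0-7 belong to Kiko, 8-15 to Kay
        local = action % 8
        is_attack = local >= 4                    # 0-3 movement, 4-7 sneak attack
        direction = local % 4

        # Hypothetical helper that turns (player, direction, is_attack) into a target cell.
        new_position = self._target_cell(player, direction, is_attack)
        result = self._game.move_dog(new_position, is_attack, player)

        if result == ActionResult.GAME_COMPLETED:
            self._episode_ended = True
            return ts.termination(self._game.game_state(), reward=10.0)

        # Illustrative reward values; only their relative ordering reflects the post.
        rewards = {
            ActionResult.SNEAK_ATTACK:  1.0,   # slightly higher than finding a bone
            ActionResult.FOUND_BONE:    0.5,
            ActionResult.LEGAL_MOVE:   -0.1,
            ActionResult.ILLEGAL_MOVE: -0.5,   # penalised more than a normal move
            ActionResult.FOUND_ROBOT:  -1.0,   # stepping onto a robot is penalised most
        }
        return ts.transition(self._game.game_state(), reward=rewards[result], discount=0.9)
```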

Creating the new Environment

Now, let's create the environment and make sure to run the validation utility so that what we modified in those methods works correctly across episodes. And like before, let's make separate environments for training and evaluation.
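Assuming the environment class is called KikoEnvironment as in the sketches above, this uses tf_agents' own validation and wrapper utilities:

```python
from tf_agents.environments import tf_py_environment, utils

# Run a few full episodes against the specs to catch any mistakes in the new logic.
utils.validate_py_environment(KikoEnvironment(), episodes=5)

# Separate environments for training and evaluation, as before.
train_env = tf_py_environment.TFPyEnvironment(KikoEnvironment())
eval_env = tf_py_environment.TFPyEnvironment(KikoEnvironment())
```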

The DQN Agent

Compared to the earlier version, the agent and the neural network used by the agent are also slightly different. Surprisingly, the QNetwork now has fewer hidden layers and fewer neurons. There is no direct explanation for this, but while experimenting, a neural network with more hidden layers and a larger number of neurons was highly unstable, and I had to limit the number of hidden layers and neurons. In my experiments, a network with only 2 hidden layers and between 32 and 64 neurons per layer performed better.
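For reference, a QNetwork of that size can be built like this; (64, 32) is just one configuration in the 32-64 neuron range mentioned above, not necessarily the exact one used.

```python
from tf_agents.networks import q_network

# Two fully connected hidden layers, each in the 32-64 unit range.
q_net = q_network.QNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=(64, 32))
```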

In the earlier version of the DQN agent, we used RMSProp as the optimizer; it turns out the Adam optimizer performed better than RMSProp, and I also had to decrease the learning rate and the epsilon of this optimizer to make the training process more stable. The number of decay steps for the exploration epsilon is also higher than last time. Training a DQN agent is a hard task, and after experimenting with lots of hyperparameter values, these are the ones that worked well for this experiment; it is also possible that there are better hyperparameter values that would result in better agent performance than what I got.
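A sketch of such an agent setup is shown below; the learning rate, the optimizer epsilon, and the epsilon decay schedule values are assumptions, and only the choices of Adam and a decayed exploration epsilon come from the post.

```python
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.utils import common

train_step = tf.Variable(0)

# Adam with a reduced learning rate and optimizer epsilon (values assumed).
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4, epsilon=1e-5)

# Decay the exploration epsilon over a longer horizon than before (horizon assumed).
epsilon_fn = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=1.0,    # initial exploration epsilon
    decay_steps=100_000,
    end_learning_rate=0.01)       # final exploration epsilon

agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=q_net,
    optimizer=optimizer,
    td_errors_loss_fn=common.element_wise_squared_loss,
    epsilon_greedy=lambda: epsilon_fn(train_step),
    train_step_counter=train_step)
agent.initialize()
```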

Let us now set up the replay buffer, training metrics, driver, and dataset as before.
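These pieces use the standard tf_agents components; the buffer size, number of collection steps per iteration, and batch size below are assumptions.

```python
from tf_agents.replay_buffers import tf_uniform_replay_buffer
from tf_agents.metrics import tf_metrics
from tf_agents.drivers import dynamic_step_driver

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,
    batch_size=train_env.batch_size,
    max_length=100_000)                              # buffer size assumed

train_metrics = [
    tf_metrics.NumberOfEpisodes(),
    tf_metrics.EnvironmentSteps(),
    tf_metrics.AverageReturnMetric(),
    tf_metrics.AverageEpisodeLengthMetric(),
]

collect_driver = dynamic_step_driver.DynamicStepDriver(
    train_env,
    agent.collect_policy,
    observers=[replay_buffer.add_batch] + train_metrics,
    num_steps=1)                                     # collection steps per iteration assumed

# Sample 2-step sub-episodes so the agent can compute TD targets.
dataset = replay_buffer.as_dataset(
    sample_batch_size=64, num_steps=2, num_parallel_calls=3).prefetch(3)
iterator = iter(dataset)
```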

Now, let’s create the training loop and train our agent for 150,000 iterations.
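A minimal version of such a loop, assuming the components defined above, could look like this; the logging interval is arbitrary.

```python
# Wrap the hot paths in tf.function for speed.
agent.train = common.function(agent.train)
collect_driver.run = common.function(collect_driver.run)

num_iterations = 150_000
time_step = None
policy_state = agent.collect_policy.get_initial_state(train_env.batch_size)

for iteration in range(num_iterations):
    # Collect experience with the (epsilon-greedy) collect policy.
    time_step, policy_state = collect_driver.run(time_step, policy_state)

    # Sample a batch of transitions and take one gradient step.
    experience, _ = next(iterator)
    train_loss = agent.train(experience).loss

    if iteration % 1000 == 0:
        print(f'iteration {iteration}: loss = {train_loss:.4f}, '
              f'avg return = {train_metrics[2].result().numpy():.2f}')
```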

Let’s look at the training curve

[Figure: Average Return and Average Episode Length curves during training]

As we can see, the training is still very unstable, but the agent is gradually learning. Now, let's evaluate our agent by visualizing it for 1 episode.
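One way to do this, assuming the trained agent and eval_env from above, is to step through a single episode with the greedy policy; the post renders each step with unicode icons, but here the raw grid is simply printed instead.

```python
# Play one episode with the trained greedy policy and print each observation.
time_step = eval_env.reset()
episode_return, steps = 0.0, 0
while not time_step.is_last():
    action_step = agent.policy.action(time_step)
    time_step = eval_env.step(action_step.action)
    episode_return += float(time_step.reward)
    steps += 1
    print(time_step.observation.numpy().reshape(6, 6))
print(f'{steps} steps, return = {episode_return:.1f}')
```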

[Figure: Screenshot of the 18 steps performed during one successful episode of the gameplay]

As we can see, the agent successfully learned to use both of our players during gameplay, and not only that: it learned to use sneak attacks, make legal moves, and successfully complete the game with a final reward of 19.3 in only 18 steps.

I hope you enjoyed this follow-up post on creating your own adventure-game-like environment using TF-Agents, and I would like to encourage you to create your own environment using TF-Agents and have fun with it. Please leave a comment or feedback if you have any; it will help me a lot to improve my future posts. Have fun playing with Deep Reinforcement Learning using TF-Agents!

The final notebook for this post can be found here:

References

  1. https://www.mikulskibartosz.name/categories#Reinforcement-learning
  2. https://github.com/ageron/handson-ml2/blob/master/18_reinforcement_learning.ipynb
  3. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition, Aurelien Geron (Chapter 18: Reinforcement Learning)
  4. http://tensorflow.org/agents/
  5. https://www.youtube.com/watch?v=U7g7-Jzj9qo
  6. https://www.compart.com/en/unicode (Unicode icons are used from here)
