Simple Reinforcement Learning with Tensorflow Part 4: Deep Q-Networks and Beyond
Arthur Juliani

Hi, Arthur,

Thank you so much for your deep learning series. It really helped me a lot!

To understand it better, I ran your code from this part, changing only tensorflow.contrib.slim to tensorflow.layers for TensorFlow 1.0. The model prints a low average reward over the last ten episodes, as in the following iterations.

494500 0.7 0.09999999999985551
495000 0.3 0.09999999999985551
495500 1.99840144433e-16 0.09999999999985551
496000 0.6 0.09999999999985551
496500 0.7 0.09999999999985551
497000 1.1 0.09999999999985551
497500 0.7 0.09999999999985551
498000 1.4 0.09999999999985551
498500 2.1 0.09999999999985551
499000 0.1 0.09999999999985551
499500 1.3 0.09999999999985551
500000 1.2 0.09999999999985551
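
To put a number on "low", here is a minimal sketch that summarizes the lines above. It assumes the three columns are the total step count, the average reward over the last ten episodes, and the current exploration rate; the names `LOG` and `parse_log` are made up for this illustration.

```python
from statistics import mean

# The log lines quoted above, copied verbatim.
LOG = """\
494500 0.7 0.09999999999985551
495000 0.3 0.09999999999985551
495500 1.99840144433e-16 0.09999999999985551
496000 0.6 0.09999999999985551
496500 0.7 0.09999999999985551
497000 1.1 0.09999999999985551
497500 0.7 0.09999999999985551
498000 1.4 0.09999999999985551
498500 2.1 0.09999999999985551
499000 0.1 0.09999999999985551
499500 1.3 0.09999999999985551
500000 1.2 0.09999999999985551
"""

def parse_log(text):
    """Return (steps, avg_reward, epsilon) tuples; the column meaning is assumed."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        rows.append((int(parts[0]), float(parts[1]), float(parts[2])))
    return rows

rows = parse_log(LOG)
rewards = [avg_reward for _, avg_reward, _ in rows]
print(f"lines: {len(rows)}, mean: {mean(rewards):.2f}, best: {max(rewards):.1f}")
```

Under that reading of the columns, the exploration rate has already settled near 0.1 while the average reward is still fluctuating from one printout to the next.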

Do these results seem right?

I also recorded QNetwork.Q; its maximum value keeps increasing, up to about 4. I think this value may be OK, since the agent gets 1 point each time it reaches the ‘goal’. But the average reward over the last ten episodes should be larger than what I see.
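
As a rough sanity check on the size of the Q-values, the sketch below computes a discounted return for a made-up reward sequence. The discount factor of 0.99, the 50-step episode, and one goal every 5 steps are assumptions chosen only for illustration; they are not measured from this run.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], accumulated from the last step backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical episode: 50 steps, +1 on every 5th step, 0 otherwise.
rewards = [1.0 if (t + 1) % 5 == 0 else 0.0 for t in range(50)]
print(discounted_return(rewards, 0.99))
```

If the printed return for a plausible reward pattern is in the same range as the recorded maximum Q-value, the Q-values themselves are probably not the problem.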

Could you give me some insight into these results?
