Hi, I tried modifying your implementation to make it work with Pong, but for some reason it’s not…
Parth Sharma

Hi Parth,

There are a number of hyperparameters I would suggest adjusting to get it to learn Pong. In their DQN paper, DeepMind suggest a learning rate of 0.00025, an experience replay buffer of 1 million transitions, 50,000 random actions before network training begins, 1 million epsilon-annealing steps, and a tau closer to 0.001 than 0.1. With your current settings, the agent doesn’t gather a diverse enough set of experiences to learn a robust policy, and the target network updates too strongly toward the online network.
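As a rough sketch, those settings might look like the following in code. The variable names and the linear annealing/soft-update helpers are my own illustration, not taken from any particular implementation; only the numeric values come from the DeepMind paper:

```python
# Hypothetical DQN hyperparameters for Pong (names are illustrative).
learning_rate = 0.00025      # optimizer step size suggested by DeepMind
buffer_size = 1_000_000      # experience replay capacity (transitions)
pre_train_steps = 50_000     # random actions taken before training begins
annealing_steps = 1_000_000  # steps over which epsilon is annealed
tau = 0.001                  # soft target-network update rate

start_eps, end_eps = 1.0, 0.1

def epsilon_at(step):
    """Linearly anneal epsilon from start_eps to end_eps."""
    frac = min(step, annealing_steps) / annealing_steps
    return start_eps + frac * (end_eps - start_eps)

def soft_update(target_weights, online_weights):
    """Move each target weight a fraction tau toward the online weight."""
    return [(1 - tau) * t + tau * o
            for t, o in zip(target_weights, online_weights)]

print(epsilon_at(0))  # 1.0, fully random at the start
print(epsilon_at(annealing_steps))  # ~0.1 after annealing completes
```

With tau at 0.001 the target network trails the online network slowly, which is what keeps the Q-value targets stable; a tau of 0.1 moves it a hundred times faster and is part of why the updates were too strong.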

Hopefully those setting changes allow you to find more success!
