DQN Algorithm: A father-son tale

John Tzolas · Published in Analytics Vidhya · Mar 20, 2020 · 3 min read

The Deep Q-Network (DQN) reinforcement learning algorithm has a surprisingly simple, real-life analogy that can be used to explain it. The analogy helps clarify the sequence of operations the algorithm performs.

Father walking with son

The algorithm can be understood through the following simple narrative, which illustrates how training evolves (a code sketch of the same steps follows the narrative):

We have a FATHER and a SON (=> 2 distinct neural networks: the SON is the online network being trained, the FATHER is the target network)

0. The FATHER teaches the SON what he knows so that the SON can act independently in life. The FATHER gives birth to the SON (at the start, the two networks share the same weights).

1–2. The SON finds himself in the current state and, based on what he knows, calculates Q-values for the actions he can take. I like to think of Q-values as Quality-values.

3–4. The SON follows a policy (typically epsilon-greedy) and takes an action, which brings him to the next state.

Time for parental advice: the SON asks the FATHER what his actions are worth.

5–6. The FATHER tells the SON the maximum Q-value of the next state the SON has landed in.

7. The SON updates his weights (the factors that drive his decisions) based on the FATHER's input.

The epitome of the training can be found in the following sentence the FATHER tells the SON:

‘I cannot tell you, my SON, whether you have been right or wrong. The only thing I can tell you is that, for the actions you have taken in the states you have found yourself in, which have led you to new states, my experience as a FATHER says that the best you can gain from now on is <QMAX>.’

8. After some time (years = episodes played by the SON while receiving advice from the FATHER), the SON himself becomes a FATHER, and training continues with the next generation (the target network's weights are refreshed from the online network).
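Putting the steps together, here is a minimal PyTorch sketch of the loop. It is an illustration of the idea, not the linked gist itself; names such as `QNet`, `son`, `father`, `gamma` and the layer sizes are assumptions made for the example.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch: SON = online network, FATHER = target network.
class QNet(nn.Module):
    def __init__(self, n_states, n_actions):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_states, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.layers(x)

n_states, n_actions, gamma = 4, 2, 0.98
son = QNet(n_states, n_actions)             # SON: the network being trained
father = QNet(n_states, n_actions)          # FATHER: the target network
father.load_state_dict(son.state_dict())    # step 0: they start out identical
optimizer = torch.optim.Adam(son.parameters(), lr=5e-4)

def select_action(state, epsilon=0.1):
    # Steps 1-4: the SON computes Q-values for the current state (shape (1, n_states))
    # and follows an epsilon-greedy policy.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return son(state).argmax(dim=1).item()

def train_step(state, action, reward, next_state, done):
    # Steps 5-6: the FATHER reports the maximum Q-value of the next state ("<QMAX>").
    with torch.no_grad():
        q_max = father(next_state).max(dim=1, keepdim=True)[0]
        target = reward + gamma * q_max * (1 - done)
    # Step 7: the SON updates his weights toward the FATHER's advice.
    q_taken = son(state).gather(1, action)
    loss = F.smooth_l1_loss(q_taken, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Step 8: every few episodes the SON becomes the FATHER.
# father.load_state_dict(son.state_dict())
```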

The metaphor also covers the notion of the replay buffer, in which only the most recent experience is kept. This makes sense in an evolving system where knowledge accumulates across generations and each new generation needs to be updated and guided by the most recent ones.
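For illustration, a replay buffer that keeps only the most recent transitions could look like the sketch below (again a hedged example rather than the gist's exact code; the transition layout is an assumption):

```python
import collections
import random
import torch

class ReplayBuffer:
    def __init__(self, capacity=50_000):
        # A deque with maxlen drops the oldest transitions as new ones arrive.
        self.buffer = collections.deque(maxlen=capacity)

    def put(self, transition):
        # transition = (state, action, reward, next_state, done)
        self.buffer.append(transition)

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (torch.tensor(states, dtype=torch.float32),
                torch.tensor(actions).unsqueeze(1),
                torch.tensor(rewards, dtype=torch.float32).unsqueeze(1),
                torch.tensor(next_states, dtype=torch.float32),
                torch.tensor(dones, dtype=torch.float32).unsqueeze(1))

    def __len__(self):
        return len(self.buffer)
```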

A PyTorch-based implementation of the algorithm as described above, using the notions of father, son, generation, etc., can be found here: https://gist.github.com/igtzolas/8a8f3156153bcec85abde7544f2832da

The code is heavily based on the following: https://github.com/seungeunrho/minimalRL/blob/master/dqn.py

Double DQN

Now that we have a metaphor for DQN, we can easily extend it to explain the idea behind Double DQN.

The epitome of Double DQN training can be found in the following sentence the FATHER tells the SON:

‘You are telling me, my SON, that based on what you know your next action should be NextAction. I am telling you how much NextAction is worth if you take it now (in your next state). Please update your beliefs!’

In code, the difference between DQN and Double DQN is:

[Image: diff of the DQN and Double DQN implementations]
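Conceptually, the two targets could be computed as below, reusing the hypothetical `son`/`father` networks and `gamma` from the sketch above; this illustrates the idea rather than reproducing the exact diff between the gists:

```python
import torch

def compute_targets(reward, next_state, done):
    with torch.no_grad():
        # DQN: the FATHER both chooses and evaluates the best next action.
        dqn_target = reward + gamma * father(next_state).max(dim=1, keepdim=True)[0] * (1 - done)

        # Double DQN: the SON chooses the next action ("NextAction"),
        # and the FATHER says how much that particular action is worth in the next state.
        next_action = son(next_state).argmax(dim=1, keepdim=True)
        double_dqn_target = reward + gamma * father(next_state).gather(1, next_action) * (1 - done)
    return dqn_target, double_dqn_target
```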

The implementation of Double DQN can be found here: https://gist.github.com/igtzolas/cd01b6e5a4df71bbff541f65e254f781

Thank you very much for reading!
