Simple Reinforcement Learning with Tensorflow Part 0: Q-Learning with Tables and Neural Networks
Arthur Juliani


First, congratulations on your awesome posts about RL in Tensorflow.

I was wondering about one thing:

Can't we use the softmax function for Qout and nextQ, together with the cross-entropy loss?

Something like this:

Qout = tf.nn.softmax(tf.matmul(inputs1,W))

nextQ = tf.nn.softmax(tf.placeholder(shape=[1,4],dtype=tf.float32))

loss = tf.reduce_sum(-tf.reduce_sum(nextQ * tf.log(Qout), 1))

Am I just saying something stupid? Or can you balance the different utilities of the actions in each position as probabilities with the softmax?
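
To make the question concrete, here is a minimal sketch of the same idea in plain Python, without TensorFlow. The Q-values are made-up numbers, and the helper names `softmax` and `cross_entropy` are only for illustration; they are not from the original post.

```python
import math

def softmax(values):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy -sum(p * log(q)), the quantity computed by the loss line above."""
    return -sum(p * math.log(q) for p, q in zip(target_probs, predicted_probs))

# Hypothetical Q-values for the 4 actions in one state (made-up numbers).
q_out = [0.1, 0.5, 0.2, 0.9]     # stands in for tf.matmul(inputs1, W)
q_target = [0.1, 0.5, 0.2, 1.4]  # stands in for the values fed into nextQ

p_out = softmax(q_out)
p_target = softmax(q_target)
loss = cross_entropy(p_target, p_out)

# Softmax only sees differences between values: shifting every Q-value by
# the same constant gives the same probabilities.
p_shifted = softmax([q + 100.0 for q in q_out])
```

One thing the sketch makes visible: `p_out` and `p_shifted` come out the same, so after the softmax only the relative preference between actions is left and the absolute size of the Q-values is gone. I am not sure whether that matters for the Q-learning update, which is part of what I am asking.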

Kind regards.
