Simple Reinforcement Learning with Tensorflow: Part 3 - Model-Based RL
Arthur Juliani

Hello, Arthur. Great series, thanks for it.
I have a question about the actions. Why do you encode them as 0s and 1s, with the ‘y’ values set to the opposite? Then, when you train the model, you use np.abs(y - 1), but when you train the policy, you use y directly.
Can you please clarify that point?
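To make the relationship in the question concrete, here is a minimal sketch. It assumes the two actions are 0 and 1 and that y is the opposite of the action, as described above. The helper names action_to_label and label_to_action are hypothetical. For a scalar, label_to_action does the same arithmetic as np.abs(y - 1). The output shows that applying it to y gives back the original action.

```python
# Hypothetical helpers: binary actions (0 or 1) are assumed, as in the question.

def action_to_label(action: int) -> int:
    """The label y is the opposite of the action taken."""
    return 1 if action == 0 else 0


def label_to_action(y: int) -> int:
    """Recover the action from y; same arithmetic as np.abs(y - 1) for a scalar."""
    return abs(y - 1)


for action in (0, 1):
    y = action_to_label(action)
    # Prints: action, its label y, and the action recovered from y
    print(action, y, label_to_action(y))
```

Running it prints "0 1 0" and then "1 0 1". So np.abs(y - 1) turns y back into the action that was actually taken, and y itself stays the flipped value.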
