Aug 24, 2017
Hey, thanks for the article, Matthew. It’s really helpful. I do have a couple of questions, though.
When you say “decay of exploration rate,” are you specifically talking about the 25 that shows up on line 118? Did you get the 25 from trial and error?
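I don't know the exact formula around that line, so here is a minimal sketch of how a constant like 25 could behave in one common log-based schedule. The form `1 - log10((t + 1) / 25)`, clamped to `[min_rate, 1]`, is my assumption, and the names `explore_rate`, `min_rate`, and `decay_scale` are made up for illustration:

```python
import math

def explore_rate(t, min_rate=0.01, decay_scale=25.0):
    """Exploration rate for episode t under an ASSUMED log10 decay schedule.

    Hypothetical form: 1 - log10((t + 1) / decay_scale), clamped to [min_rate, 1].
    """
    raw = 1.0 - math.log10((t + 1) / decay_scale)
    # Clamp: never explore more than 100% of the time or less than min_rate.
    return max(min_rate, min(1.0, raw))

for t in (0, 24, 49, 99, 249):
    print(t, round(explore_rate(t), 3))
```

If the schedule really has this shape, the 25 controls both how many early episodes run at a full exploration rate of 1 (episodes 0 through 24) and how quickly the rate falls after that; by episode 249 it has dropped to `min_rate`.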
Also, in your optimization attempts you decided to use 1 bucket for x and x_theta, down from the original 3 each. That's effectively the same as not considering those parts of the state at all, right? If I'm reading the code correctly, state_to_bucket will always return zero for the first two indices (the ones corresponding to x and x_theta), so the agent can never tell different values of x or x_theta apart and effectively ignores them. Is that correct?
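To check my reasoning, here is a minimal sketch of a generic equal-width bucketing function. It's not your implementation: the signature, the bounds, and the clipping behavior are all assumptions for illustration. The point is that with a single bucket, every value in that dimension lands at index 0:

```python
def state_to_bucket(state, bounds, num_buckets):
    """Hypothetical sketch: map a continuous state to discrete bucket indices.

    Each dimension i is clipped to bounds[i] = (low, high) and split into
    num_buckets[i] equal-width bins.
    """
    indices = []
    for value, (low, high), n in zip(state, bounds, num_buckets):
        # Clip so out-of-range values fall into an edge bucket.
        clipped = min(max(value, low), high)
        # Position of the value between low and high, in [0, 1].
        fraction = (clipped - low) / (high - low)
        # fraction * n lies in [0, n]; cap at n - 1 so the top edge stays in range.
        # With n == 1 the cap is 0, so this dimension always maps to index 0.
        indices.append(min(n - 1, int(fraction * n)))
    return tuple(indices)

# Made-up bounds for four state dimensions; the first two stand in for x and x_theta.
BOUNDS = [(-2.4, 2.4), (-0.5, 0.5), (-0.21, 0.21), (-0.87, 0.87)]
NUM_BUCKETS = (1, 1, 6, 3)

for state in [(-3.0, -1.0, 0.1, 0.2), (0.0, 0.0, 0.0, 0.0), (2.0, 0.4, -0.2, -0.9)]:
    print(state_to_bucket(state, BOUNDS, NUM_BUCKETS))
```

Because the first two entries of the returned tuple are constant, any Q-table lookup keyed on this tuple can only vary with the remaining dimensions.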
Thanks in advance for the clarifications.