Jul 25, 2017 · 1 min read
For: Q[s,a] = Q[s,a] + lr*(r + y*np.max(Q[s1,:]) - Q[s,a])
Can we use this form instead:
Q[s,a] = (1-lr)*Q[s,a] + lr*(r+y*np.max(Q[s1,:]))
I think this is easier to understand:
lr is the learning rate, for example 10%.
Then we can say: mix 90% of the old value Q[s,a] with 10% of the new target r + y*np.max(Q[s1,:]). It's very clear.
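
Here is a quick sketch to check that the two forms give the same number. The table size, the indices s, a, s1 and the values of r, y and lr are made-up values for illustration only, not taken from the original article.

```python
import numpy as np

# Hypothetical setup: a small random Q-table and one made-up transition.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
Q = rng.random((n_states, n_actions))

s, a, s1 = 2, 1, 4        # state, action, next state (illustrative)
r, y, lr = 1.0, 0.95, 0.1  # reward, discount factor, learning rate (illustrative)

# The "new" part of the update: reward plus discounted best next value.
target = r + y * np.max(Q[s1, :])

# Form 1: old value plus a fraction of the error.
incremental = Q[s, a] + lr * (target - Q[s, a])

# Form 2: weighted mix of the old value and the target.
blended = (1 - lr) * Q[s, a] + lr * target

# Both forms agree up to floating-point rounding.
assert np.isclose(incremental, blended)
```

With lr = 0.1, the second form is literally 0.9 times the old Q[s,a] plus 0.1 times the target, which is the "90% old, 10% new" reading.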