Fan Lei
Jul 25, 2017 · 1 min read

For the update: Q[s,a] = Q[s,a] + lr*(r + y*np.max(Q[s1,:]) - Q[s,a])

Can we write it in this equivalent form instead?

Q[s,a] = (1-lr)*Q[s,a] + lr*(r+y*np.max(Q[s1,:]))

I think this is easier to understand:

lr is the learning rate, for example 10%.

Then we can say: the new Q[s,a] mixes 90% of the old value with 10% of the new estimate, r + y*np.max(Q[s1,:]). That is very clear.
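As a quick sanity check that the two forms give the same number, here is a minimal sketch. The function names, the toy Q table, and the values of s, a, r, s1, lr and y are all made up for illustration; they are not from the original post.

```python
import numpy as np

def update_increment(Q, s, a, r, s1, lr, y):
    # Original form: old value plus lr times the error term
    return Q[s, a] + lr * (r + y * np.max(Q[s1, :]) - Q[s, a])

def update_mix(Q, s, a, r, s1, lr, y):
    # Rewritten form: (1 - lr) of the old value plus lr of the new estimate
    return (1 - lr) * Q[s, a] + lr * (r + y * np.max(Q[s1, :]))

# Hypothetical toy table: 3 states, 2 actions
Q = np.array([[0.5, 1.0],
              [2.0, 0.0],
              [0.0, 3.0]])
s, a, r, s1 = 0, 1, 1.0, 2   # assumed transition
lr, y = 0.1, 0.9             # assumed learning rate and discount

v1 = update_increment(Q, s, a, r, s1, lr, y)
v2 = update_mix(Q, s, a, r, s1, lr, y)
print(v1, v2)
assert np.isclose(v1, v2)    # equal up to floating-point rounding
```

Expanding lr*(target - Q[s,a]) and collecting the Q[s,a] terms gives (1-lr)*Q[s,a] + lr*target, so the two lines differ only in how they read, not in what they compute.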
