
Notes on Reinforcement Learning w/ Policy Gradients

On CartPole with Distributions

While implementing CartPole I went through different solutions,

MountainCarContinuous cheating

Mountain Car is one of my favorite problems, as it incorporates seemingly contradictory actions to…

Reward function alternatives :: HER

The final environment in my benchmark of four classic OpenAI Gym problems is Acrobot:

Engineering behind RL

A simple description of several quirks that can be used to improve / customize…

RL thoughts

Some things I am thinking about, biased towards my (mis)understanding of RL

Reward Functions vs Q-Function overestimations:

Do Q-functions really overestimate?
