While implementing CartPole I went through several different solutions,
Mountain Car is one of my favorite problems, as it incorporates seemingly contradictory actions to…
The final environment of my benchmark of four classic OpenAI Gym problems is AcroBot:
Reward Functions vs Q-Function Overestimation:
Do Q-functions really overestimate?
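
One common reason this question comes up is maximization bias: taking a max over noisy value estimates tends to give a number above the true maximum. The snippet below is only a toy sketch of that effect, not code from the benchmark. It assumes every action has a true value of 0 and that each estimate is the mean of a few Gaussian samples; the function name `max_bias_demo` and all its parameters are hypothetical and chosen for illustration.

```python
import random


def max_bias_demo(n_actions=10, n_samples=5, n_trials=2000, seed=0):
    """Compare a single-estimator max with a double-estimator value.

    Assumption for illustration: the true Q-value of every action is 0,
    so any systematic deviation from 0 is pure estimation bias.
    """
    rng = random.Random(seed)

    def noisy_estimates():
        # One estimate per action: the mean of n_samples noisy returns.
        return [
            sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
            for _ in range(n_actions)
        ]

    single_total = 0.0
    double_total = 0.0
    for _ in range(n_trials):
        q_a = noisy_estimates()  # used to select the greedy action
        q_b = noisy_estimates()  # independent estimates of the same actions
        best = max(range(n_actions), key=q_a.__getitem__)
        single_total += q_a[best]  # select and evaluate with the same noise
        double_total += q_b[best]  # select with q_a, evaluate with q_b

    return single_total / n_trials, double_total / n_trials


if __name__ == "__main__":
    single, double = max_bias_demo()
    print(f"single estimator (max over noisy Q): {single:.3f}")
    print(f"double estimator (independent eval): {double:.3f}")
```

Under these assumptions the single-estimator average should come out clearly above 0, while the double-estimator average should stay near 0. Whether a trained Q-network overestimates to a similar degree in a given environment is a separate, empirical question.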