Mounsif Mehdi
Feb 1, 2018


Hi Arthur, and thanks for your great article! It does shed some light on my unfortunate adventures with Actor-Critic.

I have a question about my setup: I'm trying to get a two-joint robot to touch a target. The target is fixed, but it can optionally (if the user chooses) be moved to a random position between episodes. I got decent results using REINFORCE with a baseline, so I decided to move on to Actor-Critic. To my great surprise, it performs very poorly. The policy doesn't converge (even when the target is not moved between episodes), it takes a long time to find the target, and even when it does, it can't get anywhere close to a 100% touch rate (it hovers around 45%). Sometimes, after an improvement, its performance drops abruptly, falling close to 0.

Do you have any idea what makes it fail so badly?


