Could you please elaborate on the “fake label” part? Thanks.
Madhavun Candadai

Hi Madhavun,

The fake label masks the gradient of the action we didn’t take. Since there are only two possible actions in this case, the fake label is simply the inverse of the action actually taken. In settings with more than two actions, the label would mask several gradient paths at once, one for each action not taken. I hope that helps.
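
To make the masking concrete, here is a minimal NumPy sketch. It is not taken from the post's code, and it assumes the policy outputs a single probability of choosing action 1. The function names (`log_likelihood_binary`, `log_likelihood_multi`) and the value 0.7 are made up for illustration. The fake label picks out the log-probability of the action that was taken, so the other action contributes nothing to the gradient.

```python
import numpy as np

def log_likelihood_binary(prob_action_1, fake_label):
    """Log-probability of the action taken, for a two-action policy.

    prob_action_1: assumed policy output, P(action = 1).
    fake_label: 1 if action 0 was taken, 0 if action 1 was taken
                (the inverse of the action).
    """
    y, p = fake_label, prob_action_1
    # y == 1 keeps only the (1 - p) term; y == 0 keeps only the p term.
    return np.log(y * (y - p) + (1 - y) * (y + p))

def log_likelihood_multi(probs, action):
    """Same idea with more than two actions: a one-hot mask zeroes out
    the log-probabilities of every action that was not taken."""
    mask = np.eye(len(probs))[action]
    return np.sum(mask * np.log(probs))

p = 0.7  # hypothetical P(action = 1)
for action in (0, 1):
    fake_label = 1 - action  # inverse of the action taken
    print(action, log_likelihood_binary(p, fake_label))
```

Multiplying this log-likelihood by the advantage (or discounted reward) and taking the negative mean would give a typical policy-gradient loss. That step is left out here because it depends on the rest of the training code.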