Could you please elaborate on the “fake label” part? Thanks
Madhavun Candadai

Hi Madhavun,

The fake label is used to mask the gradient of the action we didn’t take. In this case, since there are only two possible actions, it is simply the inverse of the action actually taken. In settings with more than two possible actions, we would use this fake label to mask multiple gradient paths. I hope that helps.
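To make the masking concrete, here is a minimal sketch in plain Python. It assumes the policy outputs a single probability `p` for action 1 and that the log-likelihood is assembled from two terms, only one of which survives for a given label. The function names and the exact form of the expression are illustrative assumptions, not necessarily the code from the post.

```python
import math


def fake_label(action: int) -> int:
    """Return the inverse of the binary action taken (0 -> 1, 1 -> 0)."""
    return 1 - action


def log_likelihood(p_action_1: float, label: int) -> float:
    """Log-probability of the action that was actually taken.

    Assumed form: the label zeroes out one of the two terms, so only the
    path for the chosen action contributes (and receives gradient).
      label == 1 (action 0 was taken): selected = 1 - p
      label == 0 (action 1 was taken): selected = p
    """
    selected = label * (label - p_action_1) + (1 - label) * (label + p_action_1)
    return math.log(selected)


# The policy assigns probability 0.8 to action 1.
p = 0.8
ll_took_1 = log_likelihood(p, fake_label(1))  # equals log(p)
ll_took_0 = log_likelihood(p, fake_label(0))  # equals log(1 - p)
```

In an autodiff framework the same expression would be written with tensor operations; the term multiplied by zero then contributes no gradient, which is the masking described above.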
