barakb
Sep 3, 2018 · 1 min read

I am having a difficult time understanding the masking part.
Let’s say the translation is supposed to be: “I walked in the park”
But for now the decoder output is:
“I walked random random random random”
I don’t want the decoder to “see” the “future”, meaning the random parts, so we need to put -inf there.
So why not put -inf in the lower rows of the V matrix?
I can’t understand why putting -inf in the upper triangle helps in that case.
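
To make the question concrete, here is a minimal sketch of the upper-triangle mask. It assumes standard scaled dot-product attention, where the mask is added to the score matrix (queries times keys) before the softmax rather than to V. The function names and the uniform scores are hypothetical, chosen only for illustration.

```python
import math

def causal_mask(n):
    # Row i is the query position, column j is the key position.
    # Entries above the diagonal (j > i) are "future" positions.
    return [[-math.inf if j > i else 0.0 for j in range(n)] for i in range(n)]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention_weights(scores):
    # Add the mask to the raw scores, then normalize each row.
    mask = causal_mask(len(scores))
    return [softmax([s + m for s, m in zip(s_row, m_row)])
            for s_row, m_row in zip(scores, mask)]

# Hypothetical uniform scores for a 4-token target sequence.
scores = [[1.0] * 4 for _ in range(4)]
weights = masked_attention_weights(scores)
for row in weights:
    print([round(w, 2) for w in row])
```

Under these assumptions, row i of the result has zero weight on every column j > i, so position i only mixes the rows of V that belong to positions up to and including i.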

Thanks.
