The key idea is that we need a discrete distribution that is reparametrizable. One such distribution is the Gumbel-Softmax distribution. PyTorch does not have this built-in, so I simply extend it from a close cousin which has the right…
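To make the reparametrization concrete, here is a minimal sketch of how a Gumbel-Softmax sample can be drawn. It is written in plain Python rather than PyTorch so the arithmetic is visible, and it is not the class described in this post: the function name `gumbel_softmax_sample` and the parameters `tau` and `rng` are illustrative assumptions.

```python
import math
import random


def gumbel_softmax_sample(logits, tau, rng):
    """Draw one relaxed one-hot sample: softmax((logits + g) / tau)."""
    # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)), u ~ Uniform(0, 1).
    # All of the randomness lives in g, so the sample is a smooth function
    # of the logits; that is what makes the distribution reparametrizable.
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)  # hypothetical seed, only for repeatability
soft = gumbel_softmax_sample([1.0, 0.5, -1.0], tau=1.0, rng=rng)
sharp = gumbel_softmax_sample([1.0, 0.5, -1.0], tau=0.05, rng=rng)
```

Each sample is a point on the probability simplex. Lowering `tau` pushes samples toward one-hot vectors, at the cost of higher-variance gradients in an autograd framework.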
… learn the temperature parameter alpha, which automatically adjusts the entropy. Note that we learn the logarithm of alpha rather than alpha itself: the logarithm can take any real value while alpha stays positive, so the numeric range is better behaved.
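As a rough illustration of learning log(alpha), the sketch below takes one hand-written gradient step. It assumes the commonly used SAC temperature loss, the batch mean of -alpha * (log_prob + target_entropy), which this excerpt does not show. The function name `update_log_alpha` and all numbers are hypothetical; in PyTorch one would more likely register `log_alpha` as a learnable tensor and let an optimizer perform the step.

```python
import math


def update_log_alpha(log_alpha, log_probs, target_entropy, lr):
    """One gradient-descent step on the assumed temperature loss w.r.t. log(alpha)."""
    alpha = math.exp(log_alpha)
    # Assumed loss: mean over the batch of -alpha * (log_prob + target_entropy).
    # Since alpha = exp(log_alpha), d(alpha)/d(log_alpha) = alpha.
    mean_gap = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -alpha * mean_gap
    return log_alpha - lr * grad


log_alpha = 0.0  # alpha starts at 1.0
# The policy's entropy estimate (0.1 nats) is below the target of 1.0,
# so the step should raise alpha and weight the entropy term more heavily.
log_alpha = update_log_alpha(log_alpha, log_probs=[-0.1, -0.1], target_entropy=1.0, lr=0.1)
alpha = math.exp(log_alpha)  # positive for any real value of log_alpha
```

Because the optimizer only ever touches `log_alpha`, no clipping or projection is needed to keep `alpha` positive.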