An Explanation of Discretized Logistic Mixture Likelihood

Hao Gao
Jul 23, 2019


Discretized logistic mixture likelihood is used in PixelCNN++ and WaveNet to predict discrete values. "Logistic" refers to the logistic distribution used in those papers; other distributions, such as the normal distribution, are also widely used in mixtures.

Why predict distributions rather than actual values?

In PixelCNN++, the goal is to generate images, which means predicting pixel values in the range [0, 255]. One option is to use a softmax, as in classification tasks. However, a softmax completely ignores the numerical relationship between values such as 220 and 221. Even worse, if a pixel value never appears in the training dataset, it can never be predicted at test time. A better way is to predict the distribution of pixel values, so that every value has some probability of appearing when images are generated using the randomness of the predicted distribution, much like what happens in a VAE. To predict a distribution, you need to predict both a mean and a variance; when predicting actual values, only a mean is predicted.
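To make this concrete, here is a minimal sketch of generating pixel values from a single predicted logistic distribution via its inverse CDF. The parameters `mu` and `s` are made-up illustration values, not outputs of a real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predicted parameters for one pixel (illustration only):
mu, s = 120.0, 8.0  # mean and scale of a logistic distribution

# Sample via the logistic inverse CDF: mu + s * log(u / (1 - u)),
# then round and clip to the valid pixel range [0, 255].
u = rng.uniform(1e-5, 1.0 - 1e-5, size=1000)
samples = mu + s * (np.log(u) - np.log(1.0 - u))
pixels = np.clip(np.round(samples), 0, 255).astype(int)

# Most samples land near the predicted mean, and every nearby integer
# (119, 120, 121, ...) has some chance of appearing, unlike a softmax
# that assigned zero probability to values unseen in training.
```

Note that even values the model was never trained on can be sampled here, as long as they have probability mass under the predicted distribution.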

How to predict a distribution?

So the next question is how to predict the distribution. For continuous values, maximum likelihood handles this easily; we only need a way to adapt the probability calculation to discrete values. For pixel value prediction, we can treat the interval [x − 0.5, x + 0.5] as belonging to the discrete integer x. By applying maximum likelihood, we can easily compute the mean and variance and therefore predict the distribution.
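In other words, the probability of the integer x is the CDF mass of the interval [x − 0.5, x + 0.5]. For the logistic distribution the CDF is the sigmoid, so this is a one-liner; a small sketch with illustrative parameters:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def discretized_logistic_prob(x, mu, s):
    """P(pixel == x): logistic CDF mass on [x - 0.5, x + 0.5]."""
    return sigmoid((x + 0.5 - mu) / s) - sigmoid((x - 0.5 - mu) / s)

# Hypothetical parameters: the probability peaks at the mean and
# decays smoothly for neighboring integers.
p_mean = discretized_logistic_prob(120, mu=120.0, s=8.0)
p_far = discretized_logistic_prob(160, mu=120.0, s=8.0)
```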

The edge cases

However, the probabilities from the discretization above won't sum to 1: some probability mass falls below 0 and above 255, even though such values can never occur. In PixelCNN++ and WaveNet, the probability mass below 0 is assigned to the value 0, and the mass above 255 is assigned to the value 255.
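Concretely, the edge values use one-sided CDFs instead of a CDF difference. A sketch of the same function with these edge cases (parameters again made up):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pixel_prob(x, mu, s):
    """Discretized logistic probability with the edge-case handling."""
    if x == 0:
        # All mass below 0.5 is assigned to pixel value 0.
        return sigmoid((0.5 - mu) / s)
    if x == 255:
        # All mass above 254.5 is assigned to pixel value 255.
        return 1.0 - sigmoid((254.5 - mu) / s)
    return sigmoid((x + 0.5 - mu) / s) - sigmoid((x - 0.5 - mu) / s)

# With the edge cases, the 256 probabilities sum to 1 (up to float error),
# because the interior terms telescope between the two one-sided tails.
total = sum(pixel_prob(x, 120.0, 8.0) for x in range(256))
```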

Mixture of Distributions

Now we have a discretized logistic likelihood, if we choose the logistic as our distribution. Where does the "mixture" in the title come from? When the distribution of the data is more complex than simple distributions such as the normal or logistic can capture, a mixture distribution comes to help. A mixture distribution is a linear combination of a number of distributions: p(x) = a_1 * p_1(x) + a_2 * p_2(x) + … + a_n * p_n(x). By introducing more parameters, it has more representational power. For a mixture distribution, the parameters to learn are a_i, mean_i, and variance_i. In PixelCNN++, the last layer predicts these parameters, except that log(variance) is predicted as a surrogate, so that exponentiating it can never yield a negative variance. For CIFAR10, it uses 10 logistic distributions to model the pixel value distribution.
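Putting the pieces together, here is a hedged sketch of the mixture likelihood, with the loss as its negative log. Real implementations work entirely in log space with a logsumexp for numerical stability; this plain version, with made-up parameter values, only illustrates the math:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def component_prob(x, mu, s):
    """Discretized logistic probability of integer x, with edge cases."""
    if x == 0:
        return sigmoid((0.5 - mu) / s)
    if x == 255:
        return 1.0 - sigmoid((254.5 - mu) / s)
    return sigmoid((x + 0.5 - mu) / s) - sigmoid((x - 0.5 - mu) / s)

def mixture_neg_log_likelihood(x, logit_weights, means, log_scales):
    """-log p(x) for a logistic mixture. The network predicts raw
    logit_weights, means, and log_scales (logs keep the scales positive)."""
    # Normalize the mixture weights a_i with a softmax.
    m = max(logit_weights)
    z = sum(math.exp(w - m) for w in logit_weights)
    weights = [math.exp(w - m) / z for w in logit_weights]
    # p(x) = sum_i a_i * p_i(x)
    p = sum(a * component_prob(x, mu, math.exp(ls))
            for a, mu, ls in zip(weights, means, log_scales))
    return -math.log(p)

# Hypothetical parameters for a 2-component mixture (illustration only):
nll = mixture_neg_log_likelihood(
    100, logit_weights=[0.0, 0.0], means=[90.0, 110.0], log_scales=[2.0, 2.0]
)
```

Minimizing this negative log-likelihood over the training data is exactly the maximum-likelihood fit described above.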

If you have looked at the implementation, you will have noticed that the discretized logistic mixture loss is just the negative log of the discretized logistic mixture likelihood.

For a walkthrough of the code implementation, please check out this link: https://github.com/Rayhane-mamah/Tacotron-2/issues/155 .
