Evidence Lower Bound

VJ Anand
2 min read · Dec 4, 2023

The Evidence Lower Bound (ELBO) is a loss function used for variational inference. Networks that perform variational inference are also known as generative models: they can generate entirely new data from a given domain. The purpose of this write-up is to show the derivation of this loss function.

To begin, we start with Bayes' rule for determining the posterior distribution over the latent variable. Note that for variational inference we assume there is a latent variable z that allows the process to choose the different values of x. Therefore, if we can compute this posterior, we can sample from this distribution:
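p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}, \qquad p(x) = \int p(x \mid z)\, p(z)\, dz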

But it is the denominator that is the culprit: the evidence p(x) requires integrating over all values of z, which is intractable. We therefore use the KL divergence to approximate the true posterior with a known distribution q_Φ(z | x) that we can sample from and whose total probability we can compute. Let us first simplify a bit using the log laws; the division inside the logarithm can be rewritten as a difference, as shown:
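\mathrm{KL}\left(q_\Phi(z \mid x) \,\|\, p(z \mid x)\right) = \mathbb{E}_{q_\Phi(z \mid x)}\left[\log q_\Phi(z \mid x) - \log p(z \mid x)\right]

Substituting Bayes' rule for \log p(z \mid x), and noting that \log p(x) does not depend on z:

= \mathbb{E}_{q_\Phi(z \mid x)}\left[\log q_\Phi(z \mid x) - \log p(x \mid z) - \log p(z)\right] + \log p(x)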

Let us rearrange the above equation like so:
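\log p(x) = \mathbb{E}_{q_\Phi(z \mid x)}\left[\log p(x \mid z)\right] - \mathrm{KL}\left(q_\Phi(z \mid x) \,\|\, p(z)\right) + \mathrm{KL}\left(q_\Phi(z \mid x) \,\|\, p(z \mid x)\right)

Since the KL divergence between the approximate and the true posterior is non-negative, the first two terms together form a lower bound on the log evidence \log p(x).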

In the equation below, the first component denotes the reconstruction error and the second component is the KL divergence between the approximate posterior and the prior. Both are tractable to compute:
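\mathrm{ELBO}(\Phi) = \mathbb{E}_{q_\Phi(z \mid x)}\left[\log p(x \mid z)\right] - \mathrm{KL}\left(q_\Phi(z \mid x) \,\|\, p(z)\right)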

The above equation is also known as the Evidence Lower Bound, or ELBO. In the second component, the numerator inside the KL divergence is the distribution that approximates the true posterior; the term as a whole is the KL divergence between the approximate posterior and the prior. Furthermore, the parameters Φ of the approximate posterior can be learnt by training a neural network. The negative of the ELBO is the loss function that we minimize, which is equivalentent to maximizing the ELBO.
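As a concrete illustration, here is a minimal sketch of the negative ELBO as a training loss, assuming a Gaussian approximate posterior q_Φ(z | x) = N(μ, σ²) with a standard normal prior (so the KL term has a closed form) and a Bernoulli likelihood for the reconstruction term; the function name and tensor shapes are illustrative assumptions, not from the original write-up:

```python
import torch
import torch.nn.functional as F

def negative_elbo(x, x_recon, mu, logvar):
    """Negative ELBO for a Gaussian posterior and Bernoulli likelihood.

    x       : original input, values in [0, 1], shape (batch, features)
    x_recon : decoder output p(x | z) for one sampled z, same shape as x
    mu, logvar : parameters of q_phi(z | x), shape (batch, latent_dim)
    """
    # Reconstruction term: E_q[log p(x|z)], estimated with a single
    # Monte Carlo sample of z (negated log-likelihood -> BCE).
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL(q_phi(z|x) || p(z)) in closed form for N(mu, sigma^2) vs N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # Minimizing recon + kl is equivalent to maximizing the ELBO.
    return recon + kl
```

Gradients flow through mu and logvar, which is how the parameters Φ of the approximate posterior are learnt by the network.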
