In the previous post, we discussed how to sample from a gaussian distribution using the box-muller method. In this post, we will explore rejection sampling, a method of sampling from a generic distribution when inversion sampling is impractical.

Consider the distribution below,

The distribution we wish to sample from

Rejection sampling involves the sampling from a proposal distribution q(x). The choice of proposal distribution should be such that there is an efficient algorithm to sample from it and it should be close to our target distribution, f(x). In particular, we want the ratio f(x)/g(x) to be upper bounded by some constant c >1. For such a c, we have the constraint 0< f(x)/c*g(x) < 1.

Given two functions g1 and g2 and their corresponding constants c1 and c2, we say g1 is closer to f than g2 if c1 < c2. In practice, we want to choose a g such that c is close to 1.

For the distribution shown above, one “good” proposal distribution is shown below:

The proposal distribution of choice is a Gaussian(shown in red)

Given the proposal(g) and the target distribution(f), the algorithm for rejection sampling is :

Intuitively, the accepts points lie in the shaded region shown below:

Notations: We will use capital letters F and G for CDFs corresponding to f and g.

To show that rejection sampling does indeed return samples from the target distribution f, it is sufficient to show that the samples points x, have the same CDF as f. Mathematicall we want to show that:

where x is a sample and X and U are random variables.

We will use the Bayes rule to compute the expression on the write.

We will now compute each of the terms on the right-hand side of the equation.

The term in the denominator of Equation 1 corresponds to the probability of a sample being accepted. Given a sample x from g, we can write the probability of x being accepted as:

We can now marginalize the conditional distribution to obtain the required probability.

Finally, we evaluate the conditional probability in the numerator:

Combining all these terms together we have,

Hence, the samples generated by rejection sampling follow the distribution as the target distribution f.

In the next post, we will dive into importance sampling.

[1] http://www.columbia.edu/~ks20/4703-Sigman/4703-07-Notes-ARM.pdf

[2] Kevin Murphy, Machine Learning: A Probabilistic Perspective