# Importance Sampling

---

In the previous post, we looked into rejection sampling and how it can be used to sample from a distribution when inverse sampling is not feasible. In this post, we will discuss a method called importance sampling. Unlike rejection sampling or inverse sampling, importance sampling is not used to draw samples from a distribution. Instead, it is a method for approximating integrals of the form

$$\mathbb{E}_{p}[f(x)] = \int f(x)\, p(x)\, dx,$$

where $p$ is a probability density function.

A naive approach would be to sample from $p$ using either inverse or rejection sampling and then use a Monte-Carlo estimate of the integral, as shown below:

$$\mathbb{E}_{p}[f(x)] \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i), \qquad x_i \sim p.$$

This method, however, requires a large number of samples to achieve a low-variance estimate of the integral. Importance sampling is a method to achieve lower variance with the same number of samples.
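
To make this concrete, here is a minimal sketch of the naive estimator in Python. The integrand `f`, the choice of a standard normal for `p`, and the sample size are illustrative assumptions, not something specified in this post.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Illustrative integrand (an assumption made only for this sketch).
    return x ** 2

# Assume p is the standard normal density and that we can sample from it directly.
N = 10_000
x = rng.standard_normal(N)

# Naive Monte-Carlo estimate of E_p[f(x)] = ∫ f(x) p(x) dx.
naive_estimate = f(x).mean()
print(naive_estimate)  # close to 1.0, the true value of E[x^2] under N(0, 1)
```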

## Core Idea

The main idea of importance sampling is to sample from the parts of the distribution that are “important”, i.e. from regions that have high probability under $p$ and a high absolute value of $f$. This is helpful because we want to evaluate $f$ at points that are highly likely under $p$ and at which the value of $f$ is large enough to contribute significantly to the Monte-Carlo sum above.

This is achieved by sampling from a proposal distribution $q$ and rewriting the integral as:

$$\int f(x)\, p(x)\, dx = \int f(x)\, \frac{p(x)}{q(x)}\, q(x)\, dx = \mathbb{E}_{q}\!\left[ f(x)\, \frac{p(x)}{q(x)} \right].$$

The above expectation $\mathbb{E}_{q}$ can be computed using Monte-Carlo estimation as

$$\mathbb{E}_{q}\!\left[ f(x)\, \frac{p(x)}{q(x)} \right] \approx \frac{1}{N} \sum_{i=1}^{N} w_i\, f(x_i), \qquad x_i \sim q,$$

where

$$w_i = \frac{p(x_i)}{q(x_i)}$$

are the importance weights.
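
Here is a minimal sketch of this estimator, reusing the same illustrative `f` and standard-normal target `p` as above and assuming a wider normal as the proposal `q`; these specific choices are for illustration only and are not from the original post.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def f(x):
    # Same illustrative integrand as in the naive sketch (an assumption).
    return x ** 2

p = norm(loc=0.0, scale=1.0)  # target density p
q = norm(loc=0.0, scale=2.0)  # proposal density q (wider than p, chosen for illustration)

N = 10_000
x = q.rvs(size=N, random_state=rng)  # sample from the proposal, not from p

# Importance weights w_i = p(x_i) / q(x_i).
w = p.pdf(x) / q.pdf(x)

# Importance-sampling estimate of E_p[f(x)].
is_estimate = np.mean(w * f(x))
print(is_estimate)  # close to 1.0 again, but computed using samples from q
```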

## Picking a proposal distribution

One of the motivations for using importance sampling is to reduce the variance of the estimate. Hence, a natural criterion for selecting a proposal distribution is to pick the one that minimizes the variance of the estimate.

Let's derive what such a proposal would look like. The variance of the importance-sampling estimator is

$$\mathrm{Var}_{q}\!\left[ f(x)\, \frac{p(x)}{q(x)} \right] = \mathbb{E}_{q}\!\left[ \frac{f(x)^2\, p(x)^2}{q(x)^2} \right] - \left( \mathbb{E}_{q}\!\left[ f(x)\, \frac{p(x)}{q(x)} \right] \right)^2.$$

Since the second term equals $\left(\mathbb{E}_{p}[f(x)]\right)^2$ and is therefore independent of $q$, we can ignore it. Using Jensen's inequality, we can find a lower bound on the first term:

$$\mathbb{E}_{q}\!\left[ \frac{f(x)^2\, p(x)^2}{q(x)^2} \right] \geq \left( \mathbb{E}_{q}\!\left[ \frac{|f(x)|\, p(x)}{q(x)} \right] \right)^2 = \left( \int |f(x)|\, p(x)\, dx \right)^2.$$

The lower bound is attained when

$$q^{*}(x) = \frac{|f(x)|\, p(x)}{\int |f(x')|\, p(x')\, dx'}.$$

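To see how the choice of proposal affects the variance, here is a small numerical sketch (my own example, not from the post). It estimates the tail probability $\mathbb{E}_p[\mathbb{1}(x > 3)]$ under a standard normal $p$, using a proposal $q = \mathcal{N}(3, 1)$ that, in the spirit of $q^{*}(x) \propto |f(x)|\, p(x)$, places its mass where the integrand actually matters.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N = 10_000

p = norm(0.0, 1.0)   # target density
q = norm(3.0, 1.0)   # proposal shifted into the region where |f(x)| p(x) is large

def f(x):
    # Indicator of the event {x > 3}; E_p[f(x)] = P(X > 3) under N(0, 1).
    return (x > 3.0).astype(float)

# Naive Monte-Carlo: very few samples drawn from p land in the tail.
x_p = p.rvs(size=N, random_state=rng)
naive_terms = f(x_p)

# Importance sampling: samples from q land in the tail; weights correct for the mismatch.
x_q = q.rvs(size=N, random_state=rng)
is_terms = f(x_q) * p.pdf(x_q) / q.pdf(x_q)

print(f"true value          : {p.sf(3.0):.3e}")
print(f"naive estimate      : {naive_terms.mean():.3e} (per-sample variance {naive_terms.var():.2e})")
print(f"importance estimate : {is_terms.mean():.3e} (per-sample variance {is_terms.var():.2e})")
```

With the same number of samples, the importance-sampling terms have a much smaller per-sample variance, which is exactly the benefit the optimal-proposal argument above predicts.
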
This post concludes the series on sampling. In the next post, we will look into variational inference.

## Reference

[1] Kevin Murphy, Machine Learning: A Probabilistic Perspective, Chapter 23