There is no Bayesian MCMC

Anne van Rossum
2 min read · Oct 5, 2016


Should knowledge be put in the prior or in the proposal distribution? It is customary to put information about the problem structure into the prior. The proposal distribution is like a blindfolded little rat running around through the landscape, corrected time after time by the Metropolis-Hastings acceptance step.

The proposal must obey certain restrictions so that the MCMC chain is guaranteed (a) to have something to approach (a stationary distribution exists, normally ensured through the stricter requirement of detailed balance) and (b) that this limit is unique (the chain must be irreducible and aperiodic, i.e. ergodic).
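For reference, detailed balance for a target π and transition kernel P, and the Metropolis-Hastings acceptance probability that enforces it for a proposal q, read:

```latex
\pi(x)\,P(x \to x') = \pi(x')\,P(x' \to x),
\qquad
\alpha(x \to x') = \min\!\left(1,\; \frac{\pi(x')\,q(x \mid x')}{\pi(x)\,q(x' \mid x)}\right)
```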

Bayesian MCMC, admittedly taken to an extreme, should sample the proposal distribution itself from a prior! There should be a space from which we can sample proposal distributions. A Normal distribution is a very common proposal. It is straightforward to suggest a family of Normal proposals, for example with the variance sampled from a Uniform distribution on (0, 10] (or from an Inverse-Wishart distribution). Note, however, that by being unimaginative with the space of proposal distributions we are unlikely to gain much.
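A minimal sketch of this idea in Python (the function names and the standard-Normal target are illustrative, not from the article): each Metropolis step draws the proposal's standard deviation from a Uniform(0, 10] "prior over proposals". Since a zero-mean scale mixture of Normal steps is still symmetric, the plain Metropolis acceptance ratio remains valid.

```python
import math
import random

def log_target(x):
    # Unnormalized log-density of a standard Normal target (illustrative).
    return -0.5 * x * x

def mh_random_scale(n_steps, x0=0.0, seed=0):
    """Metropolis sampler whose step size is itself drawn from a
    Uniform(0, 10] 'prior over proposal distributions'."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n_steps):
        sigma = rng.uniform(1e-9, 10.0)    # sample a proposal scale
        x_new = x + rng.gauss(0.0, sigma)  # symmetric Normal step
        # Symmetric proposal: acceptance uses only the target ratio.
        if math.log(rng.random()) < log_target(x_new) - log_target(x):
            x = x_new
        chain.append(x)
    return chain

chain = mh_random_scale(20000)
```

For a long enough chain the sample mean and variance should approach those of the target, whatever mix of scales the Uniform prior happens to produce.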

It is important to realize that MCMC artificially introduces this ad-hoc proposal distribution as a kind of auxiliary variable. The introduction of this variable is not Bayesian in itself.

In practice, the proposal distributions are often local in nature. In other words, if we view the proposal mechanism in MCMC as a search mechanism, it is often quite myopic!
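To see how myopic a local random walk can be, consider a hypothetical target with two unit-variance modes 50 units apart (an illustrative sketch, not an example from the article). A Metropolis sampler with unit-scale Normal steps started at the left mode essentially never reaches the right one:

```python
import math
import random

def log_target(x):
    # Unnormalized log-density of an equal mixture of N(0,1) and N(50,1).
    a = -0.5 * x * x
    b = -0.5 * (x - 50.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

rng = random.Random(42)
x, chain = 0.0, []
for _ in range(50000):
    x_new = x + rng.gauss(0.0, 1.0)  # myopic unit-scale random-walk step
    if math.log(rng.random()) < log_target(x_new) - log_target(x):
        x = x_new
    chain.append(x)

# The chain stays around the left mode; x = 50 is never visited.
```

Crossing the valley would require a run of uphill acceptances whose joint probability is astronomically small, so half the target mass is simply never seen.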

One solution is to navigate in the space of log-probabilities rather than in the space of probabilities themselves. For example, if you have multiple unit-variance Normal distributions that are further apart than 38 units, the probability to jump there will be zero due to limited floating-point precision in Octave:

> format long e;
> normpdf(39)
ans = 0.00000000000000e+00

And in log-space they can be much further apart before the same underflow occurs, here only around 10^17 (the log-Normal density underflows where ln x ≈ 39):

> format long e;
> lognpdf(10^17)
ans = 0.00000000000000e+00
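The standard remedy is to compute log-densities directly instead of densities. A sketch in Python (standard library only, translating the article's Octave examples): the Normal log-density at x = 39 is a perfectly representable finite number, even though the density itself underflows to 0.

```python
import math

def norm_pdf(x):
    # Standard Normal density; underflows to 0.0 for |x| beyond ~38.6.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_logpdf(x):
    # Log of the standard Normal density; finite even for huge |x|.
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

print(norm_pdf(39))      # 0.0 -- underflow, as in the Octave example
print(norm_logpdf(39))   # about -761.4, no problem
print(norm_logpdf(1e8))  # still finite, about -5e15
```

This is why acceptance ratios in MCMC implementations are almost always computed as differences of log-densities rather than quotients of densities.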

However, if we simply navigate in log-space we fail to learn from this "artifact" of our floating-point hardware: we stop thinking about how to jump quickly from one high-probability island to another. The way to find another island is not to drain the ocean and walk across its bottom. It is to invent a boat or a plane. It is to be curious.

It is about being driven by the unknown, not by the known!

Hence, if your Bayesian problem has additional structure to it, think about this: are you going to elaborately construct a sophisticated prior, or are you going to come up with intelligent search proposals?
