[Paper Readthrough] — Variational Bayesian Monte Carlo
Overview
The challenge of this summary is to explain a complex paper in a simple way, with
- super-minimal math required (pretty hard in this case)
- a layered structure, so the reader can be exposed to more and more details at their discretion
Original Paper: https://arxiv.org/abs/1810.05558
TL;DR
- Performing Bayesian Inference has a lot of very important practical applications
- It means computing the Posterior and the Marginal Likelihood (also called the Evidence)
- Unfortunately this is intractable in general, hence it is necessary to approximate both
- In general there are 2 families of approaches, Variational Methods and Monte Carlo Methods, trading off between knowledge of the function (access to derivatives) and sample efficiency
- Sample efficiency is a key requirement for doing this in practice
- MCMC is a standard tool for Bayesian Inference, but it is not very sample efficient
- This paper introduces VBMC as a new sample-efficient Bayesian Inference tool
Some Details (Level 1)
The Problem
Let’s say we have a parametric model and, given a Dataset, we want to estimate the Probability Density Function (PDF) of each of its parameters
Here \Theta denotes the full parameter space
This problem is theoretically modeled in the Bayesian Framework, and its solution consists of performing Bayesian Inference, for which tools like MCMC exist. However, some problems limit its straightforward use in practice, essentially related to the Likelihood
Likelihood
In general, let’s consider the Model Likelihood as a complex “black box” function (for example, imagine it’s a big Neural Network)
So the only way to know something about it is to sample it (like a black box, provide input and observe output)
Ideally, with an unlimited evaluation budget, we could reconstruct this function with arbitrary precision. In practice, however, the evaluation budget is limited compared to the Likelihood’s complexity, so it needs to be approximated using sample-efficient approximation techniques
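The "black box with a budget" setting above can be sketched in code. Everything here (the toy log-likelihood, the wrapper class, the budget of 50) is a hypothetical illustration, not something from the paper:

```python
import numpy as np

# Hypothetical black-box log-likelihood: we can only evaluate it,
# not inspect its gradients or internal structure.
def black_box_loglik(theta):
    # stand-in for an expensive model (e.g. a big Neural Network or simulator)
    return -0.5 * np.sum((theta - 1.0) ** 2)

class BudgetedLikelihood:
    """Wraps a black-box function and enforces a fixed evaluation budget."""
    def __init__(self, fn, budget):
        self.fn, self.budget, self.calls = fn, budget, 0

    def __call__(self, theta):
        if self.calls >= self.budget:
            raise RuntimeError("evaluation budget exhausted")
        self.calls += 1
        return self.fn(theta)

loglik = BudgetedLikelihood(black_box_loglik, budget=50)
value = loglik(np.zeros(2))  # one evaluation spent out of 50
```

Any sample-efficient method must make every one of those 50 calls count, which is exactly the constraint VBMC is designed for.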
Realistic Likelihoods are hard:
- high dimensional
- Multi-Modal (many local maxima)
- Heavy Tails (extreme events)
- Correlated Parameters
Tools
- Variational Approach
- Active Sampling
- Gaussian Process
Variational Approach — Core Idea
Substitute the complex True Posterior with a simpler PDF, called the Variational Posterior, fitting its parameters in a dissimilarity-minimization framework using a proper measure of dissimilarity between PDFs (e.g. the KL divergence)
Variational Inference is thus performed via Optimization, and solving it gives 2 results
- the Posterior Approximation
- the Evidence Approximation (ELBO)
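A minimal sketch of the optimization above, assuming a 1-D Gaussian toy target and a Gaussian Variational Posterior. The Monte Carlo ELBO estimate and the crude grid search are illustrative stand-ins for the gradient-based optimizers used in practice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized log target: a toy posterior proportional to N(2, 0.5^2)
def log_target(x):
    return -0.5 * ((x - 2.0) / 0.5) ** 2

# Monte Carlo ELBO for a Gaussian variational posterior q = N(mu, sigma^2):
# ELBO = E_q[log p~(x)] - E_q[log q(x)], where the second term is the entropy
def elbo(mu, sigma, n=2000):
    x = mu + sigma * rng.standard_normal(n)
    entropy = 0.5 * np.log(2 * np.pi * np.e * sigma**2)  # Gaussian entropy, exact
    return np.mean(log_target(x)) + entropy

# Crude grid search over the variational parameters (a real method uses gradients)
grid_mu = np.linspace(0.0, 4.0, 41)
grid_sigma = np.linspace(0.1, 2.0, 39)
best_elbo, best_mu, best_sigma = max(
    (elbo(m, s), m, s) for m in grid_mu for s in grid_sigma
)
```

Maximizing the ELBO is equivalent to minimizing KL(q || p), so the search recovers a mean near 2 and a standard deviation near 0.5, i.e. the true posterior, because here the variational family contains it.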
Active Sampling
Consists of having an algorithm make a smart choice about where to sample the unknown function (the Likelihood), in order to optimize some criterion while staying within the sample budget
In math terms, the algorithm essentially solves an optimization problem targeting the acquisition function: a function which connects the sample selection with the chosen criterion (e.g. minimizing the posterior variance)
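A tiny 1-D illustration of this idea, using uncertainty sampling with a hand-rolled Gaussian Process surrogate. The RBF kernel, the grid of candidates, and variance-maximization as the criterion are all assumptions made for this sketch, not the paper's actual acquisition function:

```python
import numpy as np

def rbf_kernel(a, b, length=0.5):
    # Squared-exponential kernel between two sets of 1-D points
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def posterior_variance(x_grid, x_obs, noise=1e-6):
    """GP posterior variance at x_grid, given already-observed locations x_obs."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf_kernel(x_grid, x_obs)
    prior_var = np.ones(len(x_grid))  # RBF prior variance is 1 everywhere
    return prior_var - np.einsum('ij,jk,ik->i', Ks, np.linalg.inv(K), Ks)

# Acquisition rule: sample next where the surrogate is most uncertain
x_obs = np.array([0.0, 1.0])          # locations already evaluated
x_grid = np.linspace(0.0, 1.0, 101)   # candidate locations
acq = posterior_variance(x_grid, x_obs)
x_next = x_grid[np.argmax(acq)]       # farthest point from both observations
```

The acquisition lands on the midpoint between the two observed points, the most uncertain location, which is the "smart choice" that plain random sampling would miss.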
More Details (Level 2)
ELBO
The Evidence Lower Bound (ELBO) has a pretty self-explanatory name: it is a lower bound for the true (log) Evidence (in Bayesian jargon, the denominator of Bayes’ formula)
The log of the True Evidence equals the sum of the ELBO and the KL divergence between the Variational and the True Posterior, so, since KL is non-negative by construction, we have that
- the ELBO is a lower bound for the (log) Evidence
- the more similar the Variational Posterior is to the True one, the better the ELBO approximates the (log) Evidence
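This identity (log Evidence = ELBO + KL) can be checked exactly on a tiny discrete toy example; the 4-state target and the uniform variational posterior below are arbitrary choices for illustration:

```python
import numpy as np

# Discrete toy problem: unnormalized target p~ over 4 states
p_tilde = np.array([1.0, 3.0, 2.0, 0.5])
Z = p_tilde.sum()                 # true Evidence (normalizing constant)
p = p_tilde / Z                   # true posterior

# An arbitrary (imperfect) variational posterior q
q = np.array([0.25, 0.25, 0.25, 0.25])

# ELBO = E_q[log p~(x)] - E_q[log q(x)]
elbo = np.sum(q * (np.log(p_tilde) - np.log(q)))

# KL(q || p) >= 0 by construction
kl = np.sum(q * (np.log(q) - np.log(p)))

# Identity: log Z = ELBO + KL(q || p), hence ELBO <= log Z
```

Since the KL term is non-negative, the ELBO can only touch log Z when q equals p exactly, which is the second bullet above in numeric form.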
Variational Approach in Physics
The Variational Approach has been used in various branches of Physics, like Statistical Mechanics, to perform analytic computations of posteriors
To make the approximation problem analytically tractable, the trick is in the choice of the Variational Posterior: e.g. choosing a factorized function, applying the logarithm turns the product into a sum, which can possibly lead to closed-form solutions
This of course comes at the cost of the KL divergence (whose value grows the more this simple approximation differs from the true posterior)
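As a sketch of why factorization helps: for a mean-field (factorized) Variational Posterior over a D-dimensional parameter, taking the logarithm turns the product over dimensions into a sum of independent 1-D terms:

```latex
q(\theta) = \prod_{i=1}^{D} q_i(\theta_i)
\quad \Longrightarrow \quad
\log q(\theta) = \sum_{i=1}^{D} \log q_i(\theta_i)
```

Each term can then often be optimized separately, which is what makes closed-form solutions possible.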
Work in progress (new updates coming soon)