Bayesian Correlation with PyMC

Philipp Singer
Aug 30, 2015


Originally published on my old WordPress blog.

Recently, I have become increasingly interested in Bayesian techniques. Specifically, I have been researching how to approach classical statistical inference problems within the Bayesian framework. As a start, I have looked into estimating the Pearson correlation.

To that end, I found a great resource in Rasmus Bååth's blog, where he published a two-part series on how to model correlation in a Bayesian way [1,2]. A very similar model has also been proposed and discussed in [3].

My main contribution here is showing how to apply the model with the Python library PyMC. Note that the code would need some adaptations to work with PyMC3.

The full code, experiments, and results are available as a Jupyter notebook. Here, I just want to highlight the code for the main model and some example results.

The core model(s) can be expressed as follows:
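One way to write down such a model follows the bivariate normal formulation of [1], which is similar in spirit to [3]. Each variable gets a vague prior on its mean and standard deviation, rho gets a uniform prior on [-1, 1], and the paired observations follow a bivariate normal likelihood. The specific prior parameters below are assumptions for illustration. They may differ from the ones used in the notebook.

```latex
\begin{aligned}
\mu_j &\sim \mathcal{N}(0,\ 100^2), && j = 1, 2 \\
\sigma_j &\sim \mathrm{Uniform}(0,\ 1000), && j = 1, 2 \\
\rho &\sim \mathrm{Uniform}(-1,\ 1) \\
\Sigma &= \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \\
(x_i,\ y_i)^\top &\sim \mathcal{N}_2\!\left((\mu_1,\ \mu_2)^\top,\ \Sigma\right), && i = 1, \dots, n
\end{aligned}
```

The robust variant discussed in [2] replaces the bivariate normal likelihood with a bivariate t distribution. Its heavier tails make the estimate of rho less sensitive to outliers.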

Then, let us define some data and do inference.
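The sketch below is a hedged, self-contained illustration of this step. It simulates 30 correlated pairs with a true rho of 0.3 (an arbitrary choice) and samples the posterior of a bivariate normal correlation model. It does not use PyMC's API. Instead, it is a plain random-walk Metropolis sampler written with NumPy. The following are all assumptions for illustration:

* the priors: mu_j ~ Normal(0, 100^2), sigma_j ~ Uniform(0, 1000), rho ~ Uniform(-1, 1)
* the step size and burn-in length
* the names `log_posterior` and `metropolis`

```python
import numpy as np


def log_posterior(theta, data):
    """Unnormalized log posterior of theta = (mu1, mu2, sigma1, sigma2, rho).

    Assumed (hypothetical) priors: mu_j ~ Normal(0, 100^2),
    sigma_j ~ Uniform(0, 1000), rho ~ Uniform(-1, 1).
    """
    mu1, mu2, s1, s2, rho = theta
    # Uniform priors: log density is -inf outside their support
    if not (0 < s1 < 1000 and 0 < s2 < 1000 and -1 < rho < 1):
        return -np.inf
    # Normal(0, 100^2) priors on the means, additive constants dropped
    lp = -0.5 * (mu1**2 + mu2**2) / 100.0**2
    # Bivariate normal log-likelihood, up to an additive constant
    zx = (data[:, 0] - mu1) / s1
    zy = (data[:, 1] - mu2) / s2
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (1 - rho**2)
    n = data.shape[0]
    lp -= n * (np.log(s1) + np.log(s2) + 0.5 * np.log(1 - rho**2))
    lp -= 0.5 * q.sum()
    return lp


def metropolis(data, n_iter=20000, step=0.1, seed=0):
    """Random-walk Metropolis with an isotropic Gaussian proposal."""
    rng = np.random.default_rng(seed)
    # Start at the sample means and standard deviations, with rho = 0
    theta = np.array([data[:, 0].mean(), data[:, 1].mean(),
                      data[:, 0].std(), data[:, 1].std(), 0.0])
    current = log_posterior(theta, data)
    samples = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        proposal = theta + step * rng.standard_normal(theta.size)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, exp(candidate - current))
        if np.log(rng.random()) < candidate - current:
            theta, current = proposal, candidate
        samples[i] = theta
    return samples


# Simulated data: 30 pairs with a true correlation of 0.3 (arbitrary choice)
rng = np.random.default_rng(42)
data = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=30)

samples = metropolis(data)
rho_samples = samples[5000:, 4]  # discard the first 5000 draws as burn-in
print("posterior mean of rho:", rho_samples.mean())
print("sample Pearson r:     ", np.corrcoef(data.T)[0, 1])
```

A single step size of 0.1 for all parameters only suits data on roughly unit scale. For data on other scales, the proposal would need tuning.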

The marginal posterior of the correlation rho (together with the MCMC trace and autocorrelation) looks like this:

[Figure: MCMC trace, autocorrelation, and marginal posterior histogram of rho]

What we can see here is that the posterior mean of rho is around 0.13, which is similar to what a classic Pearson correlation estimate tells us. However, the histogram of the marginal posterior shows that the plausible values of rho are spread quite widely. We can characterize this spread with the 95% HPD (highest posterior density) interval, a type of credible interval, which is [-0.42, 0.64]. The HPD interval therefore gives a much more complete picture of our uncertainty about rho than the mean alone. While the mean correlation is slightly positive, we cannot rule out a negative correlation or no correlation at all (rho = 0).
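A 95% HPD interval is the narrowest interval that contains 95% of the posterior mass. With MCMC draws, it can be approximated by sorting the draws and picking the shortest window that spans 95% of them. The sketch below assumes a unimodal posterior. The function name `hpd_interval` is hypothetical, and the draws are synthetic stand-ins rather than the actual posterior from this post.

```python
import numpy as np


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the draws (assumes a unimodal posterior)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.floor(mass * n))  # index span of each candidate interval
    # Width of every interval [x[i], x[i + k]]; the narrowest one is the HPD
    widths = x[k:] - x[:n - k]
    i = int(np.argmin(widths))
    return x[i], x[i + k]


# Synthetic stand-in draws for rho (NOT the posterior reported in this post)
rng = np.random.default_rng(1)
rho_draws = np.tanh(rng.normal(0.13, 0.3, size=20000))

low, high = hpd_interval(rho_draws)
print(f"posterior mean: {rho_draws.mean():.2f}")
print(f"95% HPD interval: [{low:.2f}, {high:.2f}]")
# Posterior probabilities can be read off the same draws
print(f"P(rho > 0): {(rho_draws > 0).mean():.2f}")
```

For a unimodal posterior, the HPD interval is the shortest interval with the requested mass. This matters when the posterior for rho is skewed toward -1 or 1, because an equal-tailed interval would then be wider.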

For further details and examples, including an experiment with the robust model, please take a look at the IPython notebook and the references.

[1] http://www.sumsar.net/blog/2013/08/bayesian-estimation-of-correlation/
[2] http://www.sumsar.net/blog/2013/08/robust-bayesian-estimation-of-correlation/
[3] Lee, Michael D., and Eric-Jan Wagenmakers. Bayesian cognitive modeling: A practical course. Cambridge University Press, 2014.


Philipp Singer

Data Scientist at UNIQA Insurance Group, PhD in CS, passionate about machine learning, statistics, data mining, programming, blockchain, and many other fields.