Hierarchical Bayes and Stan tutorial

A view of model posterior distributions from Stan.

This summer, I found myself working extensively with data that has a natural hierarchical structure. We have some variable y to predict from features x, but the relationship differs slightly per location. We could train an independent model per location, but that is wasteful: surely we can exploit the fact that the relationship is mostly similar across locations, especially for locations with very little data. Alternatively, we could train a single model across all locations, but that approach ignores the possibility that in some locations the relationship is truly idiosyncratic.
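To make the setting concrete, here is a toy data-generating process of the kind described above: a binary outcome y, a single feature x, and a per-location slope drawn around a shared mean. All names and numbers here are illustrative assumptions, not the actual data from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

n_locations = 5
n_per_location = 40

# Shared ("global") relationship: per-location slopes scatter around mu_slope.
# These values are purely illustrative.
mu_slope, sigma_slope = 1.5, 0.5

X, y, location = [], [], []
for loc in range(n_locations):
    slope = rng.normal(mu_slope, sigma_slope)   # location-specific slope
    x = rng.normal(size=n_per_location)
    p = 1 / (1 + np.exp(-slope * x))            # logistic link
    X.append(x)
    y.append(rng.binomial(1, p))                # binary outcome
    location.append(np.full(n_per_location, loc))

X = np.concatenate(X)
y = np.concatenate(y)
location = np.concatenate(location)
```

A hierarchical model tries to recover both the shared mean slope and each location's deviation from it, borrowing strength across locations where data is scarce.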

This setting is ideal for a Bayesian hierarchical model, and there seems to be no better way to train such models than Stan. In the following Jupyter Python notebook, I walk through training these models and compare them to standard alternatives. In particular, I show and compare the following types of models:

  1. A single Bayesian model across locations, in Stan.
  2. A separate Bayesian model per location, in Stan.
  3. A Bayesian hierarchical model, in Stan.
  4. A single logistic regression across locations, using scikit-learn, at various levels of regularization.
  5. Regularized logistic regressions with separate parameters per location, and with joint parameters, using scikit-learn.
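As a sketch of the scikit-learn baselines (models 4 and 5), one can fit a single pooled regression and a separate regularized regression per location and compare the fitted slopes. The data and parameter values below are illustrative assumptions, not the notebook's actual setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Toy data: two locations with different true slopes (illustrative values).
x0 = rng.normal(size=100)
y0 = rng.binomial(1, 1 / (1 + np.exp(-2.0 * x0)))
x1 = rng.normal(size=100)
y1 = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x1)))

# Model 4: one pooled regression across locations; C controls L2 regularization.
pooled = LogisticRegression(C=1.0).fit(
    np.concatenate([x0, x1]).reshape(-1, 1),
    np.concatenate([y0, y1]),
)

# Model 5 (separate-parameters variant): one regularized model per location.
per_loc = [
    LogisticRegression(C=1.0).fit(x.reshape(-1, 1), y)
    for x, y in [(x0, y0), (x1, y1)]
]

# The pooled slope typically lands between the two per-location slopes,
# which is the tension the hierarchical model resolves with partial pooling.
print(pooled.coef_[0, 0], [m.coef_[0, 0] for m in per_loc])
```

Neither extreme is satisfying: the pooled model ignores between-location differences, while the per-location models get noisy when a location has little data, which is what motivates the hierarchical approach above.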

Note 1: I am neither a Stan expert nor a Bayesian statistician, so I imagine there are code speedups and other best practices that I missed. This notebook is the result of a weekend or so of learning. Please comment here or drop me a note at nkgar6@gmail.com to correct errors or suggest improvements.

Note 2: Access the raw notebook here.