Coding Bayesian AB Tests in Python to Boost your App or Website Conversions

The LEGO Batman Movie. Image: fandango.

Back when I was getting started with Bayesian statistics, I found it hard to find simple, ready-to-use code examples for probabilistic programming. Today there are great resources available, and I want to contribute to that by sharing a very simple code example to get started with AB tests in Python.


Let's say we are a content provider company that streams movies and series. We are advertising a movie on the home page of our website, but we've got a bunch of other cool pictures available and we don't know if the one being used is the one that makes our users most likely to click on the movie.

The LEGO Batman Movie. Image: on the left, The Movie Database; on the right, The Reel World.

To start off, we select a small percentage of our users to participate in the experiment. We do that because, if the experiment turns out to be a bad experience, we don't want it to impact many users. Let's select 10% of them.

Of those selected, we want to split them into multiple variations of the same size, each showing a different picture. For simplicity, we use 2 variations here: in the first, users will see the original picture, and in the second, users will see a picture we think can beat the original.

It's common practice to choose a minimum confidence level to call a winner. That is subject to lots of factors, but in most cases a significance level of 5% is used which, split between the two tails, translates to confidence thresholds of 2.5% and 97.5%. Naively speaking, if our confidence that one variation is better than the other drops below 2.5% or rises above 97.5%, we can stop the experiment and call a winner.

We'll use the PyMC3 framework to help us find the optimal picture for the movie. Let's get into code.

Building the model

Let's start by importing the libraries and defining some toy data.

We have n users in each variation: obs_v1 of them clicked on the picture in variation 1 (the control) and obs_v2 clicked on the picture in variation 2 (the new picture).

Next, we need to define our model. We'll use beta distributions for the priors and binomial distributions for the likelihood (each user's click is a Bernoulli trial, so the total number of clicks out of n users follows a binomial distribution). The model needs to be defined within a context, so place every command inside a with pm.Model(): block:

alpha and beta are the parameters of the beta distribution (we set both to 2 just to give a tiny initial preference toward a draw), n is the number of users in the experiment up to that moment, p is the probability of converting a user, and observed is the actual number of users who converted.

Now that we have set up the distributions for each variation, we can operate on them to come up with interesting results. Let's create two variables that will contain the difference and the relation between the variations.

We can operate on the priors the same way we do with scalar variables because they simply contain samples from our distributions!

To finish up our model definition, we need to provide the number of draws, the sampling method (step) and the initial state for the MCMC chain (start).

It's good practice to analyze the distribution of all parameters in the model as well as their sampling value per iteration. Make sure you run the code above and then let's plot that.

We skip the first 1,000 samples because those can be noisy (this early portion of the chain is commonly called burn-in). Let's check the output of the command above.

On the left column you see the distributions and on the right column you see the sampling value for each step.

Finally, we can plot a histogram that gives us the confidence that one variation is better than the other (it's simply the area under the difference or relation curve for values above zero).

Looking at the difference (or relation) histogram, we see we have 83% confidence that variation 2 is better than variation 1. Since we earlier defined a significance level of 5%, and 83% sits between the 2.5% and 97.5% thresholds, we can't declare a winner yet. We need to either collect more data or stop the experiment and call it a draw.

Next steps

Here we used a fixed count of users who converted, and thus our confidence output is just a number: 83%. Oftentimes you want to track your confidence metric on a daily basis, so you could build a time-series graph with the number of days since the experiment started on the x-axis and the confidence value on the y-axis. The graph looks like the one below:

I have created a repository with the full code used in this story, and I will soon add the code I used to create that time-series graph. Make sure you check it out! The GitHub repo is linked below:

Feel free to comment below your findings!