Optimizing Revenue with Bayesian A/B testing

Omri Fima
Published in NI Tech Blog · 5 min read · Aug 5, 2019

One of the major challenges at Natural Intelligence is optimizing our ability to help consumers make informed buying decisions.

One method of reaching this goal is to optimize toward Conversion Rate, but in many cases that's not enough. In our specific case, we've found Revenue Per Visit to be a much better-fitting optimization goal. Yet this goal brings its own set of complexities: optimizing Revenue Per Visit is a balancing act between two interdependent factors, Conversion Rate and Revenue Per Conversion, and improving one metric can hurt the other.

Another challenge we faced was testing and learning at a higher rate. For many flows, the cost of reaching statistical significance meant that a test would need to run for weeks or even months, and during that time we could not run any other tests that might skew or interfere with our results.

In this post, I'll show how we do revenue-based A/B testing using Bayesian analysis with PyMC3. You will learn how we optimize toward more complex metrics, and how we've significantly reduced our time to conclusion.

What is Bayesian A/B testing?

Bayesian A/B testing employs Bayesian inference methods to give you the probability that B is better (or worse) than A, and by how much.

The immediate advantage of this method is that we can understand the result intuitively even without understanding what p-value or null hypothesis means. This means that it’s easier to communicate with our business stakeholders in a language that makes sense — the language of risk and value.

Another advantage is that Bayesian statistics doesn't hinge on statistical significance, so you don't have to worry as much about the test size when you evaluate the result. You can start evaluating the effect from day one by reading the probability that B is better than A. Of course, as we gather more data our answers become more accurate, but since we are using the language of probabilities, we can say, for example, "A is better than B with 60% probability" rather than "We don't have enough data", and then decide whether to keep waiting.

Preparing Data

Raw conversion data

We will start with our dataset. Each line represents a user visit, along with its variant, conversion, and revenue figures.

We will split this dataset into four arrays: two with conversion samples for our control and treatment variants, and two with revenue samples for our control and treatment variants.
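A minimal sketch of that split with pandas; the column names (`variant`, `converted`, `revenue`) and the sample rows are assumptions for illustration, not our production schema. Note that the revenue arrays keep only converted visits, since they model revenue per conversion:

```python
import pandas as pd

# Hypothetical raw dataset: one row per visit, with the variant shown,
# whether the visit converted, and the revenue it generated.
df = pd.DataFrame({
    "variant":   ["A", "A", "A", "B", "B", "B"],
    "converted": [1,    0,   1,   1,   0,   1],
    "revenue":   [12.5, 0.0, 8.0, 15.0, 0.0, 11.0],
})

# Conversion samples: one 0/1 observation per visit.
conversion_obs_A = df.loc[df.variant == "A", "converted"].values
conversion_obs_B = df.loc[df.variant == "B", "converted"].values

# Revenue samples: revenue per *conversion*, so only converted visits.
revenue_obs_A = df.loc[(df.variant == "A") & (df.converted == 1), "revenue"].values
revenue_obs_B = df.loc[(df.variant == "B") & (df.converted == 1), "revenue"].values
```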

Setting prior distributions

First, we set up our prior distributions. Prior distributions are assumptions (or prior knowledge we have) about how our conversion and revenue probabilities should behave. Specifying a prior allows us to reach conclusions with a relatively small dataset.

For the conversion metric, we chose a Beta distribution with very weak alpha and beta parameters, as we don't have any prior knowledge of how our conversion might behave.

For our revenue, we assumed a Gamma distribution, which captures the exponential nature of our revenue well. Again, we chose a relatively weak prior.
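To make "weak" concrete, here is a sketch of what such priors look like with scipy; the specific parameter values (Beta(1, 1), and a Gamma prior on the rate of an exponential revenue model) are illustrative assumptions, not our exact choices:

```python
from scipy import stats

# Weak Beta(1, 1) prior on the conversion rate: flat over [0, 1],
# encoding no prior opinion about how often a visit converts.
conversion_prior = stats.beta(a=1, b=1)

# Weak Gamma prior for the revenue side (here placed on the rate of an
# exponential revenue model); a small shape and rate keep it vague.
revenue_rate_prior = stats.gamma(a=0.1, scale=1 / 0.1)
```

The Beta(1, 1) prior assigns the same density everywhere on [0, 1], which is exactly the "no prior knowledge" stance described above.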

Once we have chosen our prior distributions, we can factor in our observations.

Calculating posterior

Now we can create four new distributions based on our existing distributions.

The first two distributions are a composition of our conversion and revenue distributions. Together they will provide us with the expected revenue. The other two distributions are the difference between A and B for both conversion and revenue.

Finally, we can get our posterior distributions by letting PyMC3 sample conversionA, conversionB, conversionRevenueA, conversionRevenueB, lift and revenueLift.

Analyzing the results

To get a better sense of our results, we will start by plotting the conversion and revenue distributions for both the A and B variants.

A quick look at the plots shows, intuitively, that variant B was able to increase the probability of higher revenue while keeping the same conversion rate.

We will continue by drawing the posterior graphs for revenue and conversion lift.

Now we can see that there is roughly an 87% probability that B performs better than A, and that our mean lift is roughly 10¢ per visit.
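Reading these numbers off the trace is a couple of lines of numpy. The sketch below uses synthetic normal draws as a stand-in for the posterior `revenueLift` samples the model produces (the location and scale are chosen only so the output resembles the figures above):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the posterior revenueLift samples (dollars per visit);
# in practice these come from the PyMC3 trace.
revenue_lift_samples = rng.normal(loc=0.10, scale=0.09, size=10_000)

# Probability that B beats A: the fraction of posterior mass above zero.
p_b_better = (revenue_lift_samples > 0).mean()

# Mean lift per visit.
mean_lift = revenue_lift_samples.mean()

print(f"P(B > A) = {p_b_better:.0%}, mean lift = ${mean_lift:.2f} per visit")
```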

Drawing Conclusions

As seen in the previous steps, we can reach conclusions with relative ease by looking at the posterior charts, but it still requires visually analyzing these charts. Can we simplify our process of drawing conclusions even more?

Since we have the revenue-lift probability distribution, the average value of a visit, and a rough estimate of visits for the next month, we can create a simple risk-and-contribution assessment for choosing B over A for the next month:

Probability that B is better than A: 87%

Expected risk of choosing B: -$2,206, at 13% probability

Mean contribution of choosing B: $7,209

We can even weigh in the "cost" of the experiment, stating that a test that doesn't create a lift of at least $3,000 is considered practically equivalent and not worth the hassle:

Probability that B is better than A by at least $3,000: 70.9%
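All four statements fall out of the same posterior samples. In this sketch the lift draws are again a synthetic stand-in for the trace, and the monthly traffic figure is an assumed round number, not our real volume:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in posterior draws of revenue lift per visit (dollars).
revenue_lift_samples = rng.normal(loc=0.10, scale=0.09, size=10_000)
monthly_visits = 72_000  # assumed rough estimate of next month's traffic

# Distribution of the monthly contribution of choosing B over A.
monthly_lift = revenue_lift_samples * monthly_visits

p_b_better = (monthly_lift > 0).mean()
mean_contribution = monthly_lift.mean()

# Expected risk: the average monthly loss in the scenarios where B is worse.
losing = monthly_lift[monthly_lift < 0]
expected_risk = losing.mean() if losing.size else 0.0
p_risk = losing.size / monthly_lift.size

# Practical-equivalence threshold: lifts under $3,000/month aren't worth shipping.
p_b_better_by_3000 = (monthly_lift > 3_000).mean()

print(f"Probability of B is better than A: {p_b_better:.0%}")
print(f"Expected risk of choosing B: ${expected_risk:,.0f} at {p_risk:.0%} probability")
print(f"Mean contribution of choosing B: ${mean_contribution:,.0f}")
print(f"Probability of B better by at least $3,000: {p_b_better_by_3000:.0%}")
```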

Using these simple statements, we can effectively communicate the risk and value of our experiments in simple sentences that help business stakeholders make informed decisions with ease.

Summary

In this post, I've shown how you can use Bayesian statistics to estimate the value of your experiments using complex metrics, while communicating the results effectively to your business stakeholders.

We've found this method highly effective in increasing the rate and impact of our experiments and improving customer satisfaction, and, as a result, our bottom line.
