Frequentist Vs. Bayesian Statistics in Data Science: A/B Testing

In this article, we will explore the key differences between two often-debated approaches in this field, Frequentist and Bayesian statistics, and discuss which one to use in the context of A/B testing.

Michelle Utama
tiket.com
Sep 8, 2023

Source: LinkedIn

Statistics plays a key role in data science: it provides tools and methods to find structure in data and extract deeper insights. One of the many places where data scientists, including those at tiket.com, apply statistical methods is A/B testing, which checks whether one version of a model performs better than another. Two prominent approaches in statistical analysis are Frequentist and Bayesian statistics. They follow different logic and procedures, and each has its own advantages and disadvantages.

Historically, A/B testing has leaned towards the Frequentist approach. Tools such as Optimizely, Unbounce, HubSpot, and OneSignal use this method for A/B testing. However, Bayesian methods offer a compelling alternative with a different perspective on experimentation. Nowadays, more and more companies are moving towards the Bayesian approach because it provides more detailed metrics, such as the probability that a version is the best and the potential gain from implementing it. Tools such as AB Tasty, Google Optimize, Qubit, and VWO rely on this method for A/B testing.

At tiket.com, we use Amplitude for A/B testing. Interestingly, Amplitude combines both approaches. It uses a Bayesian method to estimate the probability that the new variant (A) outperforms the baseline (B). It also applies a Frequentist method, specifically the two-tailed t-test, to determine whether the result is statistically significant.

Now, a frequently asked question is: “So which one is better for A/B testing?” In the rest of this article, we will go over both approaches in more detail. We will explore their fundamental principles, their main testing methods, and the key factors to weigh when choosing between them.

A Quick Overview of Both Approaches

Frequentist Statistics

Frequentist statistics is rooted in the idea that probability is the long-run relative frequency of an event. In other words, the probability of an event reflects how often it would occur if the same experiment were repeated many times. In Frequentist statistics, the data are treated as a random sample from an underlying population. The goal is to estimate unknown, fixed parameters, or to test hypotheses about them, based solely on the observed data.

Frequentists use p-values to measure the strength of the evidence against a null hypothesis, rather than assigning probabilities to the hypotheses themselves. They emphasize the observed data and leave prior assumptions or subjective knowledge out of the analysis.

This is the model of statistics taught in most introductory college courses, and it is the approach that A/B testing software has traditionally used most often.

Bayesian Statistics

Bayesian statistics is based on Bayes’ Theorem. This mathematical formula describes the probability of an event given prior knowledge of conditions related to that event. It tells us how our current belief should be updated as we receive new data.
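
The original does not spell out the formula. In standard notation, for a hypothesis H and observed data D, Bayes’ Theorem reads:

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
```

Here P(H) is the prior belief in the hypothesis and P(D | H) is the likelihood of the data under that hypothesis. P(D) is the overall probability of the data, and P(H | D) is the posterior, i.e., the updated belief after seeing the data.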

Unlike Frequentists, Bayesians view probability as a degree of belief that an event will happen. The Bayesian approach calculates the probability that a hypothesis is true by updating prior beliefs about it as new data emerge.

The ability to incorporate prior beliefs and to obtain a full probability distribution over the parameters is one of the main reasons some statisticians and data scientists strongly favor this approach.

Main Testing Methods

In Frequentist statistics, one of the main computations is the p-value, which measures how strongly the observed data contradict the null hypothesis. It is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. Put differently, it says how often such a result would appear if the study were repeated many times under the null hypothesis. Common tests for computing a p-value include the t-test, the chi-squared test, analysis of variance (ANOVA), and regression analysis.
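
The sketch below shows how a two-tailed p-value could be computed for a conversion-rate A/B test. It uses a two-proportion z-test, not the t-test that Amplitude uses. For large samples of 0/1 conversion data, the two give very similar results. The function name `two_proportion_z_test` and the conversion counts are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z statistic, two-tailed p-value) for H0: rate_a == rate_b (hypothetical helper)."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under H0 both variants share one conversion rate, so pool the data to estimate it.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Two-tailed: probability of a |z| at least this large if H0 were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 10,000 users per variant, 10% vs 11% conversion.
z, p = two_proportion_z_test(conv_a=1000, n_a=10_000, conv_b=1100, n_b=10_000)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

With these made-up numbers, the p-value comes out at roughly 0.02, below the common 0.05 threshold. Note that it says nothing directly about how large the gain is.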

Bayesian statisticians also have a variety of tests and methods within their framework. Some of the most common are Bayesian hypothesis testing, Markov Chain Monte Carlo (MCMC) methods for approximating posterior distributions, Bayesian regression, and hierarchical models.
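
For conversion rates, one common Bayesian setup is a Beta-Binomial model, which can be used to estimate the probability that one variant is better than another. The sketch below is an illustration under stated assumptions, not how Amplitude or any particular tool computes this metric. It assumes a flat Beta(1, 1) prior for each variant and estimates P(B beats A) by sampling both posteriors. The function name `prob_b_beats_a`, the number of draws, the seed, and the data are all hypothetical.

```python
import random


def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   prior_alpha: float = 1.0, prior_beta: float = 1.0,
                   draws: int = 100_000, seed: int = 42) -> float:
    """Monte Carlo estimate of P(rate_b > rate_a) under a Beta-Binomial model (hypothetical helper)."""
    rng = random.Random(seed)
    # Conjugate update: a Beta(alpha, beta) prior plus k conversions in n trials
    # gives a Beta(alpha + k, beta + n - k) posterior, so no MCMC is needed here.
    post_a = (prior_alpha + conv_a, prior_beta + n_a - conv_a)
    post_b = (prior_alpha + conv_b, prior_beta + n_b - conv_b)
    wins = 0
    for _ in range(draws):
        # Draw one plausible conversion rate per variant and compare them.
        if rng.betavariate(*post_b) > rng.betavariate(*post_a):
            wins += 1
    return wins / draws


# Hypothetical data: 10,000 users per variant, 10% vs 11% conversion.
p_win = prob_b_beats_a(conv_a=1000, n_a=10_000, conv_b=1100, n_b=10_000)
print(f"P(B beats A) = {p_win:.3f}")
```

With these numbers the estimate is roughly 0.99. Here the Beta prior is conjugate to the binomial likelihood, so the posteriors have a closed form and simple sampling suffices. Heavier machinery such as MCMC is only needed for models without that property.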

Advantages and Disadvantages: Which One Is Better for A/B Testing?

Frequentist Statistics

  • Advantages: simpler, easier to understand and apply; implementations are available in virtually every programming language; well-established theory and extensive literature; focuses on observed data, so no prior knowledge or beliefs are required; computation is usually faster.
  • Disadvantages: can be limited when dealing with small sample sizes or complex problems that call for prior information; relies on p-values; a p-value alone only tells us whether a variation is winning, not how large the gain is likely to be.

Bayesian Statistics

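  • Advantages: allows prior beliefs and knowledge to be integrated, which is useful when data are limited; shows the likely size of a winning variation’s gain, not just which variation wins; results are expressed as directly interpretable probabilities, though, like any method, it does not rule out false positives.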
  • Disadvantages: computational complexity, subjective prior specification may introduce bias, requires a deeper understanding of probability theory and computational methods.

So, which one should we choose?

In the end, there is no single correct answer when choosing between Frequentist and Bayesian statistics. The choice depends on several factors, including the nature of the model, the problem being solved, available information and prior knowledge, and the goal of the experiment.

The Frequentist method offers well-defined guarantees: with a fixed sample size and significance level, the p-value lets us control how often we would declare a false winner across many repeated experiments. Bayesian calculations, on the other hand, can lead to incorrect conclusions if the prior carried over from earlier experiments does not apply to the new one. So, for A/B testing in research settings, the Frequentist method may be the better fit. Even so, Bayesian inference is still worth applying when a p-value alone is not sufficient.
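
To see how a prior that does not fit the new experiment can skew the result, the sketch below compares posterior means under two priors. One is a flat Beta(1, 1) prior. The other is a hypothetical prior "borrowed" from an older experiment that converted at about 20%, weighted as if it were 1,000 past observations. The function name `beta_posterior_mean` and all numbers are illustrative assumptions, not data from tiket.com.

```python
def beta_posterior_mean(successes: int, trials: int, prior_alpha: float, prior_beta: float) -> float:
    """Posterior mean of a conversion rate under a Beta(prior_alpha, prior_beta) prior (hypothetical helper)."""
    # Beta-Binomial conjugate update: add successes to alpha and failures to beta.
    post_alpha = prior_alpha + successes
    post_beta = prior_beta + (trials - successes)
    return post_alpha / (post_alpha + post_beta)


# Hypothetical new experiment: 50 conversions out of 500 users (10% observed).
conversions, users = 50, 500

flat = beta_posterior_mean(conversions, users, prior_alpha=1, prior_beta=1)
# Hypothetical prior from an older experiment with ~20% conversion,
# weighted as if it were 1,000 past observations (200 + 800).
borrowed = beta_posterior_mean(conversions, users, prior_alpha=200, prior_beta=800)

print(f"flat prior:     {flat:.3f}")
print(f"borrowed prior: {borrowed:.3f}")
```

The flat prior gives about 0.102, close to the observed 10%. The borrowed prior pulls the estimate up to about 0.167. If the older experiment is not representative, that shift is exactly the kind of incorrect conclusion described above.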

However, in a business setting, we also want to know how much gain the winning variation actually generates, not only which variation wins. Since time, money, and resources are at stake, we need to know whether switching from one version to the other is worth it. Moreover, the fixed sample size and test duration that the Frequentist approach requires often make it unattractive for companies. Therefore, in this context, the Bayesian method may be preferable.
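
As a rough sketch of how a Bayesian analysis can answer "how much gain, and is it worth it?", the code below summarizes the posterior of the absolute lift (rate B minus rate A). It returns the expected lift and a 95% credible interval. It also returns the expected loss of shipping B, i.e., the average conversion rate we would give up in the cases where B is actually worse. This assumes flat Beta(1, 1) priors. The function name `posterior_gain_summary`, the data, and the number of draws are hypothetical. Whether a given expected loss is acceptable is a business decision that the code does not make.

```python
import random


def posterior_gain_summary(conv_a: int, n_a: int, conv_b: int, n_b: int,
                           draws: int = 100_000, seed: int = 7):
    """Expected lift, expected loss of choosing B, and 95% credible interval (hypothetical helper)."""
    rng = random.Random(seed)
    lifts = []
    for _ in range(draws):
        # Sample each variant's rate from its Beta posterior (flat Beta(1, 1) prior assumed).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        lifts.append(rate_b - rate_a)
    lifts.sort()
    expected_lift = sum(lifts) / draws
    # Expected loss of shipping B: average shortfall, counting only draws where B is worse.
    expected_loss_b = sum(-d for d in lifts if d < 0) / draws
    # 95% credible interval for the lift, read off the sorted samples.
    low = lifts[int(0.025 * draws)]
    high = lifts[int(0.975 * draws)]
    return expected_lift, expected_loss_b, (low, high)


# Hypothetical data: 10,000 users per variant, 10% vs 11% conversion.
lift, loss, (low, high) = posterior_gain_summary(1000, 10_000, 1100, 10_000)
print(f"expected lift: {lift:.4f}, expected loss of B: {loss:.6f}")
print(f"95% credible interval for lift: [{low:.4f}, {high:.4f}]")
```

With these numbers the expected lift is about one percentage point, and the interval spans roughly 0.0015 to 0.0185. The expected loss of switching to B is very small. Figures like these can be weighed directly against the cost of making the switch.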

Conclusion

So which one is better, Bayesian or Frequentist? Both have their strengths and weaknesses, and the final decision should be based on the available data and our specific needs. Both approaches offer valuable tools for gaining insights from data. Frequentist methods focus on observed data and the probability of observing a dataset in repeated experiments given the null hypothesis. Bayesian approaches, by contrast, incorporate prior knowledge and deal with the probability of a hypothesis given a particular dataset. Understanding the distinctions between the two makes us more versatile as data scientists and helps us make better-informed choices throughout our careers.
