Once again, Beta Distribution

Haneul Kim
Published in Analytics Vidhya
4 min read, Jul 19, 2021
Photo by Haneul Kim

Table of Contents

1. Introduction

2. What is Beta Distribution?

3. Example

Introduction

A long time ago, at an interview for a Data Analyst position, I was asked “Why is the Beta distribution used in Bayes’ theorem?” At the time I had never heard of the Beta distribution, so I remember saying “I don’t know”. Recently at work, while developing Thompson Sampling and Contextual Bandits, the Beta distribution appeared once again. So I’ve decided to take a deeper look at the Beta distribution and really understand it once and for all.

We will focus only on the statistical viewpoint, so knowledge of Reinforcement Learning and Thompson Sampling is unnecessary.

What is Beta Distribution?

The Beta distribution is a continuous probability distribution defined on the interval [0, 1]. It is used to model probabilities: the random variable itself is a probability.

It is commonly used for questions about an unknown probability or proportion, such as “What is the probability that the defect rate is below 10%?”, “What is the probability that a coin lands heads less than 20% of the time?”, or “What is the probability that an ad’s CTR is between 30% and 40%?”

Here are the formulas for the probability density function, mean, and variance (don’t worry, we will go over them with an example).
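For reference, these are the standard textbook forms, written with shape parameters α > 0 and β > 0. The symbols x, B, and Γ are notation chosen here, not taken from the article:

$$
f(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}, \quad 0 \le x \le 1, \qquad B(\alpha,\beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}
$$

$$
E[X] = \frac{\alpha}{\alpha+\beta}, \qquad \mathrm{Var}[X] = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}
$$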

and properties of beta distribution

Example

Since I work at an ad-tech company, let’s stick with “the probability of an ad (with a true CTR of 34.3%) having a CTR between 30% and 40%” (hope my boss sees this).

The ad’s CTR in the previous month was 34.3%. This month isn’t over yet, so we must keep updating our initial belief (the prior) as we gather more data.

Let’s run a simulation using CTR = 34.3% as the true CTR. Each display of the ad is a Bernoulli trial, 1 for a click and 0 for no click, so the number of clicks over many displays follows a binomial distribution. Whenever a click happens we increase alpha by 1; otherwise we increase beta by 1. This allows the distribution to be updated continuously as new data comes in.

We will instantiate an Ad1 object of the Ad class and display the ad 100 times to users to see how it performs.
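The simulation could look roughly like the sketch below. The class layout, method names, and the `seed` argument are assumptions made for illustration, not the original code. Counts start at 0 so that alpha + beta equals the number of displays, which matches the numbers quoted in this example. Because clicks are drawn at random, the alpha and beta you get will generally differ from the ones quoted in the text.

```python
import random


class Ad:
    """Hypothetical stand-in for the article's Ad class (names are assumptions)."""

    def __init__(self, true_ctr, seed=None):
        self.true_ctr = true_ctr          # used only to simulate clicks
        self.alpha = 0                    # clicks observed so far
        self.beta = 0                     # non-clicks observed so far
        self._rng = random.Random(seed)   # private RNG for reproducible runs

    def display(self):
        """Show the ad once and record a click (1) or no click (0)."""
        clicked = 1 if self._rng.random() < self.true_ctr else 0
        if clicked:
            self.alpha += 1
        else:
            self.beta += 1
        return clicked

    @property
    def ctr(self):
        """Observed CTR so far: clicks divided by displays."""
        shown = self.alpha + self.beta
        return self.alpha / shown if shown else 0.0


ad1 = Ad(true_ctr=0.343, seed=0)
for i in range(1, 101):
    ad1.display()
    if i % 50 == 0:
        print(f"displays={i} alpha={ad1.alpha} beta={ad1.beta} ctr={ad1.ctr:.3f}")
```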

When Ad1 has been displayed 50 times we get alpha=20 and beta=30, which gives ctr=0.4. As we display Ad1 more, its ctr gets closer to the true CTR of 0.343. For simplicity we assume a stationary CTR; keep in mind, however, that real-world CTR is non-stationary, so we would need some kind of time-dependent weight factor to give more importance to newer data. Also, in practice the true CTR is unknown and therefore has to be estimated with a model.

Now we plot the Beta distribution using the alpha and beta values we’ve got.

beta distribution with alpha=37, beta=63
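As a quick check on where this curve sits and how wide it is, the mean and variance can be evaluated for alpha=37, beta=63. This is a small sketch using the standard Beta formulas; the mode formula is an extra standard result, not something taken from the article.

```python
alpha, beta = 37, 63

mean = alpha / (alpha + beta)
variance = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
mode = (alpha - 1) / (alpha + beta - 2)   # peak of the curve; valid for alpha > 1 and beta > 1

print(f"mean={mean:.4f} variance={variance:.6f} std={variance ** 0.5:.4f} mode={mode:.4f}")
```

The mean is 0.37 and the standard deviation is roughly 0.048, so most of the probability mass lies within a few percentage points of 37%.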

So, to answer our question, we need to calculate the area under the curve between 0.3 and 0.4.

Plugging the numbers into the formula, we get:

So the probability of Ad1 (with a true CTR of 34.3%) having a CTR between 30% and 40% is about 66.6%.
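With alpha=37 and beta=63, the area in question is the integral of the density over [0.3, 0.4]:

$$
P(0.3 \le X \le 0.4) = \int_{0.3}^{0.4} \frac{x^{36}(1-x)^{62}}{B(37, 63)}\,dx
$$

One way to evaluate it numerically is sketched below, using only the Python standard library and Simpson's rule. The function names are made up for this example; a statistics library's Beta CDF would do the same job.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def beta_interval_prob(lo, hi, a, b, steps=2000):
    """Approximate P(lo <= X <= hi) with composite Simpson's rule (steps must be even)."""
    h = (hi - lo) / steps
    total = beta_pdf(lo, a, b) + beta_pdf(hi, a, b)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2   # Simpson weights: 4 on odd nodes, 2 on even nodes
        total += weight * beta_pdf(lo + i * h, a, b)
    return total * h / 3


prob = beta_interval_prob(0.3, 0.4, a=37, b=63)
print(f"P(0.3 <= CTR <= 0.4) = {prob:.3f}")
```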

As we gather more and more data, our Beta distribution becomes narrower, meaning we are more certain about our estimate. This can also cause a problem: as the number of trials (displays of the ad) increases, more new data is needed to shift the distribution, so it becomes slower at reflecting new trends. To avoid this, you would need to include a time-dependent weight factor or reset the parameters of the Beta distribution every n displays.
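One possible form of such a weight factor is exponential discounting: shrink the old counts a little before adding each new observation. The sketch below is only an illustration of that idea; the function name and the forgetting factor `gamma` are assumptions, not something the article specifies.

```python
def discounted_update(alpha, beta, clicked, gamma=0.99):
    """Decay old evidence by gamma, then add the new observation.

    gamma is a hypothetical forgetting factor in (0, 1]; gamma=1 recovers
    the plain counting update used in the example above.
    """
    alpha = gamma * alpha + (1 if clicked else 0)
    beta = gamma * beta + (0 if clicked else 1)
    return alpha, beta


# With gamma < 1, alpha + beta settles near 1 / (1 - gamma), so the
# distribution never becomes too narrow to react to new data.
a, b = 37.0, 63.0
for clicked in [1, 0, 0, 1, 0]:
    a, b = discounted_update(a, b, clicked, gamma=0.99)
print(f"alpha={a:.2f} beta={b:.2f} effective_n={a + b:.2f}")
```

With gamma=0.99 the effective sample size stays around 100 displays, so older data fades out gradually instead of being dropped all at once by a reset.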

Conclusion

The Beta distribution is an important and very useful distribution in Bayesian statistics because using a Beta distribution as the prior for an ad’s CTR yields a Beta distribution as the posterior, which gives huge computational benefits. In the next blog post we will explain Bayes’ theorem and why the Beta distribution is chosen as its prior distribution.

Thank you, and please, please comment if any of the information is incorrect!
