Using Bayes' Rule and Python to Save Money with Facebook Ads

Data Ninja
Solving the Human Problem
6 min read · Aug 10, 2020
Photo by Damir Spanic on Unsplash

Question: How can you spend the minimum possible amount of money on your Facebook advertisements and still get something meaningful out of them?

In math terms: how do we estimate the most likely like/click ratio of a Facebook advertisement post, given a relatively small number of views and impressions?

There are a few ways to do this, but in this post I will focus on the Bayesian way, using Python. Although this is mostly due to personal preference, the Bayesian approach gives us a few extra insights and more flexibility in advanced tasks, so I think it is worth going over the Bayesian basics.

Facebook ad example (photo by Erik Mclean on Unsplash)

One can optimize a Facebook ad for clicks, likes and other engagements, so for our example let's say you want to optimize for likes on your Facebook business page. What percentage of people will like your page after your ad is served to them (i.e., after seeing your ad)? 10%, 40%, 70%, or could it be 90%? For the purposes of this post, let's say the industry average for liking a page after seeing an ad is 50% (in reality it is much, much lower than that). That means that if your ad were served to 1 million users, you could expect close to half a million likes on your page. The practical issues here are 1) you do not know whether your ad has a 50% like-to-view ratio or not; and 2) serving ads to users costs money, so you want to know whether your ad will have a good like-to-view ratio before spending the money to get more likes. So how can we estimate the like-to-view ratio of your ad sooner rather than later? Well, we see that next.

Spending $5 to Save Thousands

(Do not worry, I spent those $5 so you do not have to ;)

Let’s call θ the like-to-view ratio of our ad. The task will be to check whether θ is 0.5 or not (i.e., 50% or not). Say our ad already had 10 views, yielding 2 likes; how should we update our beliefs about our ad? Should we stop running it? Should we keep running it a little longer? Well, enter Bayes’ Rule.

Say we have three candidates for θ, {0.4, 0.5, 0.6}, i.e., θ could be 40%, 50% or 60%, and let’s say we believe there is a 10% chance that the real value of θ is 0.4 (in math notation, p(θ=0.4) = 0.1), while the values 0.5 and 0.6 have probabilities of 80% and 10% (p(θ=0.5) = 0.8 and p(θ=0.6) = 0.1). One way to interpret p(θ=0.4) = 0.1 (the probability of θ being 0.4 is 10%) is that, with everything we know about the universe at this moment, there is a 10% chance that we live in a world where 40% of the Facebook users who see our ad will like our page. The following Python code summarizes all our assumptions about this problem so far:
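A minimal sketch of those assumptions (the variable names thetas, prior, N and z are my own, and I assume NumPy is available):

import numpy as np

# Candidate values for the like-to-view ratio theta
thetas = np.array([0.4, 0.5, 0.6])

# Prior beliefs: p(theta=0.4)=0.1, p(theta=0.5)=0.8, p(theta=0.6)=0.1
prior = np.array([0.1, 0.8, 0.1])

# Hard-earned data so far: 10 views, 2 likes
N, z = 10, 2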

Our next step is to update our prior beliefs given our hard-earned data. The way to do that is with Bayes’ Rule:

Bayes’ Rule: p(θ|D) = p(D|θ) p(θ) / p(D)

Where
- p(θ|D) reads as the probability of the parameter θ given the data D; it is called the posterior distribution/probability

- p(D|θ) reads as the probability of observing the data D given that the value of the parameter is θ; it is called the likelihood

- p(θ) is the probability that the value of our parameter is actually θ; it is called the prior distribution/probability

- p(D) is the normalizing factor given by the unconditional probability of the data; it is often calculated as the sum of the probabilities of the data under every possible value of θ, weighted by the prior

For computational purposes we often rewrite Bayes’ Rule as below (where θ’ ranges over all possible values of θ):

Expanded Bayes’ Rule: p(θ|D) = p(D|θ) p(θ) / Σ_θ’ p(D|θ’) p(θ’)

The intuition behind this formula is that, given an assumption about the world (the prior distribution) and after observing some data, p(θ|D) is the updated probability of each possible value of θ. Bayes’ Rule tells us how to change our minds.
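As a quick illustration, the discrete version of this update fits in a tiny Python function (the helper name bayes_update is my own, not something from a library):

def bayes_update(prior, likelihood):
    """Apply Bayes' Rule over a discrete set of candidate theta values.

    Both arguments are NumPy arrays with one entry per candidate theta;
    the return value is the posterior p(theta | D).
    """
    unnormalized = likelihood * prior          # p(D|theta) * p(theta)
    return unnormalized / unnormalized.sum()   # divide by p(D), the sum over all theta'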

Before we can do some calculations in practice, we need to figure out how to calculate the likelihood for this particular example (the missing ingredient of the posterior). It turns out that a single view follows the standard Bernoulli distribution, with probability θ of a like and probability (1-θ) of a not-like. If we denote by z ∈ {1, 0} the event of a like (when z equals 1) or a not-like (when z equals 0), in mathematical notation we can write the probability as:

Bernoulli distribution: p(z|θ) = θ^z (1-θ)^(1-z), the probability of a like/not-like after the post is viewed.
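Since each view is an independent Bernoulli trial, the probability of a whole sequence of N views containing z likes is just the product of the individual terms above. A minimal sketch of that helper (the function name likelihood is my own choice):

def likelihood(thetas, z, N):
    """p(D | theta) for each candidate theta: the probability of a specific
    sequence of N views containing z likes and (N - z) not-likes.

    We multiply theta for every like and (1 - theta) for every not-like.
    (The binomial coefficient is omitted because it cancels out when the
    posterior is normalized.)
    """
    return thetas**z * (1.0 - thetas)**(N - z)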

Now, we are ready to jump to a few plots to help drive the idea home.

In the following plot you can see the effect of the data on the probability assigned to the θ parameter.

Bayesian Update in pictures. The top panel is the prior probability, which assigns an 80% probability to a 50% like-to-view ratio. The likelihood is computed after observing the ad being served to 10 users and receiving 2 likes (N=10, z=2). The bottom panel is the new distribution, where Bayes’ Rule increases the probability that the like-to-view ratio is 40% and decreases the probability that it is 60%. This is expected: since the data had a 20% like-to-view ratio, the evidence supports a 40% ratio more than it supports a 60% ratio.

The previous plot can be reproduced with the following Python code:
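Here is a minimal sketch that reproduces the three panels, reusing the thetas, prior, N, z, likelihood and bayes_update pieces sketched above (the Matplotlib layout details are my own):

import matplotlib.pyplot as plt

like = likelihood(thetas, z, N)          # p(D|theta) for N=10, z=2
posterior = bayes_update(prior, like)    # p(theta|D)

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(6, 8))
panels = [("Prior", prior), ("Likelihood", like), ("Posterior", posterior)]
for ax, (title, values) in zip(axes, panels):
    ax.bar(thetas, values, width=0.03)   # one bar per candidate theta
    ax.set_title(title)
axes[-1].set_xlabel("theta")
plt.tight_layout()
plt.show()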

A few things to note:

1. This example has only 3 possible values of θ, but in practice one should expect many more values; often a continuous (infinite) range of possible θs should be considered.

2. Note that if the real like-to-view ratio is 20%, our model would never capture that, since we only considered 3 values. If we are not careful, we might end up selecting the least-bad model when using Bayesian analysis for model selection. Ways to prevent this are beyond the scope of this post.

Searching for the Like-to-View Ratio

Now, let’s assume total ignorance of our ad’s like-to-view ratio. To represent that, let’s assume it is equally likely that the like-to-view ratio takes any value, and let’s consider all the possible θs to be {0.05, 0.10, 0.15, 0.20, …, 0.90, 0.95}. The following plot shows how our beliefs should change given our previous data (10 views, 2 likes):

Bayesian Update in pictures. The top panel is the prior probability; in this example we assume a uniform prior distribution, i.e., all the possible values of θ are equally likely. The likelihood is computed after observing the ad being served to 10 users and receiving 2 likes (N=10, z=2); note the peak at θ=0.2. The bottom panel is the new distribution, the posterior calculated with Bayes’ Rule.

Important points:
1. Since our model considers 0.2 to be a possible value, we now have a peak around θ=0.2.
2. Using the Python code (see the sketch just after this list), one can calculate the probability that θ is less than 0.5 to be 95.4%.
3. This means that, given the data and our initial assumption (all values equally likely), we can consider the like-to-view ratio of our ad to be less than 50%. Hence, we should consider changing our ad before boosting it with more paid ads, saving us some money.
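For reference, the uniform-prior calculation from points 1 and 2 might look like this, again reusing the helpers sketched earlier (p_below_half is my own variable name):

# Candidate ratios 0.05, 0.10, ..., 0.95 and a uniform ("total ignorance") prior
thetas = np.linspace(0.05, 0.95, 19)
prior = np.ones_like(thetas) / len(thetas)

posterior = bayes_update(prior, likelihood(thetas, z, N))   # still N=10, z=2

# Probability that the like-to-view ratio is below 50%
p_below_half = posterior[thetas < 0.5].sum()
print(f"p(theta < 0.5 | D) = {p_below_half:.3f}")   # about 0.954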

Conclusion

In this post we learned how to update our probabilities given the observed data.

We learned how to judge the efficacy of an ad campaign before deciding to boost it even further.

Also, in practice the empirically observed like-to-view ratio of Facebook ads can be in the single digits, which means we need hundreds to a few thousand views before we can say anything with confidence about the performance of an ad. We will see some real data in a later post.

To close this post, here is a GIF of the performance of our ad and our updated beliefs about the like-to-view ratio after each view event (with or without a like). See how the distribution converges to θ=0.3 (the true value in this case).

Bayes’ Rule in Action. Starting with a non-informative prior (the uniform distribution), the GIF shows the effect of each new data point on the probability of the possible θs (x-axis). N is the number of views of the Facebook ad, z is the number of likes on the page due to the ad. The true (in practice unknown) value of θ is 0.3. We can see the plot converging to 0.3 over time.
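A small simulation in the same spirit (my own sketch with a made-up random seed, not the data behind the GIF) shows the same convergence, updating the belief one view at a time:

rng = np.random.default_rng(42)   # arbitrary seed, just for reproducibility
true_theta = 0.3                  # the "real" like-to-view ratio in this simulation

belief = np.ones_like(thetas) / len(thetas)      # start from the uniform prior
for view in range(500):                          # 500 simulated view events
    liked = rng.random() < true_theta            # one Bernoulli view: like or not
    single_view_likelihood = thetas if liked else 1.0 - thetas
    belief = bayes_update(belief, single_view_likelihood)

print("most likely theta:", thetas[np.argmax(belief)])   # should land near 0.3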

Bayes’ Rule tells us how to change our minds.
