Modeling the Facebook Ad Fatigue Phenomenon — Part 2

Published in Jellysmack Labs · Mar 21, 2023

Nizam Makdoud & Marc Caillet

Introduction

Do you remember Barbara, whom we introduced in the first article in this series of two?

Barbara is a successful video content creator who is thriving on YouTube. Eager to reach a wider audience, she created a page on Facebook and started posting her videos there, thinking her community would easily spread from YouTube to Facebook. As is all too commonly observed, the spreading did not happen. Like many others before her, Barbara realized she needed to grow a whole new community from scratch and struggled to do so.

Jellysmack helps video content creators like Barbara quickly grow their community and start generating revenue through precisely targeted, cost-optimized advertising campaigns on Facebook. For each like on a piece of promoted content, a new follower.

Like any advertising campaign, our campaigns for promoting our creators’ content are prone to ad fatigue. This phenomenon emerges when the targeted audience gets bored with the ad after too long an exposure. When this happens, the audience poorly translates into new followers, and the money invested in the campaign is wasted.

As described in the first article, our data shows that our campaign managers are very cautious with spending and never hit ad fatigue. However, our data also indicates that they are too cautious and often prematurely stop investing in promising audiences.

Modeling Facebook ad fatigue is critical in that it would give our campaign managers the ability to anticipate the emergence of this phenomenon and better optimize their investments.

As shown in the first article, the task is challenging. Classical, data-only approaches were doomed to fail. Our way out? Embracing uncertainty and embedding the domain expert knowledge of the Facebook ad fatigue phenomenon into a Bayesian model.

This second article in the series takes you on a journey through modeling the Facebook ad fatigue phenomenon using Bayesian inference, fitting the model, and deriving a scoring policy from the model’s inferences and their uncertainty.

Bayesian Modeling

Before we delve into the specifics of our models, let us make a quick detour to the basics of Bayesian inference, how it differs from frequentist inference, why it is the best choice for modeling the Facebook ad fatigue phenomenon given Jellysmack’s data, and how to implement a Bayesian workflow.

Statistical Inference

Generally speaking, inference is the process of reaching a conclusion, deriving a rule, or unveiling properties based on evidence and reasoning.

When evidence is incomplete, you have no choice but to reason under uncertainty, which is exactly what statistics are for. Thus, statistical inference is the process of inferring some properties shared by a whole population based on statistics computed from a sample drawn from that population.

Frequentist inference vs. Bayesian inference

Frequentist inference is a type of statistical inference based on frequentist probability, where the probability of an event is defined by the long-run frequency of that event. Its arch-rival, Bayesian inference, uses Bayes’ theorem to update the probability of the event each time new evidence becomes available.

The difference between those approaches to statistical inference is mainly philosophical.

The frequentists, on the one hand, are focused on the truth. Their analysis is strictly oriented toward it, and they are concerned about whether they are right or wrong. They need a significant number of repeated events to even consider rejecting the null hypothesis. The Bayesians, on the other hand, are focused on their evolving opinion. They first embed some prior knowledge and experience in their prior estimate. Then, event after event, as new evidence becomes available, they update their beliefs using Bayes’ theorem.

Take the question of checking whether a coin is fair, for example. A frequentist would define the null hypothesis as “the coin is fair.” They would also define a level of confidence and a tolerable margin of error. From these parameters, they would come up with the minimum number of times the coin must be tossed before the experiment is complete, and they can test whether or not they should reject the null hypothesis. A Bayesian would embed their own personal experience in a prior distribution — were the coins they tossed so far almost always fair? — and update this initial estimate after each coin toss.

Thus, frequentist inference requires a great amount of data before running an analysis to try and reject the null hypothesis. With Bayesian inference, you can start with no data and use domain expertise or even a non-informative guess to form a prior estimation, then evolve it as new data becomes available. It is an iterative process that can be summarized as follows:

Initial Belief + New Information ⇒ Improved Belief

Since we can rely on domain expert knowledge and our data is scarce, Bayesian inference is the perfect fit for our needs.

Bayesian Inference Workflow

The Bayesian inference workflow consists of three main steps:

  1. Prior distribution
    Defining a statistical model of unobserved parameters that embeds the domain expert knowledge. These prior distributions model our belief about the values of the parameters before any data is available. In case we have no prior knowledge about the unobserved parameters, the prior distribution, denoted P(Θ), is a totally non-informative guess.
  2. Likelihood function
    Designing the model of the parameters Θ from which the likelihood function P(d|Θ) is derived. This function is the probability of observing the data given that the hypothesis described by the statistical model is true.
  3. Posterior distribution
    Computing the posterior distribution P(Θ|d) as the combination of the prior distribution P(Θ) and the likelihood function P(d|Θ) as new data becomes available, using Bayes’ theorem:

P(Θ|d) = P(d|Θ) · P(Θ) / P(d)

where the evidence P(d) is the probability of observing the data, marginalized over all possible parameter values.

This operation updates our prior belief about the model’s parameters in light of new evidence. Each time new data becomes available, the posterior distribution becomes the new prior distribution which, in turn, is updated by combining it with the likelihood of the new data.

Probabilistic Programming

Probabilistic programming makes implementing Bayesian inference quite straightforward. It provides a domain-specific language that makes it easier to define the hypothesis space and apply exact or approximate inferences at scale.

Let’s illustrate how effective probabilistic programming is — using numpyro — with an implementation of the Bayesian inference workflow to estimate whether a coin is fair. A coin is deemed fair if, once flipped, the probability of it landing heads up equals the probability of it landing tails up, both being 0.5.

Using the Bayesian inference workflow, we will solve this problem as follows:

Prior distribution

Let p be the probability of observing the coin heads up. We will compare two approaches to defining priors, both using the Beta distribution but with different parameter values. The Beta distribution is commonly used to model continuous random variables ranging between 0 and 1. As such, it is well suited to representing a distribution of probabilities. It takes two parameters, α and β: α − 1 can be seen as the number of successful events and β − 1 as the number of unsuccessful events. In our case, they can be seen as the numbers of heads and tails. Following are our two prior distributions:

  1. Non-informative prior
    We pretended that we had no past history with coin flipping and assumed that the coin could equally be fair, mildly biased, or highly biased towards either heads or tails. The least informative situation that can be modeled with a Beta distribution is when no heads or tails have been observed yet, which is expressed with α = β = 1. Hence the non-informative prior distribution: p ~ Beta(1, 1).
  2. Informative prior
    We used our personal prior knowledge about coin flipping, according to which a coin is generally fair. Hence, the probability of observing heads should be somewhere around 0.5. To model a certain initial level of confidence in the coin being fair, we used a Beta distribution with an equal number of heads and tails. The higher this number, the more confident you are. We opted for α = β = 10, which models a fair level of confidence. Hence the informative prior distribution: p ~ Beta(10, 10).

Figure 1 shows our two prior distributions, the non-informative one in blue and the informative one in red.

Figure 1: The non-informative prior distribution — in blue — and the informative prior distribution — in red.
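As a side note, the two curves in Figure 1 can be reproduced with a few lines of standard Python; this plotting snippet is our illustration and uses scipy and matplotlib, which are not part of the modeling code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

p = np.linspace(0.0, 1.0, 200)
plt.plot(p, beta.pdf(p, 1, 1), color="blue", label="non-informative Beta(1, 1)")
plt.plot(p, beta.pdf(p, 10, 10), color="red", label="informative Beta(10, 10)")
plt.xlabel("p, the probability of observing heads")
plt.ylabel("density")
plt.legend()
plt.show()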

Likelihood function

Since the outcome is binary — we assume that the coin never lands on its edge — a Bernoulli distribution is a common choice to model the likelihood of the coin landing heads up after it has been flipped. The Bernoulli distribution is parameterized by p, the probability of observing heads: P(heads up|p) ~ Bernoulli(p).
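As an aside, the Beta prior is conjugate to the Bernoulli likelihood, so this particular posterior has a closed form: after observing k heads in n flips, a Beta(α, β) prior updates to a Beta(α + k, β + n − k) posterior. The probabilistic programming machinery below is therefore overkill for a coin, but it generalizes unchanged to models whose posterior has no closed form.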

Combined with this likelihood function, our two prior distributions give the following two models, each shown with a code snippet of its numpyro implementation:

General coin model:

from functools import partial

import numpyro
import numpyro.distributions as dist


def coin_model(prior_distribution, y=None):
    # Draw p from the prior, then observe the flips through a Bernoulli likelihood
    numpyro.sample(
        "obs",
        dist.BernoulliProbs(
            probs=numpyro.sample("p", prior_distribution)
        ),
        obs=y
    )

Coin model without prior knowledge embedded

non_informative_prior_distribution = dist.Beta(1.0, 1.0)

coin_model_without_prior_knowledge = partial(
    coin_model,
    non_informative_prior_distribution
)

Coin model with prior knowledge embedded

informative_prior_distribution = dist.Beta(10.0, 10.0)

coin_model_with_prior_knowledge = partial(
    coin_model,
    informative_prior_distribution
)

Posterior distribution

In the third step of the Bayesian workflow, we fit our model to the observations. This results in the posterior distribution conditioned on the observed data.

Say we flip the coin 100 times in a row. How would our belief — represented by the posterior distribution — in the coin being fair, i.e., in a 0.5 probability of observing heads, evolve after each outcome?
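Before looking at the curves, here is a minimal sketch of how such a fit can be run with numpyro’s NUTS sampler; the flip sequence and sampler settings below are illustrative placeholders, not the exact configuration behind the figures:

import jax.numpy as jnp
import jax.random as random
from numpyro.infer import MCMC, NUTS

# An illustrative record of flips: 1 = heads, 0 = tails
flips = jnp.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])

mcmc = MCMC(NUTS(coin_model_with_prior_knowledge), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0), y=flips)

posterior_p = mcmc.get_samples()["p"]  # posterior samples of p
print(posterior_p.mean(), posterior_p.std())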

Figure 2: The evolution of the posterior distributions with a fair coin.

Figure 2 (inspired by the works of [1]) depicts the evolution of the posterior distribution of our two models, given that the coin is fair. It shows that, flip after flip, the two posterior distributions converge towards a good estimate of the probability of observing heads, with a fair to high level of confidence. The distribution with the informative prior — the red one — clearly converges faster than the one with the non-informative prior — the blue one. This experiment brings to light the advantage of an accurate prior distribution: it allows for faster convergence of the posterior distribution, which means that less data is required to get a good estimate of the probability of observing heads.

Let us now assume that the coin has been rigged without us knowing in such a way that the probability that it lands heads up is 0.25, and let us flip it 100 times. Figure 3 depicts the evolution of the posterior distribution of our two models.

Figure 3: The evolution of the posterior distributions with a rigged coin.

As evidence becomes available, both models converge towards the actual probability of observing heads after a coin flip with increasing confidence. If we focus on the first 10 flips, we observe that the posterior distribution of the model with no prior knowledge — in blue — converges fast but with a high level of uncertainty. As for the model — in red — which assumes that the probability of observing heads equals 0.5, it converges slowly with an increasing confidence level. After 50 flips, both models’ confidence levels are similar. After 100 flips, the posterior distributions are very close to one another.

This second experiment shows that Bayesian inference allows for convergence towards a good estimate of the probability of observing heads, even if the initial belief is far from the truth. But again, the more accurate the prior distribution, the faster the convergence and the less data required.

At this point, you may be wondering how we applied Bayesian inference to modeling and predicting Facebook ad fatigue.

Modeling the Facebook ad fatigue phenomenon and predicting its emergence

Our prior knowledge of ad fatigue on Facebook

As discussed in the first article in this series, running a successful advertising campaign on Facebook is challenging. Steering clear of ad fatigue heavily depends on finding the best mix of targeted audience, promoted video, and budget.

The same article describes ad fatigue as the decrease in conversion rate as the advertising budget increases. The rationale is simple. Only part of an audience might be interested in the creator’s video content. As more and more of them turn into new followers, the number of potentially interested people left in that audience decreases. The fewer interested people in an audience, the more costly they are to reach. In short, the more money is invested in a particular audience, the lower the conversion rate. Hence the increase in the metric we intend to minimize: the cost per new follower, which in our case is equivalent to the CPL, the Cost Per Like.

But then, how do we translate our knowledge of Facebook ad fatigue into a Bayesian model that predicts the emergence of this phenomenon? Our knowledge is strong, so it should not be too difficult to model the data generation process. But many possibly hidden factors might influence the cost per new follower, which would result in high variability in the observations. As a result, we need to estimate the uncertainty about the model’s predictions.

What our data show us about ad fatigue on Facebook

In contrast to part 1 of this article, we switched from a daily to a cumulative viewpoint on our data about the budget and the number of new followers. This allows for:

  • Modeling ad fatigue as a saturation curve that represents the cumulative number of new followers as a function of the cumulative marketing budget.
  • A clearer depiction of the ad fatigue phenomenon.

Say there is an ongoing advertising campaign that aims at growing the community of Barbara’s Facebook page. Figure 4 shows the cumulative number of new followers acquired through this campaign as a function of the cumulative marketing budget invested in three different audiences. We named the targeted audiences according to the color of the curve. Thus, the blue curve corresponds to the blue audience.

Figure 4: The cumulative number of followers acquired through the investment in three different audiences.

We can see that the blue curve seems to flatten, which would indicate a decrease in the number of new followers gained while the marketing budget keeps increasing. In other words, the blue audience might very well be suffering from ad fatigue. As for the purple and brown curves, their respective audiences are perfectly fine so far. The question, then, is: can we infer the point at which ad fatigue might emerge?

Recall that this point is not directly observed. Using Bayesian inference, however, we can deduce its distribution from the observations and our statistical model.

The logistic growth model

In the previous article in this series, we postulated that a linear increase in the marketing budget would lead to a linear increase in new followers. Under this hypothesis, we would observe a steady cost per new follower, and the optimal strategy would be to invest the whole advertising budget in the best audience. This is, though, in total contradiction with our knowledge of the ad fatigue phenomenon, which states that the conversion rate of any given audience decreases at some point as we keep investing in it, resulting in an increase in the cost per new follower.

Our knowledge of the ad fatigue phenomenon translates into two main hypotheses regarding the acquisition of new followers through advertising campaigns on Facebook:

  1. The number of Facebook users is finite and is not growing indefinitely. As a consequence, there is a limit to the number of new followers we can potentially acquire via advertising campaigns.
  2. The bigger the audience, the cheaper it is to acquire new followers.

Those hypotheses easily translate into a statistical model, which we will evaluate on observations to get the likelihood function.

We denote by F the number of new followers obtained through an advertising campaign after a budget B has been spent, and by Fmax the maximum number of people from the audience who could potentially be turned into new followers. Finally, let α be the conversion rate. A strong candidate for modeling our two hypotheses is the logistic growth model [2]:

dF/dB = α · F · (1 − F / Fmax)

This ordinary differential equation expresses the fact that, as the budget B invested in a given audience increases, the growth in followers is driven by the conversion rate α times the current number of followers F, damped by a factor that tends to 0 as F tends to Fmax, the maximum possible number of new followers.

Bayesian inference handles ordinary differential equations (ODE) well, but it is usually more efficient to use the closed-form solution, i.e. an analytical expression, whenever it exists. As it happens, the ODE that models our two hypotheses admits the following closed-form solution, which we will use thereafter:

F(B) = Fmax / (1 + ((Fmax − F0) / F0) · e^(−α·B))

where F0 is the initial number of new followers after a small initial budget has been spent.

From this closed-form solution, we derived the following observational model, in which the observed number of new followers is normally distributed around the logistic growth curve:

F ~ Normal(Fmax / (1 + ((Fmax − F0) / F0) · e^(−α·B)), σ), with σ ~ HalfNormal(1)

Prior distributions

Our prior knowledge of Facebook ad fatigue has been a key factor in the design of our model. Unfortunately, it does not inform us much about the prior distributions of the parameters α, Fmax, and F0. All we know is that these parameters are positive, which led us to opt for the same, mostly uninformative prior distribution for each of them: the Half-Normal distribution.

Prediction

With this first model of ad fatigue on Facebook at hand, all that remains is to detect the emergence of the phenomenon, which occurs when the audience starts to saturate, i.e. when the conversion rate starts to decrease. Saturation can thus be viewed as the point where the return on investment starts to decrease, which is precisely the definition of the inflection point of the logistic growth model [3].

Figure 5 pictures the logistic growth curve, as well as its first and second derivatives, which, respectively, represent the growth direction and acceleration. It also exposes some points of interest: the point of inflection, the point of highest acceleration, and the point of lowest acceleration.

Figure 5: The logistic growth curve with the location of the inflection point and the highest and lowest acceleration points. The first derivative depicts the growth direction; the second derivative depicts the growth acceleration.

From those points of interest, we could derive some alerting rules to help the advertising campaign managers decide whether they should keep on investing or not (see the sketch after this list for locating the inflection point):

  1. We could warn them as soon as the cumulative budget they invested in a given campaign reaches the inflection point.
  2. We could advise that they keep the cumulative budget between the highest and the lowest acceleration points.
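For the logistic growth model, the inflection point has a closed form: the curve inflects at F = Fmax / 2, which corresponds to the budget B* = ln((Fmax − F0) / F0) / α. Applied to posterior samples, this yields a whole distribution over B* rather than a single point estimate. A minimal sketch, assuming mcmc holds a fitted logistic growth model as in the fitting snippet further below:

import jax.numpy as jnp

samples = mcmc.get_samples()  # posterior samples of F0, Fmax, and alpha
b_star = jnp.log((samples["Fmax"] - samples["F0"]) / samples["F0"]) / samples["alpha"]

# Point estimate and 90% credible interval for the inflection budget
print(b_star.mean(), jnp.percentile(b_star, jnp.array([5.0, 95.0])))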

Posterior distribution

Now that we have modeled the ad fatigue phenomenon and devised rules to let the campaign managers know whether they are in the clear and can keep investing, or whether they had better stop because ad fatigue is lurking around the corner, it is time to compute the posterior prediction and evaluate whether our model is an accurate representation of this phenomenon.

Following is the numpyro implementation of our logistic growth model and of the inference process:

import jax.numpy as jnp

# numpyro and dist are imported as in the coin example above


def logistic_growth_model(budget, new_followers=None):

    def prior_distributions():
        # Mostly uninformative, positive-only priors on the model parameters
        return {
            'F0': numpyro.sample('F0', dist.HalfNormal(10)),
            'Fmax': numpyro.sample('Fmax', dist.HalfNormal(10)),
            'alpha': numpyro.sample('alpha', dist.HalfNormal(10))
        }

    def observational_model(priors):
        # Normal likelihood centered on the closed-form logistic growth curve
        return dist.Normal(
            loc=priors['Fmax'] / (
                1 + (
                    jnp.exp(-priors['alpha'] * budget)
                    * ((priors['Fmax'] - priors['F0']) / priors['F0'])
                )
            ),
            scale=numpyro.sample("sd", dist.HalfNormal(1))
        )

    def evaluate(observational_model):
        def predict_new_followers():
            return numpyro.sample(
                "obs",
                observational_model,
                obs=new_followers
            )

        with numpyro.plate("data", len(budget)):
            # The implied CPL is recorded as a byproduct of the model
            numpyro.deterministic(
                "CPL",
                budget / (predict_new_followers() + 1e-3)
            )

    evaluate(observational_model(prior_distributions()))

Next, we fitted our model to our data using a Hamiltonian Monte Carlo process [4]. It is a Markov Chain Monte Carlo process based on Hamiltonian dynamics, used to draw samples from a target distribution. Figure 6 shows what the resulting posterior distributions would predict for the blue and purple audiences, both in terms of new followers and implied CPL — the CPL we would observe should the model be accurate. The implied CPL is a byproduct of our model that turned out to be very useful for the identification of ad fatigue. For the sake of clarity, we left aside the brown audience since it behaved similarly to the purple one.
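As an indication of what such a fit can look like in numpyro, here is a minimal sketch; the budgets, follower counts, grid bounds, and sampler settings are illustrative placeholders:

import jax.numpy as jnp
import jax.random as random
from numpyro.infer import MCMC, NUTS, Predictive

# Illustrative observations for one audience
budget = jnp.array([100.0, 500.0, 1000.0, 2000.0, 4000.0])
new_followers = jnp.array([40.0, 180.0, 330.0, 520.0, 640.0])

mcmc = MCMC(NUTS(logistic_growth_model), num_warmup=1000, num_samples=2000)
mcmc.run(random.PRNGKey(0), budget, new_followers)

# Posterior predictions over a grid of budgets, including the implied CPL
budget_grid = jnp.linspace(100.0, 10000.0, 100)
predictive = Predictive(logistic_growth_model, mcmc.get_samples())
predictions = predictive(random.PRNGKey(1), budget_grid)
implied_cpl = predictions["CPL"]  # shape: (num_samples, len(budget_grid))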

The results suggest that ad fatigue would emerge just after a small extra amount of money is invested in the blue and purple audiences, with a very quick increase of the implied CPL for the blue one.

Figure 6: The posterior distributions of the blue and the purple audiences, both in terms of new followers and implied CPL, using a logistic growth model.

Those results are dubious in that they show that the logistic growth model would predict an imminent audience saturation. In both cases, we would advise Barbara’s campaign manager to stop investing immediately. This does not quite fit our knowledge of Facebook ad fatigue, which tells us that the curve should be smoother.

Our model definitely needs some improvement.

Hill Function and Marketing Mix Modeling

Our logistic growth model predicts that ad fatigue will suddenly appear at some point in the daily investment process, after only a tiny amount of extra money is invested in the campaign. Such predictions do not match our knowledge of the ad fatigue phenomenon.

Modeling

To get a more realistic model, we looked into more expressive models that allow for a variety of growth curves for the number of new followers — linear, S-curve, C-curve, and other shapes, as depicted in Figure 7.

Figure 7: Different shapes of curves obtained using different Hill coefficients.

The search for the best growth curve is a common goal among advertisers. All share the concern of anticipating a decrease in the return on investment. Marketing Mix Modeling [5] studies the impact of multiple media channels on sales, with each media channel associated with its specific growth curve. It commonly models ad fatigue with the Hill function, which is widely used for modeling saturation in many different fields, from biochemistry to advertising:

Hill(x) = x^S / (x^S + K^S)

where:

  • x is the budget.
  • S is the shape of the media, which controls the form of the Hill function: if S is greater than one, the Hill function models an S-curve; if it is lower than one, it models a hyperbola (see the sketch after this list).
  • K is the half-saturation point of the media, i.e. the input value at which the response is half-maximal. In our case, K is the budget at which we obtain half of the followers attainable before saturation.
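To make the role of S concrete, here is a small sketch of the curve shapes from Figure 7; the K and S values are arbitrary examples:

import jax.numpy as jnp

def hill(x, K, S):
    # Basic Hill function: saturates towards 1 as x grows
    return jnp.power(x, S) / (jnp.power(x, S) + jnp.power(K, S))

x = jnp.linspace(0.0, 10.0, 100)
s_curve = hill(x, K=5.0, S=3.0)    # S > 1: S-shaped curve
hyperbola = hill(x, K=5.0, S=0.5)  # S < 1: hyperbola-like curve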

Our problem can easily translate into a Marketing Mix Modeling problem [6, 7, 8, 9, 10]:

  • A targeted audience can be viewed as a media channel.
  • The new followers can be viewed as sales.

Potential target audiences do not all contain the same number of Facebook users. As a consequence, the maximum number of new followers differs from one audience to another. To capture this fact, we used a variation of the Hill function that adds the required level of expressiveness to the original function through an additional parameter β, which allows for fine-grained control of the maximum number of followers:

F(x) = β · x^S / (x^S + K^S)

Prior distributions

Just as with our logistic growth model, our prior knowledge of Facebook ad fatigue has been a key factor in the design of our marketing mix model. But, likewise, it does not inform us much about the prior distributions of the parameters β, K, and S. All we know is that these parameters are positive, which led us to opt for the same, mostly uninformative prior distribution for each of them: the Half-Normal distribution.

Prediction

As discussed above, the logistic growth model allows for a straightforward investment strategy based on the inflection point. When S < 1, however, the Hill function admits no inflection point. So, we devised a new investment strategy based on our knowledge that a campaign manager won’t keep investing in an audience should the CPL rise above a given threshold. This new strategy consists of estimating the maximum budget for which we are 95% certain that the implied CPL stays below the campaign manager’s CPL threshold.

In contrast with the investment strategy we derived from the logistic growth model, this new one does not directly rely on the Facebook ad fatigue curve. It is based on the CPL that is implied by our marketing mix model of the ad fatigue phenomenon.

Posterior distribution

Again, we have modeled the ad fatigue phenomenon, this time using a marketing mix modeling approach. It is now time to compute the posterior prediction and evaluate whether our model accurately represents the phenomenon. Following is the numpyro implementation of our marketing mix model and of the inference process:

# jnp, numpyro, and dist are imported as in the previous snippets


def marketing_mix_modeling_model(budget, new_followers=None):

    def prior_distributions():
        # Mostly uninformative, positive-only priors on the Hill parameters
        return {
            'K': numpyro.sample('K', dist.HalfNormal(10)),
            'S': numpyro.sample('S', dist.HalfNormal(10)),
            'beta': numpyro.sample('beta', dist.HalfNormal(10))
        }

    def observational_model(priors):
        # Normal likelihood centered on the Hill function scaled by beta
        return dist.Normal(
            loc=(
                jnp.power(budget, priors['S']) / (
                    jnp.power(budget, priors['S'])
                    + jnp.power(priors['K'], priors['S'])
                )
            ) * priors['beta'],
            scale=numpyro.sample('sd', dist.Exponential(1))
        )

    def evaluate(observational_model):
        def predict():
            return numpyro.sample(
                "obs",
                observational_model,
                obs=new_followers
            )

        with numpyro.plate("data", len(budget)):
            predict()

    evaluate(observational_model(prior_distributions()))

Next, you know the drill: we fitted our model to our data using a Hamiltonian Monte Carlo process. Figure 8 shows what the resulting posterior distributions predict for the blue and purple audiences in terms of new followers.

Figure 8: The posterior distributions of the blue and the purple audiences in terms of new followers, using a Hill function.

We observe that the blue audience quickly saturates, but not as suddenly as what the logistic growth model predicted. As for the purple one, no ad fatigue should emerge anytime soon: it appears to be safe to keep investing in it. Let us confirm this first assessment with the analysis of the implied CPL.

Figure 9: The implied CPL for the blue audience.

As shown in Figure 9, the implied CPL of the blue audience would quickly reach a high value, above $1 at an investment point of $6,000. This is consistent with the prediction of an imminent saturation of the blue audience. In comparison, as depicted in Figure 10, the purple audience would stay in an acceptable range of implied CPL: it is still well under $1, even at an investment point as high as $40,000. The CPL would remain under control should the budget invested in this audience keep increasing, which is consistent with our model predicting that the purple audience is far from fatigue. In such a scenario, we would advise the campaign manager who is working on growing the community of Barbara’s Facebook page to immediately stop investing in the blue audience and to keep investing in the purple one.

Figure 10: The implied CPL for the purple audience.

This would be stop-or-keep-investing advice. But, as discussed above, we can be more specific and give a campaign manager a budget for which we are 95% confident that the implied CPL stays under a given threshold, say $0.5. Figure 11 shows the distribution of the implied CPL at different investment points.

Figure 11: The implied CPL for the purple audience at different investment points.

As expected, the higher the advertising budget, the higher the implied CPL. At a $600 spend, the CPL is centered around $0.115 with a small standard deviation, which translates into a high level of confidence. At a $1,800 spend, the CPL is significantly higher — around $0.15 — as is the standard deviation, which translates into a lesser level of confidence.

Using a Bayesian inference model, it is easy to estimate the maximum budget for which the CPL would stay under $0.5 with 95% confidence, as the sketch below illustrates.
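Here is a minimal sketch of that rule, assuming mcmc now holds a fitted marketing mix model for one audience; the grid bounds and variable names are illustrative:

import jax.numpy as jnp
import jax.random as random
from numpyro.infer import Predictive

budget_grid = jnp.linspace(100.0, 40000.0, 200)
predictive = Predictive(marketing_mix_modeling_model, mcmc.get_samples())
followers = predictive(random.PRNGKey(1), budget_grid)["obs"]

# Per-sample implied CPL at each budget point
implied_cpl = budget_grid / (followers + 1e-3)

# Probability, per budget point, that the implied CPL exceeds $0.5
prob_over = (implied_cpl > 0.5).mean(axis=0)

# Largest budget for which we are 95% confident the CPL stays under $0.5
max_safe_budget = budget_grid[prob_over < 0.05].max()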

Figure 12: The probability that the CPL of the blue and the purple audiences are over $0.5.

Figure 12 shows, at any given budget and for both the blue and the purple audiences, the probability that the implied CPL goes over $0.5. For the purple audience, the risk is acceptable up to $30,000, since this probability is either zero or very low. At $32,000 and above, though, the risk is too high, and Barbara’s campaign manager should be advised to invest no more than $32,000 in the purple audience. The same reasoning applies to the blue audience: the risk is acceptable up to $9,000, and Barbara’s campaign manager should not spend more than $10,000.

Conclusion

In the first article in this series of two, we demonstrated the limitations of traditional machine learning to model the Facebook ad fatigue phenomenon. We also emphasized the need for an inference process that would allow us to leverage domain expert knowledge about this phenomenon. We identified Bayesian inference as the perfect candidate.

In this second article, after a quick discussion of the differences between Bayesian and frequentist inference, we showed why Bayesian inference is a good fit for our modeling needs. We then presented the Bayesian modeling workflow and showed how it can be implemented using numpyro. Finally, we devised two Bayesian models: the first based on logistic growth, the second based on a marketing mix modeling approach. The latter proved to be the closest to our expert knowledge. Based on this model, we have built a recommendation tool that is able to tell campaign managers to what extent they can safely invest in any given audience.

This is a great result for Jellysmack in that it allows us to precisely optimize our advertising spending. It is also a great result for Barbara because part of the budget spent on advertising campaigns is considered an advance payment on the future revenue her Facebook page will generate once monetizable. So, the less money we spend on advertising, the better for both Jellysmack and Barbara.

Nonetheless, many improvements could be made to make our model even more accurate, for example:

  • Trendiness
    The trendiness of the topics the ad creatives deal with — be it everlasting, viral, or seasonal trendiness — has some effect on the behavior of the target audiences. We could embed this knowledge into the prior distributions.
  • Time-varying saturation model
    Unlike a static model, a time-varying model evolves over time. Such a model could allow us to model the effect of pausing ongoing advertising campaigns.
  • Quality of the ad creative
    A poorly designed ad creative has a detrimental effect in terms of CPL. Incorporating the quality of the ad creative would allow for an even more accurate model of ad fatigue.
  • Hierarchical modeling
    An advertising campaign usually targets multiple audiences. They are, more or less, similar to one another. We could use a hierarchical modeling approach to leverage this similarity by using the observations on a given audience to fine-tune the posterior distribution of a similar audience. By doing so, we would improve our inference process, especially when data is scarce.

References

[1] Cam Davidson-Pilon, ported to Python 3 and PyMC3 by Max Margenot and Thomas Wiecki, Probabilistic Programming and Bayesian Methods for Hackers.

[2] Tri Lai, Course on Logistic Growth, University of Nebraska-Lincoln, October 2013.

[3] Amaury W., Answer to the question “How do you find the inflection point of a logistic function?”, Socratic Q&A, August 2014.

[4] Michael Betancourt, A Conceptual Introduction to Hamiltonian Monte Carlo, January 2017.

[5] João Henrique Romeiro Alves, Bayesian Media Mix Modeling with Limited Data, August 2022.

[6] Juan Camilo Orduz, Media Effect Estimation with PyMC: Adstock, Saturation & Diminishing Returns, February 2022.

[7] Luca Fiaschi, Bayesian Media Mix Modeling using PyMC3, for Fun and Profit, August 2020.

[8] Benjamin Vincent, Bayesian Media Mix Modeling for Marketing Optimization, September 2021.

[9] Benjamin Vincent, Bayesian Media Mix Models: Modelling changes in marketing effectiveness over time, July 2022.

[10] Edwin Ng, Zhishi Wang, Athena Dai, Bayesian Time Varying Coefficient Model with Applications to Marketing Mix Modeling, September 2021.
