Detecting Low-Converting Web Traffic Early

We usually break down the traffic of a campaign by source: the publishers (apps or websites) that display our ads. The quality of traffic often varies between sources. Some sources never convert, while others convert at several times the campaign’s average rate.

We would like to detect high-converting, low-converting and never-converting sources early in a campaign.

In a Cost Per Click (CPC) campaign we expect users to reach our website after clicking on a banner; only then do they have a chance to convert. In reality a fraction of these users (hopefully a small one) get lost and never reach our website. For now let’s assume this discrepancy is negligible, so that every paid click counts as a visitor.

First we define the minimum acceptable conversion rate. A customer is a converted visitor. Let’s assume that we can immediately assign a value to each customer; we call this value the Average Revenue Per User (ARPU).

Our margin is total revenue minus total cost. Since

TotalCost = Visitors * CPC
TotalRevenue = Customers * ARPU
Customers = Visitors * ConversionRate

we have

Margin = TotalRevenue - TotalCost = Visitors * (ConversionRate * ARPU - CPC)

We can find the minimum acceptable conversion rate by setting the margin to 0 (or any desired minimum) and solving for the conversion rate:

0 = Visitors * (ConversionRate * ARPU - CPC)
MinConversionRate = CPC / ARPU

Any source with a conversion rate lower than this minimum is simply too expensive.
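
As a quick illustration of the break-even formula, here is a minimal JavaScript sketch. The function names and the figures (a CPC of 0.05 and an ARPU of 5) are made up for this example and do not come from a real campaign.

```javascript
// Break-even conversion rate: the rate at which the margin is exactly zero.
function minConversionRate(cpc, arpu) {
  return cpc / arpu;
}

// Margin = Visitors * (ConversionRate * ARPU - CPC)
function margin(visitors, conversionRate, arpu, cpc) {
  return visitors * (conversionRate * arpu - cpc);
}

// Hypothetical figures, chosen only for illustration.
const exampleCpc = 0.05;
const exampleArpu = 5;
const exampleMinRate = minConversionRate(exampleCpc, exampleArpu);

console.log(exampleMinRate);                                // roughly 0.01, i.e. 1%
console.log(margin(1000, 0.02, exampleArpu, exampleCpc));   // positive: above break-even
console.log(margin(1000, 0.005, exampleArpu, exampleCpc));  // negative: too expensive
```

With these assumed numbers, a source converting at 2% earns money and a source converting at 0.5% loses money, which is what the inequality against MinConversionRate predicts.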

Usually hundreds of clicks are required to determine the conversion rate of a source confidently. Our challenge is to estimate the conversion rate of our sources as early as we can, while avoiding type II errors.

Beta Distribution

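Our uncertainty about a source’s conversion rate can be modeled with a beta distribution whose parameters come from the number of successes (conversions) and the number of failures. The extra 1 in each parameter corresponds to starting from a uniform prior: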

α = 1 + number of visitors that converted
β = 1 + number of visitors that didn’t convert

For example, α and β for a source that has 150 visitors and 2 conversions are:

α = 1 + 2 = 3
β = 1 + (150 - 2) = 149

This is how it looks:

The graph shows that almost all the distribution lies between 0.005 and 0.06.

The mean of the distribution is:

µ = α / (α + β) = 3 / (3 + 149) ≈ 0.020

The area under the curve for x > 0.01 is about 0.81. If our minimum is 0.01, this source is probably performing better than the minimum, but we cannot be confident yet.

The beta distribution is defined on the interval from 0 to 1, and here it depends only on the number of successes and failures, just like the conversion rate itself. That’s why the beta distribution is a good way of modeling the conversion rate.

When α + β is small (we only had a few trials) the curve is wider. The curve becomes narrower as we collect more data, whether we have more successes or failures. A wider curve indicates more uncertainty than a narrower one.
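
The narrowing can be made concrete with the standard deviation of the beta distribution, sqrt(αβ / ((α + β)² (α + β + 1))). The sketch below is only an illustration: the function name and the two sources (100 and 1000 visitors, both converting at 2%) are hypothetical.

```javascript
// Summarise the beta distribution of a source's conversion rate,
// using alpha = conversions + 1 and beta = non-conversions + 1.
function betaSummary(visitors, conversions) {
  const alpha = conversions + 1;
  const beta = visitors - conversions + 1;
  const total = alpha + beta;
  const mean = alpha / total;
  // Standard deviation of Beta(alpha, beta): a measure of the curve's width.
  const sd = Math.sqrt((alpha * beta) / (total * total * (total + 1)));
  return { alpha, beta, mean, sd };
}

// Two hypothetical sources with the same observed rate but 10x the data.
const earlySource = betaSummary(100, 2);
const laterSource = betaSummary(1000, 20);

console.log(earlySource.mean.toFixed(4), earlySource.sd.toFixed(4));
console.log(laterSource.mean.toFixed(4), laterSource.sd.toFixed(4)); // smaller sd: narrower curve
```

Both sources have a similar mean, but the one with more visitors has a much smaller standard deviation, so we are more certain about its conversion rate.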

Here’s how I use this in real life: if

(Integral of Β(α, β) from 0 to MinimumConversionRate) > 0.95

then I’m very confident that this source is converting less than our acceptable minimum.
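
To get a feel for how much data this rule needs, consider a source that has not converted at all. With zero conversions and n visitors the distribution is Beta(1, n + 1), and its cumulative probability below x has the closed form 1 - (1 - x)^(n + 1). The sketch below solves for the smallest n that pushes this probability past the threshold. The function name and the 1% minimum are assumptions made for illustration.

```javascript
// Smallest number of visitors with zero conversions for which
// P(conversion rate < minRate) > confidence, under Beta(1, n + 1).
function visitorsToRejectWithNoConversions(minRate, confidence = 0.95) {
  // We need 1 - (1 - minRate)^(n + 1) > confidence,
  // which is equivalent to n + 1 > log(1 - confidence) / log(1 - minRate).
  const ratio = Math.log(1 - confidence) / Math.log1p(-minRate);
  return Math.max(0, Math.floor(ratio));
}

// Hypothetical minimum conversion rate of 1%.
console.log(visitorsToRejectWithNoConversions(0.01));
```

For a 1% minimum this comes out at roughly 300 visitors without a single conversion, which fits the rule of thumb that hundreds of clicks are needed before a source can be judged.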

Most statistics packages provide this integral directly as the cumulative distribution function (CDF):

Cumulative_Β(α, β, x = MinimumConversionRate) > 0.95

In R:

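# pbeta() is part of the stats package, which R attaches by default
less_than_min <- function(cpc, arpu, visitors, conversions) {
  min_rate <- cpc / arpu               # break-even conversion rate
  alpha <- conversions + 1
  beta <- visitors - conversions + 1
  pbeta(min_rate, alpha, beta)         # P(conversion rate < min_rate)
}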

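Using the JavaScript jStat library:

// assumes jStat has already been loaded
const less_than_min = function(cpc, arpu, visitors, conversions) {
  const min = cpc / arpu;                    // break-even conversion rate
  const alpha = conversions + 1;
  const beta = visitors - conversions + 1;
  return jStat.beta.cdf(min, alpha, beta);   // P(conversion rate < min)
};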
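
If loading a statistics library is not an option, the same probability can be computed without one, because α and β are whole numbers here. For integer parameters the beta CDF equals an upper binomial tail: P(rate < x) = P(X ≥ α), where X follows a Binomial(α + β - 1, x) distribution. The following is a minimal sketch of that identity. The function names and the 0.95 default are illustrative choices, not part of jStat or R.

```javascript
// P(conversion rate < minRate) under Beta(conversions + 1, visitors - conversions + 1),
// computed as 1 - P(X <= conversions) with X ~ Binomial(visitors + 1, minRate).
function probBelowMin(minRate, visitors, conversions) {
  if (minRate <= 0) return 0;
  if (minRate >= 1) return 1;
  const n = visitors + 1; // alpha + beta - 1
  const logOdds = Math.log(minRate) - Math.log1p(-minRate);
  let logTerm = n * Math.log1p(-minRate); // log P(X = 0)
  let lowerTail = 0;
  for (let j = 0; j <= conversions; j++) {
    lowerTail += Math.exp(logTerm);
    // P(X = j + 1) = P(X = j) * (n - j) / (j + 1) * minRate / (1 - minRate)
    logTerm += Math.log((n - j) / (j + 1)) + logOdds;
  }
  return Math.min(1, Math.max(0, 1 - lowerTail));
}

// The decision rule from the text: flag a source once we are confident enough
// that it converts below the break-even rate.
function isLowConverting(cpc, arpu, visitors, conversions, threshold = 0.95) {
  return probBelowMin(cpc / arpu, visitors, conversions) > threshold;
}

// Hypothetical campaign: CPC 0.05, ARPU 5, so the break-even rate is about 1%.
console.log(isLowConverting(0.05, 5, 600, 0)); // many visitors, no conversions
console.log(isLowConverting(0.05, 5, 400, 3)); // some conversions, less clear-cut
```

The terms are accumulated in log space so that the computation stays stable for sources with many thousands of visitors.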

Using Pipe, I made a small online tool that calculates the chance that a source’s conversion rate is below a given minimum: http://homam.github.io/Pipe-Storyboards/#/beta

The tool takes the total number of visitors so far, the number of converted visitors and the minimum acceptable conversion rate, and outputs the probability that the source’s conversion rate is lower than that minimum.