
Let’s Solve a Problem

“Opportunity cost” is one of those terms you may have heard dropped by a fancy suit with an expensive haircut divulging the seven steps to become the CEO of your own herbal supplements company from home in a ninety-second clip that you just couldn’t seem to summon the will to scroll past because you sorta daydreamed through the first twenty-eight seconds and then felt too invested to do anything else but see it through to the end. And while it can be tricky to grasp as a business concept (especially if expressed exclusively in terms of pricing essential oil cartons), sometimes a problem presents itself in a way that is only solvable by grappling with this intangible topic.

In the realm of digital advertisement delivery, such a problem exists in shrinking the gap between the price actually paid to publishers and the price that would have been paid for a cheaper impression that became available later; here at GumGum one of the ways we address this is through a limit on the price we are willing to pay to serve an ad throughout the day.

This maximum cost is measured in dollars per thousand impressions and is thus named the max cost per mille (CPM). Now, such a limit can hinder an ad’s ability to meet its delivery goals: not only are some perfectly fine opportunities to serve passed over for being too expensive, but competition for the remaining inventory also increases between ads with similar targeting. Therefore, it is crucial that an ad’s individual max CPM reacts to how well that ad is progressing toward its impression goal.

What we end up with is a feedback loop where an ad’s max CPM increases when the ad struggles to meet its impression quota (freeing up inventory that would otherwise be considered too expensive), but begins decreasing once the ad consistently meets its delivery goals, in an effort to increase our margins. With that framing, let’s walk through an exercise in evaluating this process.
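As a rough sketch (not our production logic), a daily adjustment in that spirit might look like the following; the step percentages, function name, and pacing inputs are purely illustrative:

```python
def adjust_max_cpm(current_max_cpm, delivered, daily_goal,
                   step_up=0.10, step_down=0.05):
    """Illustrative daily max CPM adjustment (hypothetical parameters).

    Behind pace: step the limit up by a percentage of its current value,
    freeing up inventory that was previously too expensive.
    On pace: step the limit down to protect margin.
    """
    if delivered < daily_goal:
        return current_max_cpm * (1 + step_up)
    return current_max_cpm * (1 - step_down)


# Example: an ad that delivered 800 of its 1,000-impression daily goal
new_limit = adjust_max_cpm(current_max_cpm=4.00, delivered=800, daily_goal=1000)
```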

Now that we have a general question to answer, that is, “can we dowse out an improvement to our max CPM exploration algorithm that lowers our opportunity cost of serving on expensive inventory?”, our first step should be to visualize the current state of affairs.

Playing with some historical data using Pandas & Plotly in a Jupyter Notebook, we can see below a sampling of max CPM time series for four of our most common ad types, where each line represents the daily max CPM value of a distinct ad across the extent of its delivery window.

Each line is an ad, one of a sample of 25 per ad type
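For a sense of how a chart like this comes together, here is a minimal sketch with Pandas and Plotly Express; the CSV file and its columns (ad_id, date, max_cpm, unit_type) are hypothetical stand-ins for our actual data source.

```python
import pandas as pd
import plotly.express as px

# Hypothetical daily max CPM history: one row per ad per day.
history = pd.read_csv("max_cpm_history.csv", parse_dates=["date"])
# columns: ad_id, date, max_cpm, unit_type

# Keep 25 ads per unit type so the lines stay readable.
sampled_ids = (history.drop_duplicates("ad_id")
               .groupby("unit_type")["ad_id"]
               .sample(n=25, random_state=1))
sample = history[history["ad_id"].isin(sampled_ids)].copy()
sample["ad_id"] = sample["ad_id"].astype(str)  # treat IDs as categories for coloring

# One line per ad, one panel per unit type.
fig = px.line(sample, x="date", y="max_cpm", color="ad_id",
              facet_col="unit_type", facet_col_wrap=2,
              title="Daily max CPM per ad, by unit type")
fig.update_layout(showlegend=False)
fig.show()
```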

Visually, we can see that the majority of explorations end up forming one of two distinct patterns: a jagged oscillation with some drift over time, or an exponential decay. The common feature between these movements is that the final, most stable max CPM limits all land a good measure below the initial max CPM an ad is assigned. Another important observation is that ads from separate campaigns, with different max CPM limits, appear to experience spikes in their max CPMs at about the same time, which indicates an external factor was likely causing an issue with delivery that then caused the max CPM to increase to improve delivery the next day. In these cases especially, if the ad were to start with a lower, more stable max CPM, it would end up increasing less, since the formula for increasing max CPM as a recovery response is a percentage step up from its current value.

Both of these are compelling reasons for taking a deeper dive into our current methodology in choosing a seed value for this exploration algorithm.

Narrowing Our Scope

Our existing approach to picking this first value is to use just half the expected revenue (also measured per mille). Clearly this has the benefit of cautiousness, that is, it protects a minimum amount of revenue should the max CPM of an ad begin climbing unexpectedly in the automated process, but this method does not really take into account the price of the inventory we expect ads to serve on. In fact, it follows that the best place to begin a cost limit exploration that is intrinsically tied to how well an ad meets its impression goal is at the lowest, most impression-dense price within the ad’s available inventory.

Unfortunately, there’s a bit to unpack there; we must define and devise a way to measure:

  • An ad’s available inventory
  • The impression “density” of that inventory
  • How much those impressions cost to be able to compare them

Even if we manage to run the gamut and translate those concepts into concrete numbers that can be juxtaposed onto a pretty graph, we still should keep our fingers crossed that the output is something actionable; the last thing we want to see at the end of all this is a sparse spread of impressions across many price points.

Sub-problems, Sub-problems Everywhere

How many impressions do we expect from domains that an ad is eligible to serve on? This is really just a rephrasing of “What is an ad’s available inventory?” but changes the perspective to better match the actual process at play. That process takes the form of the many filters ads go through in our servers when being considered for an impression, and it’s these filters that exclude inventory, effectively creating the pool of available inventory by elimination. Trying to optimize along each of the dozens of filters quite frankly sounds like the quickest way to demystify another elusive business concept, diminishing returns, so we will stay on target and only select the filter we expect to be most influential to our current campaign. That is most definitely the ad unit type filter.

The unit type of an ad, that is, its designation as video, in-image, in-screen, etc., is the single feature that most distinctly clusters all of its other features (including, of course, price). Therefore, there are solid business reasons for treating the inventory eligible per unit type as a representation of “available inventory” for an ad of that unit type, and this definition should suffice for our first round of data exploration.
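With that definition in hand, estimating an ad’s available inventory reduces to a group-and-sum over historical supply; a sketch, again with a hypothetical table and column names:

```python
import pandas as pd

# Hypothetical historical supply: observed impressions per domain per unit type.
supply = pd.read_csv("historical_supply.csv")
# columns: domain, unit_type, impressions

# Approximate "available inventory" for an ad by the total impressions seen
# for its unit type, rather than walking every targeting filter.
available_inventory = (supply.groupby("unit_type")["impressions"]
                       .sum()
                       .rename("available_impressions"))
print(available_inventory)
```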

Knowing the cost of an impression is trivial in theory; surely we must have signed some contract allowing us to deliver ads for some fixed price, and this should be saved somewhere in some table in some database. The most straightforward approach, building off that assumption, would be to take the average price of suppliers across units, but the pricing structures negotiated nowadays may take many forms, including floor pricing and revenue sharing, so in practice it may not be as simple as a database lookup. Further, taking a simple average takes for granted that suppliers offer similar amounts of inventory at similar prices, so at the very least we should check historical supply records for outliers that would invalidate that assumption.
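If we did start from that simple approach, a volume-weighted average (rather than a plain mean) plus a quick check for suppliers that dominate the volume might look something like this sketch; the table and its columns are hypothetical:

```python
import pandas as pd

# Hypothetical supplier pricing joined with historical volume.
prices = pd.read_csv("supplier_prices.csv")
# columns: supplier, unit_type, cpm, impressions

# A plain mean assumes every supplier offers similar volume at similar prices;
# weighting each supplier's CPM by its impressions avoids that assumption.
weighted = prices.assign(cost=prices["cpm"] * prices["impressions"])
sums = weighted.groupby("unit_type")[["cost", "impressions"]].sum()
avg_cost = sums["cost"] / sums["impressions"]

# Flag suppliers whose share of a unit type's volume is large enough to
# dominate a simple average.
per_supplier = prices.groupby(["unit_type", "supplier"])["impressions"].sum()
per_unit = prices.groupby("unit_type")["impressions"].sum()
volume_share = per_supplier.div(per_unit, level="unit_type")
outliers = volume_share[volume_share > 0.5]
```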

With that in mind, let’s look for some trends in past inventory partitioned by unit type.

Individual publishers and domains are represented by distinct colors within their respective squares

At both the publisher and domain levels, the proportions of colors within the bars (each roughly one week long) stay consistent over time; some publishers, and even individual domains, reliably contribute far more inventory than others, signaling that their prices should carry unequal weights when computing the average cost of inventory for a particular ad unit type. To get a real feel for these outliers and the need to weight those contributions in our final cost calculations, we can view this same distribution as box plots with more granular ad unit distinctions.

You Get a Box Plot…

and you get a box plot…
and you get a box plot…
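Plots like those above are nearly a one-liner in Plotly Express; a sketch, assuming the same kind of hypothetical supply table with an observed CPM per row:

```python
import pandas as pd
import plotly.express as px

# Hypothetical table of observed prices per publisher/domain and ad unit.
supply = pd.read_csv("historical_supply.csv")
# columns: publisher, domain, ad_unit, cpm

# One box per granular ad unit, exposing just how skewed publisher pricing is.
fig = px.box(supply, x="ad_unit", y="cpm", points="outliers",
             title="Observed CPM distribution by ad unit")
fig.show()
```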

Now that we have an idea of just how skewed our data is, we must consider the impact of those different pricing structures. Let’s take another look at our data by domain, but this time split by the pricing types those domains operate with.

There does not appear to be a substantial amount of inventory utilizing a fixed CPM
Though not the majority, a sizable share incorporates a floor CPM
The majority run with a revenue sharing pricing structure, so this should be our focus moving forward

We can (arguably) rule out the impact of the fixed CPM crowd, but it’s harder to make the same case for throwing out the floor CPM, given the swath of inventory running with one in place. However, these floor CPMs are in practice very small, usually a dollar or two, and typically low enough for the average ad to still serve. Once compared against the effective CPMs we calculate at the end of this exercise, we can confirm whether this intuition is well founded, but at this point I believe we have enough reason to discount its final impact.

Even though the majority of inventory sharing the same pricing structure is a benefit, in that we no longer have to consider the impact of fixed or floor CPMs, the revenue sharing option is the most complicated to calculate because it is derived from yet another moving target: our revenue. We should expect the costs and revenues of ads with distinct unit types to vary widely (e.g. video ads vs. banner ads), but we must also consider that there are other subtleties on the demand side of delivery which affect the price paid by advertisers. If we apply similar logic, weighting the revenue per mille (RPM) of ads with the same unit type by delivery (instead of impressions), we can multiply that weighted RPM by the revenue share percentage to compute an effective cost per mille.
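Put as a sketch, that calculation boils down to a delivery-weighted RPM per unit type multiplied by a revenue share percentage; the tables, column names, and the flat per-unit-type shares below are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical delivery records: revenue and impressions per ad.
delivery = pd.read_csv("delivery.csv")
# columns: ad_id, unit_type, impressions, revenue

# Delivery-weighted RPM per unit type. Weighting each ad's RPM by its delivered
# impressions is equivalent to total revenue over total impressions, per mille.
totals = delivery.groupby("unit_type")[["revenue", "impressions"]].sum()
rpm = 1000 * totals["revenue"] / totals["impressions"]

# Hypothetical revenue share terms (the fraction of revenue paid out),
# keyed by the same unit type values as the delivery table.
revenue_share = pd.Series({"video": 0.50, "in-image": 0.55, "in-screen": 0.60})

# Effective cost per mille: what we expect to pay per thousand impressions.
effective_cpm = rpm * revenue_share
```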

Finally, we can translate impressions from revenue-sharing inventory, per unit type, into our common base of effective CPM and graph the distribution of impressions by cost.

Each bin along the x-axis is a unit type; some have very strong concentrations at a specific price while others are more diffuse
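One way to draw that picture with Plotly is a binned heat map of impression volume over effective CPM for each unit type; a sketch, assuming impressions have already been translated to effective CPM as above (the $0.25 bucket width is arbitrary):

```python
import pandas as pd
import plotly.express as px

# Hypothetical table: impressions tagged with their unit type and effective CPM.
impressions = pd.read_csv("impressions_by_effective_cpm.csv")
# columns: unit_type, effective_cpm, impressions

# Bucket effective CPM and pivot into a unit type x price grid of impression volume.
impressions["cpm_bin"] = ((impressions["effective_cpm"] // 0.25) * 0.25).round(2)
grid = impressions.pivot_table(index="cpm_bin", columns="unit_type",
                               values="impressions", aggfunc="sum", fill_value=0)

# Heat map: columns are unit types, rows are price buckets, color is volume.
fig = px.imshow(grid,
                labels=dict(x="unit type", y="effective CPM", color="impressions"),
                title="Impression volume by effective CPM and unit type")
fig.show()
```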

There are plenty of other factors to account for if we want to dig for other interesting trends and really sharpen our results, such as geographic region.

But I think we’ve achieved our goal; luckily, there appear to be impression-dense, natural starting points for our max CPM exploration. These are where we see our opportunity cost gains, that is, where we have the highest likelihood of meeting our impression goals but with plenty of room for downward exploration to lower our cost. Starting at these points also ensures that if an ad struggles to meet its delivery goal mid-contract, raising its max CPM will open up a significant amount of inventory that could make up the difference. That would not be the case if the max CPM were already above the most impression-dense CPM, which is more likely to happen with our existing starting point calculation.
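Reading a seed value off that distribution then amounts to picking, per unit type, the effective CPM bucket holding the most impressions; a sketch continuing with the same hypothetical table and binning as above:

```python
import pandas as pd

impressions = pd.read_csv("impressions_by_effective_cpm.csv")
# columns: unit_type, effective_cpm, impressions

# Bucket effective CPM into $0.25-wide bins (same binning as the heat map above).
impressions["cpm_bin"] = ((impressions["effective_cpm"] // 0.25) * 0.25).round(2)
grid = impressions.pivot_table(index="cpm_bin", columns="unit_type",
                               values="impressions", aggfunc="sum", fill_value=0)

# Per unit type, the price bucket holding the most impressions becomes the
# candidate starting max CPM for that unit type's exploration.
seed_max_cpm = grid.idxmax()
print(seed_max_cpm)
```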

Summary

Wrapping it all up, we started with a fairly broad question to answer: can we find a way to increase our margin on ads by reevaluating our max CPM algorithm? From there we followed the abstractions (e.g. ‘average cost’, ‘revenue share’) to their real implementations and gained insights from exploring them graphically. With a dash of domain expertise, we were able to reduce our problem space by ruling out the impact of the other pricing models. Lastly, we produced a concise but information-dense representation of our results as heat maps, which lets us easily communicate our findings to decision makers on the business side of things. Hope this little thought exercise gave some new perspectives on how to explore a problem: breaking it down into sub-problems and visualizing the data in different ways while considering what might break your assumptions along the way. Until next time…

We’re always looking for new talent! View jobs.
