How a “Predictive CAPI” can drive growth for advertisers

7 min read · Aug 6, 2024

Conversion APIs (CAPIs) have become powerful tools for marketers to help paid acquisition channels optimize toward the outcomes that matter to them. Telling major ad platforms about conversions helps them use Machine Learning (ML) to optimize media toward the audiences, placements, and creatives likely to lead to those outcomes. We see a lot of this at Snowflake: Conversion APIs are often an early use case for customers who adopt the AI Data Cloud for Marketing. In this post, I will talk about how leveraging ML on the advertiser’s side can make Conversion APIs even more powerful.

History

Conversion pixels have been used by ad platforms for years to get notified of conversion events and enable optimization. For example, Facebook introduced their conversion pixel back in 2013. This worked well for many years, but has become less effective with the rise of mobile apps and the degradation of the third-party cookie. In addition, it has become clear that digital advertising can drive conversions well beyond digital channels: people often research purchases online, but go in-store to make the purchase itself. The Conversion API, unlike the conversion pixel, can be used to power optimization in these cases.

Conversion pixels vs Conversions API in Meta

Enter the Conversion API

With a Conversion API, the advertiser asynchronously calls an Application Programming Interface (API), passing information about the consumer (like a hashed email address) and about the conversion (e.g. the time and amount of the conversion). It supports offline conversions, allows optimization when the conversion happens on a different device, and works with both web and mobile — even in Safari or Firefox without third-party cookies. This significantly expands the surface area supported by ad platforms for optimization and is robust to changes in the support of third-party cookies.
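As a rough illustration of what such a call carries, here is a minimal sketch that builds (but does not send) a conversion event. The field names (`event_name`, `event_time`, `action_source`, `user_data.em`, `custom_data.value`) are modeled on Meta's Conversions API; the helper functions, the sample email, and the amounts are hypothetical, and other platforms use different schemas, so check each platform's current reference before relying on this shape.

```python
import hashlib
import json
import time


def hash_email(email: str) -> str:
    """Normalize (trim, lowercase) and SHA-256 hash an email address."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_conversion_event(email: str, value: float, currency: str,
                           event_time: int, event_name: str = "Purchase",
                           action_source: str = "physical_store") -> dict:
    """Build one event dict (hypothetical helper; fields modeled on Meta's CAPI)."""
    return {
        "event_name": event_name,
        "event_time": event_time,        # Unix timestamp, in seconds
        "action_source": action_source,  # here: an offline, in-store purchase
        "user_data": {"em": [hash_email(email)]},  # hashed, never the raw email
        "custom_data": {"value": value, "currency": currency},
    }


# Illustrative values only.
event = build_conversion_event(" Jane.Doe@Example.com ", 129.99, "USD",
                               int(time.time()))
payload = json.dumps({"data": [event]})  # body an advertiser would POST
```

Because the consumer is identified by a hashed email rather than a browser cookie, the same payload shape works for in-store, cross-device, and in-app conversions.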

Limitations of Conversion APIs

Although Conversion APIs have significantly increased the surface area of media that can be optimized, there are still blind spots. Many of these blind spots are due to there being a time lag between when the conversion happens and when the advertiser is able to quantify the value of the conversion. This may be purely based on timing, or it may have to do with some of the conversions being higher in the funnel. Some examples are:

In the education space, someone may fill out an interest form, but may take some time to enroll in a class.
In auto, ads may be intended to lead to test drives, but purchases happen later if at all.
In finance, a consumer may apply for a loan, but approval can take time.
For subscription services, ads may lead to a sign-up, but revenue depends on how long the consumer remains a member. We would want ads to optimize towards long-term members when possible.

In general, we want to call Conversion APIs quickly after the conversion takes place. For instance, Facebook recommends calling CAPI in real time or as close to real time as possible. This improves the platforms’ ability to optimize the ads by improving their ability to estimate a probability of conversion. However, in the examples listed above, we often don’t have information that is both timely and accurate. For instance, we may know the average value of a test drive for an automotive advertiser, but surely some consumers are more likely to purchase than others.

Enter the “Predictive CAPI”

Although a brand may not know exactly the likelihood that a test drive of a new automobile will lead to a purchase, they likely do know the historical average. However, just using a historical average does not take full advantage of the ML capabilities of modern advertising platforms to optimize. Surely some people are more likely to convert from a test drive than others. This is where a brand can leverage their own ML to create what I call a “Predictive CAPI.”

A “Predictive CAPI” happens anytime an advertiser calls a Conversion API with the output of a Machine Learning model (known as a synthetic conversion) rather than an actual event. These models could be Lifetime Value (LTV) models, probability of conversion models, propensity models, etc. They allow us to differentiate between more valuable conversions and less valuable ones. The advertiser can then call a CAPI quickly, with minimal latency, while still providing differentiation that enables better optimization from ad platforms.
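To make the difference from a flat historical average concrete, here is a small worked sketch for the test-drive case. All numbers (the average purchase value, the historical conversion rate, and the two model scores) are hypothetical, and multiplying a predicted probability by an average value is only one simple way to turn a model output into a conversion value.

```python
def predicted_conversion_value(purchase_probability: float,
                               average_purchase_value: float) -> float:
    """Expected value of an upper-funnel event: P(purchase) * value if purchased."""
    if not 0.0 <= purchase_probability <= 1.0:
        raise ValueError("purchase_probability must be between 0 and 1")
    return round(purchase_probability * average_purchase_value, 2)


# Hypothetical inputs for an automotive advertiser.
AVERAGE_PURCHASE_VALUE = 30_000.0
HISTORICAL_CONVERSION_RATE = 0.10

# Flat approach: every test drive is reported with the same value.
flat_value = predicted_conversion_value(HISTORICAL_CONVERSION_RATE,
                                        AVERAGE_PURCHASE_VALUE)

# Predictive approach: each test drive gets its own model score.
model_scores = {"driver_a": 0.35, "driver_b": 0.04}
predicted_values = {
    driver: predicted_conversion_value(score, AVERAGE_PURCHASE_VALUE)
    for driver, score in model_scores.items()
}
```

With the flat approach the ad platform sees two identical conversions; with the predictive approach it sees that one test drive is worth several times more than the other, which is the signal its optimization needs.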

Implementing a “Predictive CAPI” with Snowflake

The first step to building a Predictive CAPI is to collect the relevant data. Most of this data aligns with what would be needed to build a Customer 360 in Snowflake. Note that, to be effective, the model’s inputs must be available for recent conversions, not just for established customers. If a data point exists only for long-time customers, it is not a useful input for a Predictive CAPI model, because the model will be called as soon as possible after a conversion. Often, it may be useful to build the model on third-party data, since that data may be available for both prospects and long-time customers. This includes data like age, gender, geography, household income, and other demographics. Many relevant data sets are available through the Snowflake Marketplace. Because we want to call the CAPIs as quickly as possible, consider technology like Snowpipe Streaming to bring data in.

The next step is to build the relevant ML model. Ideally, this model is built as close to the data as possible. Since we are using person-level data to build the model, moving and making copies of the data can increase the challenge of complying with privacy regulations. The model could be built by a data science team using Snowpark ML or possibly by an analyst leveraging something like ML functions.

Returning to the examples above, here are the types of models that might be built.

In the education space, we might predict the probability that a student who filled out a lead form would enroll in a class. The prediction might be based on the age and geography of the student and the type of classes they expressed interest in.
In auto, we would predict the conversion rate from test drive to purchase. The model might use independent variables like the geography of the test drive, the type and price of the car, and demographics about the driver.
In finance, we would predict the likelihood of the customer being approved for the loan.
For subscription services, we would predict the length of time until a customer churns.
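As a minimal sketch of the auto example, the code below fits a tiny logistic regression that predicts purchase probability from a test drive. It is plain Python purely to stay self-contained; it is not Snowpark ML or ML functions, which is where such a model would more likely be built in practice. The two features (car price in units of $100k, and whether the driver lives near the dealership) and the eight training rows are invented for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and a bias with batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def predict_probability(weights, bias, x):
    """Predicted probability that a test drive leads to a purchase."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical history: [price in $100k, lives_near_dealer] -> purchased (1/0).
test_drives = [[0.3, 1], [0.4, 1], [0.5, 1], [0.6, 1],
               [0.3, 0], [0.4, 0], [0.5, 0], [0.6, 0]]
purchased = [1, 1, 1, 0, 1, 0, 0, 0]

weights, bias = train_logistic_regression(test_drives, purchased)
p_likely = predict_probability(weights, bias, [0.3, 1])    # cheaper car, nearby
p_unlikely = predict_probability(weights, bias, [0.6, 0])  # pricier car, far away
```

Both inputs are known at the moment of the test drive, which is what allows the score to be computed and sent right after the event.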

How specifically to build the model is outside the scope of this article, but many resources are available about how to build these types of models. For instance, see this Quickstart about lead scoring with ML functions. This Quickstart shows how a model can be built to predict churn. Brands often start with a simple model and increase complexity over time as needed.

Building a model is not enough; you need to evaluate and maintain it as well. Evaluation should take place upfront, but should also be ongoing to ensure the technique continues to work. For instance, you could perform a backtest, building the model with data up to, say, 3 months ago, and seeing how well it predicts the past 3 months. This does two things. First, it gives you confidence that the technique will work going forward. Second, it allows you to quantify the value of the technique, because you will know how much more accurate the data sent to ad platforms will be. For maintenance, you will likely want to rebuild the model with new data on a regular cadence, for example nightly or weekly, ideally with an automated pipeline.
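The backtest described above can be sketched as follows: fit on records before a cutoff date, score the records after it, and compare against the flat historical average. The Brier score is used here as one reasonable accuracy measure, not the only choice. The per-segment "model", the record layout, and the data are all hypothetical stand-ins for a real model and real conversion history.

```python
from datetime import date


def brier_score(predictions, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)


def fit_segment_rates(train):
    """Toy model: historical conversion rate per segment, plus the overall rate."""
    totals = {}
    for r in train:
        conv, n = totals.get(r["segment"], (0, 0))
        totals[r["segment"]] = (conv + r["converted"], n + 1)
    overall = sum(r["converted"] for r in train) / len(train)
    return {"rates": {s: c / n for s, (c, n) in totals.items()}, "overall": overall}


def predict_segment_rate(model, record):
    # Fall back to the overall rate for segments unseen in training.
    return model["rates"].get(record["segment"], model["overall"])


def backtest(records, cutoff, fit, predict):
    """Fit on records before `cutoff`; score on records from `cutoff` onward."""
    train = [r for r in records if r["date"] < cutoff]
    holdout = [r for r in records if r["date"] >= cutoff]
    model = fit(train)
    outcomes = [r["converted"] for r in holdout]
    base_rate = sum(r["converted"] for r in train) / len(train)
    return {
        "model": brier_score([predict(model, r) for r in holdout], outcomes),
        "baseline": brier_score([base_rate] * len(holdout), outcomes),
    }


def rec(day, segment, converted):
    return {"date": day, "segment": segment, "converted": converted}


# Hypothetical test-drive history; lower Brier score is better.
records = (
    [rec(date(2024, 2, d), "suv", c) for d, c in [(1, 1), (8, 1), (15, 1), (22, 0)]]
    + [rec(date(2024, 3, d), "sedan", c) for d, c in [(1, 0), (8, 0), (15, 0), (22, 1)]]
    + [rec(date(2024, 5, d), "suv", c) for d, c in [(3, 1), (10, 1), (17, 0)]]
    + [rec(date(2024, 6, d), "sedan", c) for d, c in [(3, 0), (10, 0), (17, 1)]]
)
scores = backtest(records, date(2024, 5, 1), fit_segment_rates, predict_segment_rate)
```

If the model's score on the holdout period is not better than the baseline's, the predicted values add no information beyond the historical average and are not worth sending.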

The final step is to call the Conversion APIs themselves. There are many ways to call CAPIs. Advertisers can code against Conversion APIs in the language of their choice, leverage a third-party tool like Reverse ETL (rETL), or even leverage a Snowflake native Data Clean Room (DCR). For example, this Quickstart gives detailed instructions on calling Facebook’s CAPI using Hightouch on top of a Customer 360 in Snowflake.

Figure: Architecture diagram for using Hightouch on top of a Customer 360 in Snowflake.

Note that ingesting and joining consent data is an important part of a build on Snowflake as well. When using a consumer’s PII as part of a marketing flow, such as calling Conversion APIs, it’s important to have the customer’s consent to do so. Ingesting consent data and using it to govern the customer data in-place simplifies the data flows and ensures compliance with regulations.
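A minimal sketch of that consent gate is shown below, assuming a hypothetical layout in which each event carries a `customer_id` and consent is stored per customer and per purpose. In Snowflake this would more naturally be a join or a governance policy applied to the data in place; the Python version only illustrates the logic.

```python
def filter_by_consent(events, consent_by_customer, purpose="advertising"):
    """Keep only events whose customer has explicitly opted in for `purpose`.

    Customers with no consent record are treated as not consented.
    """
    return [
        event for event in events
        if consent_by_customer.get(event["customer_id"], {}).get(purpose) is True
    ]


# Hypothetical consent records and pending conversion events.
consent_by_customer = {
    "c1": {"advertising": True},
    "c2": {"advertising": False},
}
pending_events = [
    {"customer_id": "c1", "predicted_value": 10500.0},
    {"customer_id": "c2", "predicted_value": 1200.0},
    {"customer_id": "c3", "predicted_value": 3000.0},  # no consent record
]

sendable_events = filter_by_consent(pending_events, consent_by_customer)
```

Defaulting to exclusion when no consent record exists is a deliberately conservative choice for this sketch; the right default depends on the regulations that apply.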

This reference architecture shows how the end-to-end flow can work on Snowflake.

Figure: Reference architecture for building a “Predictive CAPI” on Snowflake.

Conclusion

Advertisers can use Machine Learning models to enhance the value of Conversion APIs, enabling them to get the most out of the ML-based optimization capabilities provided by modern advertising platforms. This value is maximized when an automated data pipeline is set up, minimizing the time between a conversion and the call to the Conversion API.


Written by Jim Warner