Advertisement Tech — Machine Learning

Shreya Jain
6 min read · May 2, 2020


Artificial Intelligence in Adtech

Machine Learning plays an important role in the Adtech sector for the efficiency and sophistication it can bring to the system. The goal of machine learning is to develop methods that automatically detect patterns in data and then use those patterns to predict future data or other outcomes. It has also given rise to innovative products, which we’ll discuss shortly.

The narrative around data is changing. If big data was once the goal, today the focus is on making data actionable. As Cassie Kozyrkov puts it, “Data science is the discipline of making data useful.”

I. Predictive analysis:

Predictive analysis mines users’ historical transaction data, browsing behavior, and similar signals to forecast future behavior. This information puts you in a better position to act on your advertising. If you’re dealing with potential customers, for instance, predictive analytics can help you identify the prospects most likely to transact by using data from existing customers (so-called “lookalikes”), so you’re not wasting resources on poor prospects.

This can also guide the choice of marketing strategy for each customer, with the aim of retaining them or increasing their spend.

Some of the commonly solved predictive modeling problems are:

i) CLV: Customer lifetime value is a prediction of the net profit attributed to the entire future relationship with a customer.

ii) Ad click prediction: predicting click probability, i.e., the probability that a user will click an ad shown to them on partner websites within a defined time window, based on historical view logs, ad impression data, and user data.
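To make the click-prediction problem concrete, here is a minimal sketch using logistic regression. All feature names and data are made up for illustration and do not come from any real pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Illustrative features only: past view counts, historical CTR, hour of day.
rng = np.random.default_rng(0)
n = 10_000
X = np.column_stack([
    rng.poisson(5, n),        # past ad views by this user
    rng.beta(2, 50, n),       # user's historical CTR
    rng.integers(0, 24, n),   # hour of day the impression is served
])
# Synthetic labels: clicks are more likely for high-CTR users.
p = 1 / (1 + np.exp(-(-3 + 20 * X[:, 1])))
y = rng.binomial(1, p)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p_click = model.predict_proba(X_te)[:, 1]  # predicted click probability
print("AUC:", round(roc_auc_score(y_te, p_click), 3))
```

In production, such a model would be trained on billions of impressions with far richer features, but the shape of the problem (features in, click probability out) is the same.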

II. Price bidding algorithms:

Let’s first see what real-time bidding means:

Real-time bidding (RTB) is a means by which advertising inventory is bought and sold on a per-impression basis, via programmatic instantaneous auction, similar to financial markets. With real-time bidding, advertising buyers bid on an impression and, if the bid is won, the buyer’s ad is instantly displayed on the publisher’s site. Real-time bidding lets advertisers manage and optimize ads from multiple ad-networks by granting the user access to a multitude of different networks, allowing them to create and launch advertising campaigns, prioritize networks and allocate percentages of unsold inventory, known as backfill. Source: Wikipedia

The objective is to find the optimal real-time bid price for each display ad slot. A central issue in performance display advertising is matching campaigns to ad impressions, which can be formulated as a constrained optimization problem that maximizes revenue subject to constraints such as budget limits and inventory availability. The detailed algorithm can be found here: http://www0.cs.ucl.ac.uk/staff/w.zhang/rtb-papers/rtb-perf-bid.pdf
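As a small illustration of how a predicted CTR can drive the bid, here is the classic linear bidding baseline often discussed in the RTB literature. The paper above derives a more sophisticated non-linear strategy; the numbers here are made up:

```python
def linear_bid(p_ctr: float, avg_ctr: float, base_bid: float) -> float:
    """Linear bidding baseline: scale a base bid by how much this
    impression's predicted CTR exceeds the campaign average."""
    return base_bid * p_ctr / avg_ctr

# Illustrative numbers: base bid of 2.0 (CPM units), campaign avg CTR 0.1%.
print(linear_bid(p_ctr=0.003, avg_ctr=0.001, base_bid=2.0))  # bids 6.0
```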

For a general guide to dynamic pricing algorithms, see https://blog.griddynamics.com/dynamic-pricing-algorithms/. These methods draw on reinforcement learning (RL): the paradigm of monetizing well-performing parameters (exploitation) while simultaneously searching for better ones (exploration) is the classic exploration-exploitation trade-off at the heart of RL.
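Here is a minimal epsilon-greedy sketch of that exploration-exploitation idea, with made-up candidate prices. A production dynamic-pricing system would be considerably more involved:

```python
import random

# Candidate prices act as bandit "arms"; we mostly exploit the
# best-performing price and occasionally explore the others.
prices = [1.0, 1.5, 2.0, 2.5]
revenue_sum = {p: 0.0 for p in prices}
plays = {p: 0 for p in prices}

def avg_revenue(p: float) -> float:
    # Optimistic default forces each price to be tried at least once.
    return revenue_sum[p] / plays[p] if plays[p] else float("inf")

def choose_price(epsilon: float = 0.1) -> float:
    if random.random() < epsilon:
        return random.choice(prices)      # explore
    return max(prices, key=avg_revenue)   # exploit

def update(price: float, revenue: float) -> None:
    plays[price] += 1
    revenue_sum[price] += revenue

# Example round: pick a price, observe revenue, update the statistics.
p = choose_price()
update(p, revenue=p * 0.4)  # made-up observed revenue
```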

III. Improving ad creative:

Audiences respond differently to ad creative. Media, typeface, call to action: these are among the creative ingredients that get people clicking or tuning out. In such a system, data on past creatives and past campaigns is crunched to determine precisely what would work for ongoing efforts. With this application of AI, brands can get a better sense of how everything from messaging, fonts, and colors to imagery, button sizes, and formats impacts overall campaign performance.

Machine learning is also applied in consumer psychology by relating image features to personality types. For instance, image features such as hue, saturation, color diversity, level of detail, and number of people have been correlated with personality traits like openness, conscientiousness, extroversion, agreeableness, and neuroticism. Studies found that the match between personality type and image type could affect a consumer’s interest in a product: people didn’t just prefer images that matched their personalities, they also reported more favorable attitudes and purchase intentions toward those brands.
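A first pass at such an analysis could simply correlate image features with trait scores. The sketch below uses entirely made-up numbers and column names and is only meant to show the shape of the computation:

```python
import pandas as pd

# Illustrative data: per-ad image features and the average openness score
# of users who engaged with each ad (all values invented for the sketch).
df = pd.DataFrame({
    "hue":             [0.61, 0.12, 0.45, 0.80, 0.33],
    "saturation":      [0.70, 0.20, 0.55, 0.90, 0.40],
    "color_diversity": [0.50, 0.10, 0.60, 0.85, 0.30],
    "openness":        [0.72, 0.31, 0.58, 0.88, 0.41],
})
# Correlate each image feature with the trait score.
print(df.corr()["openness"].drop("openness"))
```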

Boosting contextual relevance:

On top of being well designed, your ad needs to run on the right platform, with the right targeting, at the right time. One use case is categorizing users’ sentiment in real time while they browse and choosing ads accordingly. For instance, if you’re browsing Instagram more than usual on a particular day, chances are you’ve already succumbed to the pull of social media, and in this window the chances of conversion are higher if ads are shown at high frequency.

IV. Recommendation systems:

This class of algorithms falls under relevant ad targeting. The goal is to reach users who would have an affinity for a certain product or brand, based on the likes and dislikes of similar users.

Let me describe a widely cited state-of-the-art technique, Wide & Deep Learning for Recommender Systems: https://arxiv.org/pdf/1606.07792.pdf (reading the paper is highly recommended). The inputs to the two-branch neural network, trained with a combined loss function, consist of generic features, like gender, age, location, and location type, and specific features, like the apps the user has liked. The goal is to predict how well other apps would fare with this user.

[Figure: Wide & Deep model structure for apps recommendation]

The network combines the benefits of memorization (the wide, linear branch over sparse cross-product features) and generalization (the deep branch over dense, embedded features) for recommender systems.
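To make the structure concrete, here is a minimal Keras sketch of a Wide & Deep model. The input sizes and layer widths are illustrative, not the paper’s exact configuration:

```python
import tensorflow as tf

# Wide branch: sparse cross-product features (e.g. crossed app IDs).
wide_in = tf.keras.Input(shape=(1000,), name="wide")
# Deep branch: dense/embedded features (e.g. age, app embeddings).
deep_in = tf.keras.Input(shape=(32,), name="deep")

deep = tf.keras.layers.Dense(256, activation="relu")(deep_in)
deep = tf.keras.layers.Dense(128, activation="relu")(deep)

# Both branches feed one sigmoid output and are trained jointly.
both = tf.keras.layers.concatenate([wide_in, deep])
out = tf.keras.layers.Dense(1, activation="sigmoid")(both)

model = tf.keras.Model(inputs=[wide_in, deep_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```

The key design choice is joint training of both branches against a single loss, so the wide part memorizes exceptions while the deep part generalizes.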

V. Probabilistic ID resolution:

The objective is to link online identifiers across devices by assigning likelihoods to connections between records within and across data sources.

One of the most widely used approaches is to first reduce the search space using common features: observational data, typically including an identifier, a timestamp, and a network address, and semantic data, such as demographic estimates for the user.

Now, among these clusters of IDs, we form training pairs. It’s a two-step approach: pairs are first discovered using ‘proxy’ features for spatio-temporal co-location, for example an IP address and a date. These pairs are then labeled ‘+1’ if they share the same hashed email and ‘-1’ if they don’t. This way we can separate users who are similar, and in the same proximity, but not the same person.
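Here is a minimal pandas sketch of this pairing step, with made-up column names and records:

```python
import pandas as pd

# Step 1 (blocking): only compare IDs seen on the same IP on the same day.
# Step 2 (labeling): pairs sharing a hashed email are positives (+1);
# co-located pairs that do not are negatives (-1).
events = pd.DataFrame({
    "device_id":    ["a", "b", "c", "d"],
    "ip":           ["1.2.3.4", "1.2.3.4", "1.2.3.4", "5.6.7.8"],
    "date":         ["2020-05-01"] * 4,
    "hashed_email": ["h1", "h1", "h2", "h3"],
})

pairs = events.merge(events, on=["ip", "date"], suffixes=("_l", "_r"))
pairs = pairs[pairs["device_id_l"] < pairs["device_id_r"]]  # de-duplicate
pairs["label"] = (pairs["hashed_email_l"]
                  == pairs["hashed_email_r"]).map({True: 1, False: -1})
print(pairs[["device_id_l", "device_id_r", "label"]])
```

Devices a and b share an IP, a date, and a hashed email, so they form a positive pair; a and c share only the location proxy, so they form a negative pair.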

Please refer to the paper for a detailed explanation: https://arxiv.org/pdf/1901.05560.pdf

VI. Fraud detection:

As an industry matures, so do its algorithms and intricacies, giving rise to more sophisticated fraud. Fraudulent data and IDs that go unnoticed can lead to major revenue losses, so they need to be filtered out at various levels, two of which are mentioned below:

Data collection end: using machine learning models and meta-heuristics to identify patterns of fraudulent data providers or IDs, for instance the same ID occurring multiple times at the same timestamp but in different locations, and analyzing data at an aggregate level to find major clusters and establish explanations for unusually high traffic (a sketch follows after the next item).

Publishers end: identifying fraudulent publishers who generate illegitimate clicks and distinguishing them from normal publishers, for example click injection, which artificially inflates metrics like CTR. Machine learning models analyze signals such as click-to-install time (CTIT) to detect non-human, robotic patterns, among others.
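As a sketch of the data-collection-end check mentioned above (the same ID at the same timestamp in different locations), with made-up records:

```python
import pandas as pd

clicks = pd.DataFrame({
    "user_id":   ["u1", "u1", "u2", "u2"],
    "timestamp": ["2020-05-01 10:00", "2020-05-01 10:00",
                  "2020-05-01 11:00", "2020-05-01 12:00"],
    "city":      ["Delhi", "Mumbai", "Pune", "Pune"],
})

# An ID that appears at the same timestamp in more than one location
# cannot belong to a single real user.
locs = clicks.groupby(["user_id", "timestamp"])["city"].nunique()
suspicious = locs[locs > 1]
print(suspicious)  # u1 at 10:00 was seen in 2 cities -> flag for review
```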

VII. Probabilistic modeling:

Large amounts of sparse data lead to a variety of problems, from difficulty in interpreting the data to high costs for scanning and analyzing it at run-time. The answer sounds simple, sampling, but the solution is not as straightforward: the data must undergo a latent transformation before we can even start the sampling process.

The blog post here discusses the complications associated with high-dimensional unstructured data and how to sample from it: https://medium.com/@zeotapstories/the-zeotech-series-estimation-of-samples-part-i-f8822a64c2f5

Modeling distributions:

I cannot emphasize enough the importance of statistics, along with linear algebra, as the foundation of machine learning.

Probability distributions: modeling data with one of the standard probability distributions and exploiting its properties for your use case. For example, with a user’s historical purchase data, one can model the gaps between purchases with an Exponential distribution and evaluate the probability of the user making their next purchase within a given window.
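A minimal SciPy sketch of this idea, with made-up inter-purchase gaps:

```python
from scipy import stats

# Illustrative inter-purchase gaps in days for one user.
gaps = [12.0, 9.0, 15.0, 11.0, 13.0]

# Fit an Exponential distribution to the gaps (location fixed at 0).
loc, scale = stats.expon.fit(gaps, floc=0)

# Probability the user purchases again within the next 14 days.
p_within_14 = stats.expon.cdf(14, loc=loc, scale=scale)
print(round(p_within_14, 3))
```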

Another standard approach calculates customer lifetime value from previous purchases through a negative binomial distribution (NBD) model. The two components, purchase rate and customer lifetime, are modeled using a Poisson distribution and an Exponential distribution, respectively. The detailed algorithm can be found here: https://blogs.oracle.com/datascience/an-introduction-to-predictive-customer-lifetime-value-modeling
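The core of the NBD model is that Poisson purchase counts with a Gamma-distributed rate across customers yield a negative binomial. A small simulation makes this concrete (all parameters are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Each customer's purchase count is Poisson with their own rate, and
# rates vary across customers following a Gamma distribution.
r, beta = 2.0, 0.5  # illustrative Gamma shape and rate
rates = rng.gamma(shape=r, scale=1 / beta, size=100_000)
counts = rng.poisson(rates)  # simulated purchases per period

# The mixture is exactly negative binomial with n=r, p=beta/(beta+1).
nbd = stats.nbinom(n=r, p=beta / (beta + 1))
print(counts.mean(), nbd.mean())  # the two means should agree (~4.0)
```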
