The Future of Marketing Attribution: Integrating Machine Learning for Enhanced Insights. Part 1

Elena K.
10 min read · Jun 7, 2023
Image from aerolab.co

Machine learning has become a ubiquitous tool in various domains, and marketing is no exception. It finds extensive application in predicting customer behavior, personalizing marketing campaigns, detecting ad fraud, optimizing pricing strategies, and developing marketing attribution models. However, some companies still rely on outdated methods for building marketing attribution models. In this article, we will delve into the domain of marketing attribution models and explore current practices in the industry.

Why is it important?

Today, companies spend roughly 10–15% of their income on online marketing alone. This is a substantial amount, and a well-aligned marketing strategy can significantly affect future earnings. Such strategies have the potential to yield an X% increase in ROAS, conversions, or other relevant metrics. With the wrong strategy, however, you simply end up spending a lot of money without seeing any results or understanding what went wrong.

Therefore, in this article, we aim to address the following questions:

  1. What are the different types of marketing attribution models available?
  2. What are the advantages and disadvantages of each of these models?
  3. How can one choose the most suitable marketing attribution model?

There are several approaches to building an attribution model. They can be divided into two groups: Single-Touch Attribution and Multi-Touch Attribution. Let’s start with the simplest and move to the most complex.

Single-Touch Attribution

As the name suggests, these approaches take into account only one touchpoint.

1. First Touch Attribution

Image by the author

This is the standard and easiest way to attribute a conversion: it gives full credit to the first interaction.

Pros:

  • Simplicity: first touch attribution is a straightforward and easy-to-understand method.
  • Quick Implementation: first touch attribution can be easily implemented without complex calculations or data analysis, making it a convenient choice for organizations seeking a simple attribution approach.
  • Clear Focus: might be a good fit when you’re focused solely on demand generation.

Cons:

  • Oversimplification: first touch attribution oversimplifies the customer journey by neglecting the influence of subsequent touchpoints.
  • Limited View of Channel Performance: first touch attribution may disproportionately credit channels that are more likely to be the first point of contact, potentially overlooking the contributions of other channels that assist in conversions.

2. Last Touch Attribution

Image by the author

This is another easy way to attribute a conversion, and it is the opposite of first touch attribution: it gives full credit to the last interaction.

Pros:

  • Simplicity: last touch attribution is a straightforward and easy-to-understand method.
  • Quick Implementation: last touch attribution can be easily implemented without complex calculations or data analysis, making it a convenient choice for organizations seeking a simple attribution approach.
  • Clear Focus: might be a good fit when you’re focused solely on driving conversions.

Cons:

  • Oversimplification: last touch attribution oversimplifies the customer journey by neglecting the influence of earlier touchpoints.
  • Limited Insights into Full Customer Journey: by focusing solely on the last touchpoint, this attribution approach overlooks the holistic customer journey, missing out on valuable insights into the cumulative impact of multiple touchpoints.
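
Both single-touch rules fit in a few lines of Python. The sketch below is only an illustration: the function name `single_touch_credit` and the sample journeys are hypothetical, and it assumes every journey in the list ended in a conversion and is ordered from first to last touch.

```python
from collections import defaultdict

def single_touch_credit(journeys, position="first"):
    """Give 100% of each conversion to the first or the last touchpoint."""
    if position not in ("first", "last"):
        raise ValueError("position must be 'first' or 'last'")
    credit = defaultdict(float)
    for touchpoints in journeys:
        if not touchpoints:
            continue  # an empty journey has nothing to credit
        channel = touchpoints[0] if position == "first" else touchpoints[-1]
        credit[channel] += 1.0
    return dict(credit)

# Hypothetical converting journeys, ordered from first to last touch.
journeys = [
    ["facebook", "email", "google"],
    ["google", "email"],
    ["facebook", "google"],
]

first_touch = single_touch_credit(journeys, "first")
last_touch = single_touch_credit(journeys, "last")
print(first_touch)
print(last_touch)
```

On the same three journeys the two rules rank the channels differently, which is exactly the bias described in the cons above.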

Multi-Touch Attribution

As discussed, single-touch approaches are simple to interpret and implement. However, they are not fair in how they assign credit: because they apply a fixed rule regardless of the data, they cannot accurately measure the contribution of each touchpoint in the consumer journey. Consequently, marketers end up making decisions based on skewed data.

Multi-touch attribution utilizes individual user-level data from various channels. It calculates and assigns credit to the marketing touchpoints that played a role in influencing a desired business outcome for a specific key performance indicator (KPI) event.

1. Linear Attribution

Image by the author

Linear attribution is also a standard approach, but it is a step up because it takes all interactions into account and gives them equal weight. For example, if there are 5 touches, each gets 20% of the credit.

Pros:

  • Equal Distribution: linear attribution evenly distributes credit across all touchpoints in a customer’s journey, providing a balanced representation of each touchpoint’s contribution to conversions.
  • Fairness: it avoids overemphasizing or neglecting specific touchpoints, promoting a sense of fairness in distributing credit among channels.
  • Quick Implementation: linear attribution can be easily implemented without complex calculations or data analysis, making it a convenient choice for organizations seeking a simple attribution approach.

Cons:

  • Lack of Differentiation: linear attribution assigns equal credit to each touchpoint, regardless of their actual impact on driving conversions.
  • Ignoring Time Decay: linear attribution does not account for the diminishing effect of earlier touchpoints over time. It treats all touchpoints equally, regardless of their temporal proximity to the conversion event.
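
As a minimal sketch, linear attribution splits each conversion evenly across the touches of its journey. The function name `linear_credit` and the sample journey are hypothetical. Note that a channel appearing twice in a journey collects two shares.

```python
from collections import defaultdict

def linear_credit(journeys):
    """Split each conversion evenly across all touches in its journey."""
    credit = defaultdict(float)
    for touchpoints in journeys:
        if not touchpoints:
            continue
        share = 1.0 / len(touchpoints)  # e.g. 5 touches -> 20% each
        for channel in touchpoints:
            credit[channel] += share
    return dict(credit)

# Hypothetical journey with 5 touches; "email" appears twice.
journey = ["facebook", "email", "google", "email", "direct"]
linear = linear_credit([journey])
print(linear)
```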

2. Position-based Attribution (U-Shaped Attribution & W-Shaped Attribution)

Image by the author

These two approaches are quite similar: both give the biggest weight to the first and the last touches. In W-shaped attribution, a middle touch also receives a large share of the credit.

Pros:

  • Weighted Credit: position-based attribution assigns more credit to certain key touchpoints in the customer journey, such as the first touchpoint, the last touchpoint, and sometimes a few touchpoints in between.
  • Flexibility: position-based attribution allows for customization and adjustment of the credit distribution weights based on the specific business objectives and customer behavior patterns.

Cons:

  • Subjectivity: determining the specific weights for different touchpoints in position-based attribution involves subjective decision-making. The choice of weights may vary across organizations and can impact the accuracy of the attribution results.
  • Limited Adaptability: position-based attribution may not capture the nuances of every customer journey, as it tends to focus on specific positions or touchpoints.
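
A U-shaped model can be sketched as follows. The 40% / 20% / 40% split used here is a common convention, not something prescribed above, so treat the default weights (and the hypothetical names `position_based_weights` and `position_based_credit`) as assumptions to tune.

```python
from collections import defaultdict

def position_based_weights(n_touches, first=0.4, last=0.4):
    """Per-touch weights for a U-shaped model; they sum to 1 for n_touches >= 1."""
    if n_touches <= 0:
        return []
    if n_touches == 1:
        return [1.0]
    if n_touches == 2:
        # No middle touches: rescale the two end weights so they sum to 1.
        return [first / (first + last), last / (first + last)]
    # The remaining credit is shared equally by the middle touches.
    middle = (1.0 - first - last) / (n_touches - 2)
    return [first] + [middle] * (n_touches - 2) + [last]

def position_based_credit(touchpoints, first=0.4, last=0.4):
    """Credit per channel for one converting journey."""
    credit = defaultdict(float)
    weights = position_based_weights(len(touchpoints), first, last)
    for channel, weight in zip(touchpoints, weights):
        credit[channel] += weight
    return dict(credit)

# Hypothetical journey, ordered from first to last touch.
journey = ["facebook", "email", "google", "email", "direct"]
u_shaped = position_based_credit(journey)
print(u_shaped)
```

A W-shaped variant would add a third peak on a chosen middle touch. Which touch counts as the middle one is a modeling decision, which is the subjectivity noted in the cons.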

3. Time Decay Attribution

Image by the author

This attribution model assigns the majority of credit to the interactions closest to the point of conversion.

Pros:

  • Temporal Sensitivity: time decay attribution acknowledges the diminishing impact of earlier touchpoints over time. It assigns more credit to touchpoints that are closer to the conversion event, reflecting the higher influence of recent interactions.
  • Flexibility: time decay attribution allows for customization of the decay rate or time decay function, enabling organizations to fine-tune the attribution model based on their specific business needs and customer behavior patterns. May be useful for FMCG companies.

Cons:

  • Arbitrary Decay Function: determining the appropriate decay rate or function is challenging and subjective. There is no universally optimal decay function, and selecting an inappropriate decay model may lead to inaccurate credit distribution.
  • Oversimplification of Time Dynamics: time decay attribution assumes a linear or exponential decay pattern, which may not fully capture the complex temporal dynamics of customer behavior.
  • Lack of Contextual Factors: time decay attribution primarily considers the temporal aspect and may not account for other contextual factors that can influence touchpoint effectiveness, such as channel interactions, customer segments, or campaign-specific dynamics.
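
One way to make the decay concrete is an exponential half-life: a touch loses half of its weight for every `half_life_days` that separate it from the conversion. The 7-day half-life, the function name `time_decay_credit`, and the sample data are assumptions for illustration only.

```python
from collections import defaultdict

def time_decay_credit(touches, half_life_days=7.0):
    """touches: list of (channel, days_before_conversion) pairs for one journey."""
    # Raw weight halves every `half_life_days` before the conversion.
    raw = [(channel, 2.0 ** (-days / half_life_days)) for channel, days in touches]
    total = sum(weight for _, weight in raw)
    credit = defaultdict(float)
    for channel, weight in raw:
        credit[channel] += weight / total  # normalize so the credit sums to 1
    return dict(credit)

# Hypothetical journey: touches 14, 7 and 0 days before the conversion.
touches = [("facebook", 14), ("email", 7), ("google", 0)]
decayed = time_decay_credit(touches)
print(decayed)
```

With these inputs the raw weights are 0.25, 0.5 and 1.0, so the touch on the conversion day receives 4/7 of the credit. Changing `half_life_days` changes the split, which is the arbitrariness described in the cons.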

4. Markov Chain Attribution

Image by the author

Markov chain attribution is one of the most popular data-driven methods and, as the name suggests, it is built on Markov chains. The key concept to keep in mind about Markov chains is the transition matrix, which holds the probabilities of moving from one state to the next. It is obtained by using all customer journeys from the initial touchpoints to the desired conversion. You can read more about Markov chains in this article.

Pros:

  • Sequential Analysis: Markov chain attribution considers the sequential nature of customer journeys and analyzes the transitions between touchpoints, providing insights into the influence of each touchpoint on conversions.
  • Data-Driven Approach: it relies on historical data to estimate transition probabilities, making it a data-driven method that captures the empirical behavior of customers.
  • Flexible Modeling: Markov chains can be customized to incorporate various factors such as time decay, channel interactions, and different attribution rules, allowing for flexibility in modeling customer journeys.

Cons:

  • Assumption of Markov Property: Markov chain attribution assumes that future transitions depend solely on the current state and are independent of past history. This assumption may not always hold true in complex real-world scenarios.
  • Lack of Causality: Markov chain attribution focuses on correlations and transition probabilities between touchpoints, but it does not directly address causality or the true impact of each touchpoint on conversions.
  • Limitations: it can’t take into account other user data.
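
A common way to turn the transition matrix into channel credit is the removal effect: remove a channel from the chain, measure how much the overall conversion probability drops, and share the conversions in proportion to those drops. The sketch below is a simplified first-order version under stated assumptions: the state names, function names, and toy journeys are hypothetical, both converting and non-converting journeys are used, and the conversion probability is approximated by repeated sweeps rather than solved exactly.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"

def transition_probs(journeys):
    """journeys: list of (touchpoints, converted) pairs -> nested probability dict."""
    counts = defaultdict(lambda: defaultdict(int))
    for touchpoints, converted in journeys:
        path = [START] + list(touchpoints) + [CONV if converted else NULL]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }

def conversion_prob(probs, removed=None, sweeps=500):
    """Approximate P(reach CONV from START); a removed channel acts like NULL."""
    value = {state: 0.0 for state in probs}
    for _ in range(sweeps):
        for state, dsts in probs.items():
            if state == removed:
                continue
            value[state] = sum(
                p * (1.0 if dst == CONV
                     else 0.0 if dst in (NULL, removed)
                     else value[dst])
                for dst, p in dsts.items()
            )
    return value[START]

def markov_attribution(journeys):
    """Share the observed conversions in proportion to each channel's removal effect."""
    probs = transition_probs(journeys)
    base = conversion_prob(probs)
    channels = [state for state in probs if state != START]
    if base == 0.0:
        return {channel: 0.0 for channel in channels}
    effects = {c: 1.0 - conversion_prob(probs, removed=c) / base for c in channels}
    total_effect = sum(effects.values())
    conversions = sum(1 for _, converted in journeys if converted)
    return {c: conversions * e / total_effect for c, e in effects.items()}

# Hypothetical journeys: (ordered touchpoints, did the user convert?).
journeys = [
    (["facebook", "google"], True),
    (["facebook"], False),
    (["google"], True),
    (["email", "google"], False),
]
markov_credit = markov_attribution(journeys)
print(markov_credit)
```

On this toy data every conversion passes through google, so removing it drops the conversion probability to zero and it receives the largest share (about 1.2 of the 2 conversions), while facebook and email receive about 0.4 each.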

5. Shapley Value Attribution

Output of the SHAP library

The Shapley value, introduced by Lloyd S. Shapley, a Nobel laureate in economics, offers a fair method for distributing rewards among team members. It is a game theory concept that ensures both gains and costs are allocated fairly among actors in a coalition. The Shapley value is particularly useful when individual contributions differ, but actors collaborate to achieve a shared payoff.

In the realm of marketing, we can view various channels as players engaged in a cooperative game, where their collective efforts aim to generate conversions. This approach ensures a fair allocation of credit to each touchpoint for its contribution towards the conversion process.

Pros:

  • Fairness: Shapley value attribution provides a fair and equitable distribution of credit to different marketing channels or touchpoints based on their contributions to conversions.
  • Cooperation Incentive: it encourages collaboration among channels as they work together to achieve conversions, fostering a cooperative approach to marketing.

Cons:

  • Computationally Intensive: calculating the Shapley value can be computationally expensive, especially with a large number of channels or touchpoints, because the number of possible coalitions grows exponentially with the number of channels.
  • Sensitivity to Order: exact Shapley values average over every possible ordering of the channels, but approximations that sample only some orderings can produce different results from run to run.
  • Limitations: it can’t take into account other user data.
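
For a handful of channels, Shapley values can be computed exactly by enumerating coalitions. The sketch below makes one modeling assumption that is not stated above: the value of a coalition is the number of conversions from journeys whose channels all belong to that coalition. The conversion counts and function names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a payoff."""
    n = len(players)
    result = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):
            # Probability that `player` joins right after a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (value(coalition | {player}) - value(coalition))
        result[player] = total
    return result

# Hypothetical conversions, grouped by the set of channels seen in the journey.
conversions_by_channel_set = {
    frozenset({"facebook"}): 10,
    frozenset({"google"}): 20,
    frozenset({"facebook", "google"}): 30,
}

def coalition_value(coalition):
    """Conversions achievable using only the channels in `coalition` (assumption)."""
    return sum(
        n for channels, n in conversions_by_channel_set.items()
        if channels <= coalition
    )

shapley_credit = shapley_values(["facebook", "google"], coalition_value)
print(shapley_credit)
```

Here facebook receives 25 and google 35 of the 60 conversions: each keeps its solo conversions and the 30 joint ones are split evenly. The nested loops visit every subset of the other channels, which is where the exponential cost comes from.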

6. Algorithmic Multi-Touch Attribution

All the methods discussed so far rely exclusively on customer journey information. Often, this information is sufficient to evaluate the contribution of each channel and formulate a strategy. However, it is not always possible to rely solely on one of the approaches described above. Fortunately, companies often possess abundant additional data that can be leveraged in this regard.

Let’s consider an example to illustrate how this might look in practice. Imagine we have a client, ABC, and access to all of its data, which comprises customer website activity, including clicks, views of specific pages, conversions, and so on. We can use features such as:

  • utm source;
  • utm medium;
  • utm campaign;
  • device type;
  • geographical information;
  • number of user engagements;
  • number of scrolls;
  • etc.

These are just a few of the features that can be used. In our case, we prepared 57 different features. We then trained a binary classification model to predict the probability of conversion at each step. This approach not only helps us identify the channels that contribute most effectively to conversions but also uncovers overvalued channels. It led us to conclude that client ABC should decrease investments in Google / CPC by 30% and increase investments in Instagram / CPC by 45%. I will provide more detailed information about the model and the results in the next article.
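
To give a feel for the idea, here is a deliberately tiny stand-in for such a model. It is not the model used for client ABC, whose algorithm and 57 features are not described here. It assumes a plain logistic regression trained by gradient descent on two one-hot encoded features, and all session data and names are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_hot(rows, keys):
    """Encode the chosen categorical fields of each row as 0/1 features."""
    names = sorted({f"{k}={row[k]}" for row in rows for k in keys})
    index = {name: i for i, name in enumerate(names)}
    vectors = []
    for row in rows:
        vec = [0.0] * len(names)
        for k in keys:
            vec[index[f"{k}={row[k]}"]] = 1.0
        vectors.append(vec)
    return vectors, index

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the log loss of a logistic regression."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, xi))) - yi
            grad_b += err
            for j, v in enumerate(xi):
                grad_w[j] += err * v
        bias -= lr * grad_b / len(X)
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
    return weights, bias

def predict(weights, bias, index, row, keys):
    """Predicted conversion probability for one session."""
    return sigmoid(bias + sum(weights[index[f"{k}={row[k]}"]] for k in keys))

def rows(source, device, outcomes):
    return [{"utm_source": source, "device": device, "converted": c} for c in outcomes]

# Made-up sessions: 1 means the session ended in a conversion.
sessions = (
    rows("google", "mobile", [0, 0, 0, 1])
    + rows("google", "desktop", [0, 1])
    + rows("instagram", "mobile", [1, 1, 1, 0])
    + rows("instagram", "desktop", [1, 0])
)
KEYS = ("utm_source", "device")
X, index = one_hot(sessions, KEYS)
y = [row["converted"] for row in sessions]
weights, bias = fit_logistic(X, y)

p_google = predict(weights, bias, index, {"utm_source": "google", "device": "mobile"}, KEYS)
p_instagram = predict(weights, bias, index, {"utm_source": "instagram", "device": "mobile"}, KEYS)
print(p_google, p_instagram)
```

Comparing predicted probabilities for sessions that differ only in the traffic source is one simple way to see which channels the model values more. A production setup would use a proper ML library, far more data, and a held-out validation set.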

Pros:

  • Comprehensive Analysis: algorithmic multi-touch attribution takes into account multiple touchpoints throughout the customer journey and a lot of additional information, providing a more holistic view of the customer’s interactions with various marketing channels.
  • Scalability: this approach can handle large volumes of data, making it suitable for organizations with extensive marketing campaigns and complex customer journeys.
  • Accuracy: by leveraging advanced algorithms, this approach can provide more accurate attribution of credit to different touchpoints, helping marketers make data-driven decisions.

Cons:

  • Data Availability: this approach heavily relies on the availability and quality of data from various touchpoints. Incomplete or inaccurate data can lead to biased attribution results.
  • Complexity: implementing algorithmic multi-touch attribution requires a solid understanding of data analysis and statistical modeling techniques. It may be challenging for organizations without the necessary expertise or resources.
  • Interpretation Challenges: the complexity of the algorithmic models used in multi-touch attribution can make it difficult to interpret the results and understand the exact contribution of each touchpoint.
  • Time and Resource-Intensive: implementing and maintaining an algorithmic multi-touch attribution system can be time-consuming and resource-intensive, requiring continuous data integration, model training, and validation.

In this article, we have explored the main methods currently used for modeling marketing attribution. While many of these methods can be easily implemented or understood through well-written articles, we will delve deeper into Algorithmic Multi-Touch Attribution in the following article. We will discuss how to simplify the data preparation process, identify relevant features, and interpret the model’s results.

In case I have forgotten anything, or you have anything valuable to add to this article, please consider adding your comment below.

Thank you for reading!

I hope that the insights shared today have been valuable to you. If you want to reach out to me, please feel free to add me on my LinkedIn.
