Mastering cookieless terrain: Elevating brand affinity for modern marketers

Nitin Vinayak Agrawal
MiQ Tech and Analytics
7 min read · Jun 20, 2024

Nitin Vinayak Agrawal, Data scientist II, MiQ; Manogna Nadella, Team lead data science, MiQ; Ben Jacob Scaria, Product analyst II, MiQ; Shruti Singh, Data scientist I, MiQ

Did you know? Brands could save up to $37 billion annually by eliminating ads that don’t resonate with their target audience. This highlights why understanding your audience is crucial for any brand. Not only does it allow for more tailored customization, but it also helps in allocating resources more efficiently, leading to more effective campaigns and improved engagement. By leveraging brand affinity, digital marketers can evaluate the brand’s influence, which enhances connections with the audience and ensures they stay updated on the latest products and offerings.

What is brand affinity and what is it for MiQ?

Brand affinity refers to the emotional connection and loyalty that consumers feel towards a particular brand. It reflects their sentiment, attitude, and association with the brand, indicating how well the brand’s values align with their own.

For MiQ, brand affinity plays a crucial role in how customers perceive and interact with brands, gauging their level of familiarity and influencing purchasing decisions. Therefore, as per the business requirement, we have refined the definition of brand affinity to include brand awareness, preference and loyalty, ensuring a holistic understanding of customers’ connection with brands.

Figure: MiQ’s brand affinity definition

Why cookieless brand affinity?

A resilient, future-proof brand affinity solution is of paramount importance for MiQ for two reasons:

  • 50% of the insight searches that users request relate to brand affinity.
  • Our current brand affinity tool depends on cookies. Given Google’s decision to phase out third-party cookies, our marketing tools must keep working without them.

How do we do this?

In the current complex e-commerce environment, brands and products are present across both online and offline platforms. The integration of these channels provides a comprehensive perspective on brand affinity.

Figure: Brand affinity

Data Sources

The first step is to identify datasets suitable for analyzing these behaviors. Below are the future-proofed datasets we have used for this purpose.

Figure: Behavioural dataset classification

Online Audience

Online channels extend beyond standard traffic analysis of a brand’s official web pages and include insights from pages where brand mentions may indicate audience interest. With this in mind, we segment the online audience into two primary categories: audiences identified from URLs and audiences identified from web page content.

Identifying audiences from URL (Uniform Resource Locator)

A typical URL consists of four components: protocol, domain, path, and query. Of these, the path is the component of interest for us.

This is because the URL path frequently encapsulates a condensed summary of the webpage content, often featuring the brand name. The goal here is to extract these mentions to identify the brand audience.

The path is composed of multiple words and determining which words represent the brand can be accomplished through Entity Recognition. This process categorizes entities within a text into predefined types such as Person, Organisation, Location, and more.

To do this, the URL is parsed to extract the path and pre-processed to enhance accuracy. The preprocessing steps include:

  1. String cleaning.
  2. Removing punctuation.
  3. Filtering out a custom list of stop words specific to ad targeting.
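The parsing and preprocessing steps above can be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not the production pipeline: the function name `preprocess_url_path`, the stop-word list `AD_STOP_WORDS`, and the example URL are all assumptions made for this sketch, since the actual ad-targeting stop-word list is not published here.

```python
import string
from urllib.parse import unquote, urlparse

# Hypothetical stop words for illustration only; the real ad-targeting list is not given.
AD_STOP_WORDS = frozenset(
    {"html", "htm", "php", "index", "www", "the", "a", "an", "and", "of", "for"}
)

# Translation table that turns every ASCII punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def preprocess_url_path(url, stop_words=AD_STOP_WORDS):
    """Return the cleaned, space-separated words of a URL's path."""
    # Keep only the path (protocol, domain and query are discarded) and decode %xx escapes.
    path = unquote(urlparse(url).path)
    # Punctuation removal: separators such as "/", "-", "_" and "." become spaces.
    words = path.translate(_PUNCT_TO_SPACE).split()
    # Stop-word filtering is case-insensitive; the original casing of kept words is preserved.
    return " ".join(w for w in words if w.lower() not in stop_words)


print(preprocess_url_path(
    "https://www.example.com/tools/calculate-weight-watchers-points.html?ref=home"
))
# tools calculate weight watchers points
```

Casing is left untouched in this sketch because cased NER models can use capitalisation as a signal; whether that matters depends on the model actually used.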

The preprocessed URLs (~17 million) are then fed into a pre-trained BERT Named Entity Recognition (NER) model, and the entities tagged as Organization are captured as brand names.
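As a rough sketch of the "keep only Organization entities" step, the snippet below filters NER output for organisation tags. It assumes the output has the shape produced by the Hugging Face `transformers` token-classification pipeline with `aggregation_strategy="simple"` (a list of dicts with `entity_group`, `score`, and `word` keys) and that the model labels organisations as `ORG`; both depend on the model and tooling actually used. The function name `brands_from_entities`, the `min_score` threshold, and the sample entries are hypothetical.

```python
def brands_from_entities(entities, min_score=0.5):
    """Collect unique ORG spans from aggregated NER output, in order of appearance."""
    brands = []
    for ent in entities:
        # Keep only organisation entities the model is reasonably confident about.
        if ent["entity_group"] == "ORG" and ent["score"] >= min_score:
            name = ent["word"].strip()
            if name and name not in brands:
                brands.append(name)
    return brands


# Hand-written entries for illustration; this is not real model output.
sample = [
    {"entity_group": "ORG", "score": 0.98, "word": "Weight Watchers"},
    {"entity_group": "LOC", "score": 0.91, "word": "London"},
    {"entity_group": "ORG", "score": 0.31, "word": "Points"},
]
print(brands_from_entities(sample))  # ['Weight Watchers']
```

Keeping the filter separate from the model call makes it easy to swap the pre-trained model for a fine-tuned one without touching the downstream logic.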

However, the pre-trained BERT model fails to capture the brand name if:

  1. The URL is convoluted, meaning it has a lengthy or confusing structure that makes extracting relevant information challenging. For example, “Calculate Weight Watchers Point” and “Minecraft modern diesel locomotive 1 transportation”.
  2. The brand is relatively recent and therefore absent from BERT’s pre-training data, so the model does not know it.
  3. The brand name doubles as a common English word or a personal name. For example, “Always”, “Only”, and “Ford” (is it a name or a brand?).

To tackle this challenge, we fine-tuned the BERT model using a custom dataset. This dataset was manually annotated with NER tags for around 100 brands that couldn’t be identified from the URLs. After identifying the brands from the URLs, users who browsed these URLs are linked with the brands and regarded as the brand affinity audience pool.

Identifying audiences from the web page content

In addition to analyzing URL paths, online feeds also encompass data useful for identifying contextually relevant keywords associated with brands on the web pages, a concept known as page audiences. Leveraging this information, we use a heuristic matching approach to detect brand keywords, thereby improving our system’s efficacy on URLs lacking direct brand name references.
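The heuristic itself is not detailed here, so the following is only one plausible sketch of keyword-based brand detection: case-insensitive, whole-word matching of brand terms against a page's contextual keywords. The `BRAND_KEYWORDS` map, the function name `match_brands`, and the sample keywords are all hypothetical.

```python
import re

# Hypothetical brand-to-keyword map, for illustration only.
BRAND_KEYWORDS = {
    "Ford": ["ford", "ford mustang", "f-150"],
    "Weight Watchers": ["weight watchers", "ww points"],
}


def match_brands(page_keywords, brand_keywords=BRAND_KEYWORDS):
    """Return the brands whose terms appear as whole words in the page keywords."""
    text = " | ".join(k.lower() for k in page_keywords)
    matched = []
    for brand, terms in brand_keywords.items():
        # Lookarounds enforce whole-word matches, so "ford" does not hit "affordable".
        alternatives = "|".join(re.escape(t) for t in terms)
        pattern = r"(?<!\w)(?:" + alternatives + r")(?!\w)"
        if re.search(pattern, text):
            matched.append(brand)
    return matched


print(match_brands(["Ford Mustang review", "affordable cars"]))  # ['Ford']
print(match_brands(["affordable cars"]))                         # []
```

Whole-word matching is a deliberate choice in this sketch: plain substring search would wrongly credit brands whose names are embedded in unrelated words.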

Additionally, we integrate page audience data with URL audience data to create a comprehensive online audience pool, ensuring a thorough understanding of interaction across online platforms. This pool consolidates information on brand relevance and geographic factors such as postal codes.

Offline Audiences

The offline audience is derived from a continuous feed of visits to known points of interest with a disclosed duration, allowing us to understand visitation and user journeys. By analyzing these user journeys, we evaluate how our target brands are represented across our internal datasets from store visitation and purchase behavior. This process forms the foundation of our offline audience pool, which is crucial for understanding brand engagement beyond the digital realm.

What tells us we are on the right track?

Validating Cookieless Brand Affinity insights presented a challenge due to the absence of a clear benchmark. However, we leverage our existing Brand Affinity (BA) Solution, driven by cookies, as a reference point. While not infallible, this cookie-based approach serves as our closest approximation to ground truth. Insights obtained without cookies are compared against those from cookie-based methods to establish our testing framework.

The insights obtained range from demographics (e.g., age, income groups) to TV viewing habits. The importance of a segment within these insights, such as an age group, is quantified using a metric called an index, which is centered around 1. Segments with an index greater than 1 are deemed suitable for targeting.
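The exact index formula is not given in this post. One common construction in audience analytics, assumed here purely for illustration, is the ratio of a segment's share in the brand audience to its share in a baseline population, which naturally centers on 1. The function name `segment_index` and all counts below are made up.

```python
def segment_index(audience_counts, baseline_counts):
    """Index per segment = share in brand audience / share in baseline population.

    Assumes every baseline count is positive; segments missing from the
    audience are treated as having a count of 0.
    """
    audience_total = sum(audience_counts.values())
    baseline_total = sum(baseline_counts.values())
    index = {}
    for segment, baseline_count in baseline_counts.items():
        audience_share = audience_counts.get(segment, 0) / audience_total
        baseline_share = baseline_count / baseline_total
        index[segment] = audience_share / baseline_share
    return index


# Made-up counts for three age segments.
audience = {"18-24": 300, "25-34": 500, "35+": 200}
baseline = {"18-24": 200, "25-34": 400, "35+": 400}

idx = segment_index(audience, baseline)
targetable = [segment for segment, value in idx.items() if value > 1]
print(targetable)  # ['18-24', '25-34']
```

Under this assumed definition, an index of 1.5 for the 18-24 segment would mean that segment is 50% more prevalent in the brand audience than in the baseline.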

Before we proceed to the testing framework, it’s important to highlight key distinctions between the cookie-based and cookie-less strategies:

  1. Data Sources: The data sources driving the cookie-based strategy differ from those powering the cookie-less strategy.
  2. Methodology: The approach to extracting brand affinity audiences differs between the two strategies. While we employ a geo-contextual methodology for the cookie-less strategy, we utilize a simple data join on cookies for the cookie-based strategy.

Due to these differences, rather than going for a direct comparison, we aim to evaluate whether the trends identified in these insights demonstrate a reasonable level of similarity. The similarity trend between indexes is computed as follows:

Trend

If a segment (e.g., age bucket 18–24) shows an index greater than 1 in both the cookie-based and cookie-less insights, or an index less than 1 in both, it suggests a parallel trend across both data universes and the alignment of both approaches.

Trend Similarity

The trend similarity score is defined as the number of entities having similar trends divided by the total number of entities. For example, in the figure below, 5 out of 6 income segments have similar trends; therefore, the trend similarity score is 83.33%.
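The trend and trend similarity definitions above can be written as a short sketch. The function names and the six income-segment index values are invented for illustration (they are not taken from the figure); only the rule itself, same side of 1 in both universes, comes from the text. How to treat an index of exactly 1 is not specified, so this sketch counts it as not matching.

```python
def same_trend(cookie_index, cookieless_index, pivot=1.0):
    """True when both indexes sit on the same side of the pivot (both above or both below)."""
    both_above = cookie_index > pivot and cookieless_index > pivot
    both_below = cookie_index < pivot and cookieless_index < pivot
    return both_above or both_below


def trend_similarity(cookie_based, cookieless):
    """Fraction of shared segments whose trend agrees across the two approaches."""
    shared = [segment for segment in cookie_based if segment in cookieless]
    if not shared:
        raise ValueError("no segments in common")
    matches = sum(same_trend(cookie_based[s], cookieless[s]) for s in shared)
    return matches / len(shared)


# Made-up index values for six income segments; five of them agree in trend.
cookie_based = {"<25k": 0.8, "25-50k": 0.9, "50-75k": 1.1, "75-100k": 1.3, "100-150k": 1.2, "150k+": 0.7}
cookieless = {"<25k": 0.7, "25-50k": 1.1, "50-75k": 1.2, "75-100k": 1.1, "100-150k": 1.4, "150k+": 0.9}

print(f"{trend_similarity(cookie_based, cookieless):.2%}")  # 83.33%
```

Here only the "25-50k" segment disagrees (0.9 versus 1.1), which gives 5 matching segments out of 6.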

The testing framework covered a total of 15 brands across various categories (e.g., automobile, lifestyle, retail) and audience volumes (high, medium, low) to reduce bias. To assess the reliability and consistency of the insights, we also analysed data over four consecutive weeks, resulting in a trend similarity score of 77%.

Conclusion

The brand affinity module presented in this blog empowers contemporary marketers to identify a relevant brand audience at scale using multi-channel cookie-less datasets as opposed to traditional approaches that use only online datasets. The Data Science solution (NER) enhances audience quality by capturing contextually relevant brand mentions. Test results indicate that the future-proofed module yields insights comparable to traditional brand affinity methods.

Nitin, Manogna, Shruti, and Ben are all professionals at MiQ’s Bengaluru office. Nitin, a data scientist, enjoys watching web series and exploring new places. Manogna, a data science team lead, likes reading about psychology and leadership, and going on long drives. Shruti, who transitioned from bioinformatics to programmatic advertising, loves reading and can easily distinguish between chai and tea! Ben, a product analyst, loves sharing memes on social media.
