Customer Data Platform: What features make for a great CDP

Julien Kervizic
Oct 14 · 7 min read
Photo by QuickOrder on Unsplash

Customer data platform (CDP) is a term that gets thrown around a lot these days, often without a clear understanding of what these platforms actually do. There is an abundance of solutions on the market, each tackling different aspects, but CDPs normally address, to varying degrees, the same five key axes:

  1. Identity
  2. Data Cleansing, Transformation and Enrichment
  3. Data Centralization
  4. Audience and Segmentation
  5. Data Integration & Analytics

Identity

CDPs build the customer profile around a concept of identity. An identity strategy shapes how data is linked or merged together. When looking at identity strategies, there are normally four main axes to consider:

  • Deterministic vs. Probabilistic
  • Hard vs. Soft Merge
  • Gold attributes
  • Treatment of historical data

Deterministic vs. Probabilistic: Most CDPs handle deterministic matching, meaning that they match customer records when a clear identifier, such as a customer ID, email, name or phone number, allows the records to be linked together with an exact match. Some CDPs, such as AgilOne, offer propensity (probabilistic) matching on top of a deterministic matching strategy.

Propensity matching allows records to be merged when there is a high likelihood that they belong to the same user, for instance when two records share the same first name, last name and zip code.
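To make the distinction concrete, here is a minimal Python sketch of both matching styles. The Record fields, the 0.6/0.4 weights and the 0.9 threshold are illustrative assumptions, not the defaults of any CDP vendor.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Record:
    email: Optional[str]
    first_name: str
    last_name: str
    zip_code: str


def deterministic_match(a: Record, b: Record) -> bool:
    """Exact match on a clear identifier (here: a normalized email)."""
    if not a.email or not b.email:
        return False
    return a.email.strip().lower() == b.email.strip().lower()


def match_score(a: Record, b: Record) -> float:
    """Hypothetical similarity score in [0, 1] built from name and zip code."""
    def sim(x: str, y: str) -> float:
        return SequenceMatcher(None, x.strip().lower(), y.strip().lower()).ratio()

    name_score = (sim(a.first_name, b.first_name) + sim(a.last_name, b.last_name)) / 2
    zip_score = 1.0 if a.zip_code == b.zip_code else 0.0
    # The weights are illustrative assumptions, not vendor defaults.
    return 0.6 * name_score + 0.4 * zip_score


def probabilistic_match(a: Record, b: Record, threshold: float = 0.9) -> bool:
    """Merge candidates only when the similarity score clears the threshold."""
    return match_score(a, b) >= threshold
```

A record without an email can never match deterministically, but it can still clear the probabilistic threshold when the name and zip code line up.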

Hard vs. Soft merge: Identity strategies also diverge in how they store the different profiles once a match has happened. Hard merge strategies combine both profiles into a single profile record, which, depending on the situation, makes it either very difficult or impossible to revert the merge at a later stage. Soft merge strategies, on the other hand, keep every record exactly as it was originally provided.

They work by creating associations between the different profile records. As such, they are well suited to probabilistic matching strategies, or to any deterministic matching rule with a high degree of uncertainty that might need to be rolled back.
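The soft merge idea can be sketched as a store that never modifies the source records and only tracks links between them. The class and method names below are hypothetical, chosen for illustration rather than taken from any CDP.

```python
from collections import defaultdict


class SoftMergeStore:
    """Keeps source records untouched and stores merges as reversible links."""

    def __init__(self):
        self.records = {}   # record_id -> original attributes, never modified
        self.links = set()  # frozenset({id_a, id_b}) associations

    def add_record(self, record_id, attributes):
        self.records[record_id] = dict(attributes)

    def link(self, id_a, id_b):
        if id_a != id_b:
            self.links.add(frozenset((id_a, id_b)))

    def unlink(self, id_a, id_b):
        # Rolling back a merge only removes the association.
        self.links.discard(frozenset((id_a, id_b)))

    def profile_members(self, record_id):
        """Return every record id reachable from record_id through links."""
        neighbours = defaultdict(set)
        for pair in self.links:
            a, b = tuple(pair)
            neighbours[a].add(b)
            neighbours[b].add(a)
        seen, stack = {record_id}, [record_id]
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen
```

Because a merge is only an entry in links, undoing an uncertain match is a single unlink call and the underlying records are left exactly as ingested.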

Gold attributes: Identity strategies also vary in how they promote specific attributes to the golden record. Some rely on the latest available profile, some rely on the latest attribute update date, and others allow promotion by “trusted” source, and so on. It is worth considering how your CDP handles attribute promotion, as not all CDPs have the same degree of refinement in this respect.
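As a rough illustration, two of these promotion rules could look like the sketch below. The source names and their ranking in SOURCE_PRIORITY are assumptions made up for the example.

```python
from datetime import datetime

# Hypothetical source ranking: a lower number means a more trusted source.
SOURCE_PRIORITY = {"crm": 0, "ecommerce": 1, "newsletter": 2}


def promote_latest(values):
    """Pick the attribute value with the most recent update date."""
    return max(values, key=lambda v: v["updated_at"])["value"]


def promote_trusted(values, priority=SOURCE_PRIORITY):
    """Pick the value from the most trusted source, newest first on ties."""
    ranked = sorted(values, key=lambda v: v["updated_at"], reverse=True)
    # Unknown sources rank below every listed source.
    return min(ranked, key=lambda v: priority.get(v["source"], len(priority)))["value"]


phone_values = [
    {"source": "crm", "value": "+31 6 1111", "updated_at": datetime(2019, 1, 10)},
    {"source": "newsletter", "value": "+31 6 2222", "updated_at": datetime(2019, 9, 1)},
]
```

With this sample data the two rules disagree: the latest-update rule promotes the newsletter value, while the trusted-source rule keeps the older CRM value.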

Treatment of Historical data: The identity strategy also defines how historical data is promoted to the golden profile. Do we want all historical information for all merged profiles to be part of this “golden record”, or do we only want to capture new information going forward? For instance, consider data captured on an e-commerce website: before the customer signs up or purchases a product, they provide little to no PII that would allow them to be properly identified.

Once a customer has signed up or purchased a product, we can usually associate that data with a known profile if one exists, or create a new one within the CDP. Whether or not to leverage the data from when the customer was not yet identified is a choice to make when defining the identity strategy.
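The choice described above can be expressed as a single flag in a small stitching routine. This is a sketch under assumed data shapes: events carry a hypothetical anonymous_id and a numeric ts, and identity_map records when each anonymous visitor became known.

```python
def stitch_events(events, identity_map, include_history=True):
    """Attach events to known profiles.

    events: list of dicts with "anonymous_id", "ts" and "name" keys.
    identity_map: anonymous_id -> (customer_id, identified_at).
    include_history: when False, events captured before identification
        are left out of the known profile.
    """
    stitched = []
    for event in events:
        match = identity_map.get(event["anonymous_id"])
        if match is None:
            continue  # this visitor was never identified
        customer_id, identified_at = match
        if include_history or event["ts"] >= identified_at:
            stitched.append({**event, "customer_id": customer_id})
    return stitched
```

With include_history=True the browsing that preceded the sign-up becomes part of the profile; with False the profile starts at the moment of identification.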

Data Cleansing, Transformation and Enrichment

Data Cleansing and transformation: CDPs can normally apply specific data transformations at ingestion time. Tealium, for instance, offers a Tally variable that increases a counter for each event received, as well as rolling sum variables. AgilOne calculates specific attributes based on its internal data model, along with specific propensity scores. Other vendors, such as mParticle, take a more developer-focused approach towards data cleansing and transformation by enabling an AWS Lambda function callback, essentially executing your own piece of code whenever there is incoming data.
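The tally and rolling sum ideas can be sketched in a few lines of Python. This is a generic illustration rather than Tealium's implementation: the class name, the event shape and the count-based window (the last three purchases) are all assumptions.

```python
from collections import Counter, deque


class IngestionAttributes:
    """Per-visitor attributes updated as each event is ingested."""

    def __init__(self, window=3):
        self.tally = Counter()              # event name -> number of occurrences
        self.recent = deque(maxlen=window)  # values of the last `window` purchases

    def ingest(self, event):
        self.tally[event["name"]] += 1
        if event["name"] == "purchase":
            self.recent.append(event["value"])

    @property
    def rolling_sum(self):
        """Sum of the purchase values still inside the window."""
        return sum(self.recent)
```

Each call to ingest updates the derived attributes immediately, so they are available for segmentation without reprocessing the raw event history.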

Data Enrichment: Some CDPs can enrich customer data through integrations with third-party data vendors such as Experian Mosaic and ConsumerView, Acxiom LiveRamp or Oracle Datalogix, through cookie syncing, or by leveraging predictive models.

Aggregate Values Enrichment

Certain CDPs offer the ability to enrich the raw data they receive by calculating aggregate values and rollups, or by performing certain data transformations.

Raw data enrichment

There is a wide range of data that can be obtained from third-party vendors, including demographic data, lifestyle and interest data, financial data and purchase behavior. Below is a sample list of data points that can be acquired through these third parties:

  • Demographic: age, gender, education, occupation, household income, marital status, or the number of kids in the household
  • Financial data: credit score, household profitability score, property and mortgage data, credit card utilization
  • Lifestyle and interest data: such as interest in sports, video games, movies, traveling
  • Purchase behavior: Affinity towards certain product categories, affinity to purchase in specific channels such as online e-commerce
  • Address validation: such as whether or not the address on record is still an active address.

Predictive models

Predictive models can also be used to enrich the customer profile. Some of the types of scoring that can be provided by a CDP include:

  • CLV: Customer lifetime value (CLV) prediction
  • Discount sensitivity: such as propensity to purchase
  • Propensity models: such as likelihood to visit, convert, purchase, churn, open, …
  • Recommendation: content or product affinity, next best action
  • Predictive segmentation: behavioral clustering/segmentation, or lookalike modeling
  • Household Clustering: understanding which of your customers are part of the same household.

A lot of this data enrichment can be done directly in the CDP or through external third-party providers.
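As one small illustration of the household clustering item above, the sketch below groups customers that share a normalized address and zip code. It is a naive rule-based stand-in, not a predictive model, and the field names are assumptions.

```python
from collections import defaultdict


def household_key(customer):
    """Normalize address and zip code so formatting differences do not matter."""
    address = " ".join(customer["address"].lower().replace(",", " ").split())
    zip_code = customer["zip_code"].replace(" ", "").upper()
    return (address, zip_code)


def cluster_households(customers):
    """Group customer ids that share the same normalized address."""
    households = defaultdict(list)
    for customer in customers:
        households[household_key(customer)].append(customer["id"])
    return list(households.values())
```

A production approach would typically need fuzzier address matching and additional signals, which is where vendor-provided or model-based clustering comes in.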

Data Centralization

CDPs serve a purpose of data centralization. To that end, they usually offer an API that allows for the lookup of user attributes, ingested events and audience membership.

This data centralization allows the data to be used for personalization or A/B testing. Some CDPs notably offer the ability to split audiences across different test and control groups. This provides a unified representation of group assignments across the different touch points.
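One common way to get consistent group assignments across touch points is deterministic hashing of the user identifier. The sketch below shows that general technique; it is not a description of how any particular CDP implements its splits, and the function and parameter names are hypothetical.

```python
import hashlib


def assign_group(user_id, experiment, control_share=0.5):
    """Deterministically assign a user to "control" or "test".

    The same (user_id, experiment) pair always yields the same group,
    so every touch point that calls this function agrees on the split.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return "control" if bucket < control_share else "test"
```

Including the experiment name in the hash keeps assignments independent between experiments, so a user in the control group of one test is not systematically in the control group of the next.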

CDPs vary a lot in their ability to act as a central information hub. Offerings differ in their retention policy, API limits, pricing model and ability to serve both customer profile data and related events.

Audience and Segmentation

mParticle Segmentation interface

CDPs offer the ability to segment the user base using any attribute ingested in the platform. These audience segments can then be exported to the different systems that have been integrated with the CDP.

CDPs traditionally work with “Adaptive Segments”, i.e. segments that are constantly recalculated.

In some cases, they might offer “static segments”, segments that are calculated only once. This is usually the case when the CDP has to process “cold” data or process custom segments created by SQL queries.
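The difference between the two segment types can be sketched as follows. The class names and the rule-as-function representation are assumptions made for illustration.

```python
class AdaptiveSegment:
    """Membership is re-evaluated against the current profiles on every read."""

    def __init__(self, rule):
        self.rule = rule

    def members(self, profiles):
        return {pid for pid, attrs in profiles.items() if self.rule(attrs)}


class StaticSegment:
    """Membership is computed once, at creation time, and then frozen."""

    def __init__(self, rule, profiles):
        self._members = frozenset(
            pid for pid, attrs in profiles.items() if rule(attrs)
        )

    def members(self):
        return set(self._members)


def has_ordered(attrs):
    return attrs["orders"] >= 1


profiles = {"a": {"orders": 3}, "b": {"orders": 0}}
adaptive = AdaptiveSegment(has_ordered)
static = StaticSegment(has_ordered, profiles)

profiles["b"]["orders"] = 2  # customer "b" places orders after segment creation
```

After the update, the adaptive segment picks up customer "b" on the next read, while the static segment still reflects the state at creation time.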

Data Integration & Analytics

Connectors

CDPs facilitate the integration of customer data between different systems. They typically have connector marketplaces where an integration can be configured in just a few clicks.

Tealium’s eventstream display ads connector integration

Some of the integrations they offer, such as an integration with Google Analytics through the Measurement Protocol or with certain ad vendors, are often referred to as server-side tagging. The typical areas of integration tackled by CDPs are:

Segment Marketing Cloud integration

The depth of the integration will vary by platform and CDP, some offering just an audience connector or a feed connection, others also offering an event connector and/or a two-way connector.
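To give a feel for the server-side tagging mentioned above, the sketch below builds the body of an event hit for the Google Analytics (Universal Analytics) Measurement Protocol, using its documented v, tid, cid, t, ec, ea, el and ev parameters. The function name and the placeholder tracking ID are assumptions, and the hit is only built, not sent.

```python
from urllib.parse import urlencode

# Universal Analytics Measurement Protocol collection endpoint.
GA_COLLECT_URL = "https://www.google-analytics.com/collect"


def build_ga_event_hit(tracking_id, client_id, category, action, label=None, value=None):
    """Return the URL-encoded body of a Measurement Protocol event hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. the placeholder "UA-XXXXX-Y"
        "cid": client_id,    # anonymous client identifier
        "t": "event",        # hit type
        "ec": category,      # event category
        "ea": action,        # event action
    }
    if label is not None:
        params["el"] = label
    if value is not None:
        params["ev"] = str(int(value))  # event value must be an integer
    return urlencode(params)


body = build_ga_event_hit("UA-XXXXX-Y", "555", "checkout", "purchase", value=42)
```

A server-side connector would POST this body to GA_COLLECT_URL from the CDP's own infrastructure, instead of relying on a JavaScript tag running in the visitor's browser.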

Analytics and BI

Some CDPs allow the customer data to be leveraged for analytics purposes. AgilOne and Lytics, for instance, give access to their internal data schemas for querying, analysis and segmentation purposes. CDPs that offer features within the analytics space normally address the need to provide data for analytics purposes across two dimensions: 1) Data Access and 2) Data Structure.

Data Access: There are a variety of ways in which CDPs let users access and leverage the data for analytics purposes. Some offer the ability to use SQL-like queries for segmentation purposes, an interactive query environment, dashboard integration, or the ability to connect directly to the CDP’s database using an ODBC or JDBC connection. Others provide export options directly to databases or data warehouses, essentially allowing third-party software to leverage this data.

mParticle’s datawarehouse export

Data Structure: These CDPs tend to rely on relational models, and normally include data structures that go beyond customer attributes and events. They might also include product master data, location-based master data or, for CDPs oriented towards the retail world, store master data. Some CDPs, such as Treasure Data, furthermore let you define your own schemas for ingestion and data processing.
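As an illustration of SQL-based segmentation over a relational model, the sketch below uses an in-memory SQLite database from Python's standard library as a stand-in for a CDP's queryable store. The table names, columns and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY, email TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT, amount REAL);
INSERT INTO customers VALUES ('c1', 'a@example.com'), ('c2', 'b@example.com');
INSERT INTO orders VALUES (1, 'c1', 80.0), (2, 'c1', 45.0), (3, 'c2', 20.0);
""")


def high_value_segment(connection, min_spend):
    """Return the ids of customers whose total order amount reaches min_spend."""
    rows = connection.execute(
        """
        SELECT c.customer_id
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id
        HAVING SUM(o.amount) >= ?
        ORDER BY c.customer_id
        """,
        (min_spend,),
    ).fetchall()
    return [row[0] for row in rows]
```

The same query could be run by a BI tool over an ODBC or JDBC connection, or against a warehouse export, which is what makes exposing the relational schema valuable.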


Hacking Analytics

All around data & analytics topics

Julien Kervizic

Written by

Living at the interstice of business, data and technology | Solution Architect & Head of Data | Heineken, Facebook and Amazon | linkedin: https://bit.ly/2XbDffo
