Better Identity Resolution with Simon Data’s IdentityQA: A Snowflake Native App

An identity model is a structured framework that captures and organizes key customer information, enabling your company to understand and interact with your customer base effectively. Ensuring accurate and efficient identity models is crucial. In this post, we will explore how Simon Data’s IdentityQA, a Snowflake Native App, can help you achieve better identity resolution while providing valuable insights and recommendations. We’ll dive into the features, use cases, and best practices of IdentityQA to empower you in managing your company’s identity model effectively.

Overview

Identity Data Validation

IdentityQA allows you to define and test the assumptions you have about your identity data, such as testing for the uniqueness and persistence of an identifier. It can also identify invalid data like incorrect email addresses or phone numbers.

Privacy-First Identity Model Check

IdentityQA operates entirely within your Snowflake account and respects all governance rules you have already set up, allowing you to securely validate identity model assumptions.

Easy-to-Understand Reports

IdentityQA provides simple but comprehensive reports that offer insights into the performance of your identity model and highlight areas for improvement.

Helpful Recommendations

After analyzing your data, IdentityQA provides useful advice specific to your company’s identity model. These recommendations assist you in further refining your identity model.

Use Cases

IdentityQA addresses the following key use cases:

Stable Identifier Validation

Contacts are often associated with a unique identifier known as a stable identifier. Determining the stable identifier can be tricky because it must be 1:1 with a single customer profile and cannot be shared across profiles. IdentityQA validates the uniqueness of this stable identifier.
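
As a rough illustration of what a stable-identifier uniqueness check involves, the sketch below groups profiles by stable identifier and reports any identifier that appears on more than one profile. It is not IdentityQA’s implementation; the function name, the (profile_id, stable_id) row shape, and the sample values are all hypothetical.

```python
from collections import defaultdict

def find_shared_stable_ids(rows):
    """Return stable IDs that appear on more than one profile.

    rows: iterable of (profile_id, stable_id) pairs (hypothetical shape).
    """
    profiles_by_stable_id = defaultdict(set)
    for profile_id, stable_id in rows:
        if stable_id is not None:  # rows without a stable ID are skipped
            profiles_by_stable_id[stable_id].add(profile_id)
    # A stable ID is only valid if it maps to exactly one profile.
    return {
        stable_id: sorted(profiles)
        for stable_id, profiles in profiles_by_stable_id.items()
        if len(profiles) > 1
    }

rows = [("p1", "u-100"), ("p2", "u-200"), ("p3", "u-200"), ("p1", "u-100")]
print(find_shared_stable_ids(rows))  # {'u-200': ['p2', 'p3']}
```

An empty result means every stable identifier in the sample belongs to a single profile.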

Filtering Unimportant Identifying Data

Not all data is equally important. IdentityQA filters out unnecessary identification data that doesn’t contribute to the uniqueness of a user’s identity.

Over-Shared Identifier Discovery

Some identifiers, such as device_ids, may be shared across many contacts if they’re generated by public computers or devices. These identifiers are unreliable and should be filtered out. IdentityQA detects over-shared identifiers so that you can refine your identity model.
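
One way to picture over-shared identifier detection is to count the distinct contacts attached to each identifier value and flag values above a limit. The sketch below does that in plain Python. The function name, the (contact_id, value) row shape, the device names, and the limit of 3 are assumptions made for illustration, not the app’s internals.

```python
from collections import defaultdict

def find_over_shared(rows, max_contacts=3):
    """Return {identifier_value: contact_count} for values seen on
    more than max_contacts distinct contacts.

    rows: iterable of (contact_id, identifier_value) pairs (hypothetical shape).
    """
    contacts_by_value = defaultdict(set)
    for contact_id, value in rows:
        contacts_by_value[value].add(contact_id)
    return {
        value: len(contacts)
        for value, contacts in contacts_by_value.items()
        if len(contacts) > max_contacts
    }

device_rows = [
    ("c1", "dev-library-kiosk"), ("c2", "dev-library-kiosk"),
    ("c3", "dev-library-kiosk"), ("c4", "dev-library-kiosk"),
    ("c1", "dev-phone-a"), ("c5", "dev-phone-b"),
]
print(find_over_shared(device_rows, max_contacts=3))  # {'dev-library-kiosk': 4}
```

Identifier values returned here are candidates to exclude from identity matching, since they link contacts that are probably unrelated.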

Identifying Data Validation

IdentityQA validates crucial identifying data, including email addresses and phone numbers. It flags any invalid entries, such as emails missing the “at symbol” or phone numbers with letters or an incorrect number of digits. This validation ensures that your contactable identifiers are accurate and complete. Note that IdentityQA does not test for bot activity.

How It Works

Operating the IdentityQA app is straightforward and efficient:

Tag the identity data

After installing the IdentityQA application, the first step is to tag the identity data, which tells the app which columns contain identifiers such as email, user_id, or phone number.

CALL SIMONIDQA.APP.SET_INPUT_TABLE('MY_TABLE',[['email', 'email', false],
['phone', 'phone_number', false],
['user_id', 'custom_id', true]]); -- the third value is the is-stable flag, set to true here

Define the assumptions

Next, you define a set of tests or constraints to apply to your identity model, like uniqueness checks or relationship checks between different identifiers.

-- One-to-One relationship check
CALL SIMONIDQA.APP.SET_RELATIONSHIP_CONSTRAINT('one-to-one', 'email', 'custom_id');

-- Uniqueness check
CALL SIMONIDQA.APP.SET_UNIQUE_CONSTRAINT('custom_id');

-- Max shared identifier limit check
CALL SIMONIDQA.APP.SET_SHARED_IDENTIFIER_LIMIT('email', 3);

Generate the report

Once the constraints are defined, IdentityQA generates a detailed report. This report outlines the results of the applied tests and provides extra metrics that allow for in-depth validation of the quality and assumptions about your identity data.

CALL SIMONIDQA.APP.GENERATE_REPORT();

Structure of the IdentityQA Report

Constraint Checks

The IdentityQA report shows pass/fail statuses on the constraints configured in the app.

In this example, the client_id & email identifiers passed the 1:1 relationship check going one way but not the other. This tells you that for every client_id there is only one email, but the application found instances where an email had more than one client_id.

While the uniqueness check passed on client_id, the shared identifier limit check failed on phone_number. This means that the app discovered instances where a single phone_number appears on more than 3 profiles, which violates the assumption that a phone_number is shared across at most 3 profiles at any given time.
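
The asymmetric 1:1 result described above, passing in one direction and failing in the other, can be reproduced with a small sketch that checks each direction separately. The helper name and sample pairs below are hypothetical and only illustrate the idea; they are not taken from the IdentityQA report.

```python
from collections import defaultdict

def fanout_violations(pairs):
    """For each left value, collect its distinct right values and
    return only the left values that map to more than one."""
    rights_by_left = defaultdict(set)
    for left, right in pairs:
        rights_by_left[left].add(right)
    return {
        left: sorted(rights)
        for left, rights in rights_by_left.items()
        if len(rights) > 1
    }

# Hypothetical (client_id, email) pairs.
pairs = [
    ("client-1", "a@example.com"),
    ("client-2", "b@example.com"),
    ("client-3", "b@example.com"),
]

client_to_email = fanout_violations(pairs)
email_to_client = fanout_violations([(email, client) for client, email in pairs])

print(client_to_email)  # {} -> every client_id has exactly one email
print(email_to_client)  # {'b@example.com': ['client-2', 'client-3']}
```

A relationship is one-to-one only when both directions return no violations.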

High-Cardinality Checks

The next check is a high-cardinality test for each identifier in the input table compared to the stable identifier. IdentityQA tests for cardinality because exceptionally high counts of any identifier per single stable identifier generally mean that there is an issue with the underlying data. If you think about the number of email addresses you have in real life, or the number of phone numbers you have, you’ll realize that you (and probably most people) likely have fewer than 5 emails or phone numbers.

In this example, there are 10 profiles that have an extraordinarily high count of client_ids compared to the other 15 profiles in the sample. This may indicate that these emails are fake, or maybe they’re emails used for testing purposes and should be excluded from marketing.

The next step here would be to dive into these email addresses and determine why they might have so many client_ids associated with them and if they’re valid profiles or not. If they’re invalid, it’s recommended that they be removed from customer audiences to reduce spend and increase marketing efficiencies.
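
A high-cardinality check of this kind can be sketched as a count of distinct identifier values per stable identifier, compared against a threshold. The threshold of 5 below borrows the rule of thumb mentioned earlier and is an assumption, as are the function name and the sample data. The cutoff IdentityQA actually uses is not stated in this post.

```python
from collections import defaultdict

def high_cardinality(rows, threshold=5):
    """Return {stable_id: distinct_count} where the number of distinct
    identifier values is at or above threshold.

    rows: iterable of (stable_id, identifier_value) pairs (hypothetical shape).
    """
    values_by_stable_id = defaultdict(set)
    for stable_id, value in rows:
        values_by_stable_id[stable_id].add(value)
    return {
        stable_id: len(values)
        for stable_id, values in values_by_stable_id.items()
        if len(values) >= threshold
    }

sample = [("u-1", "a@example.com"), ("u-1", "a.alt@example.com")]
# u-2 has six distinct emails, which looks like test or fake data.
sample += [("u-2", f"test{i}@example.com") for i in range(6)]

print(high_cardinality(sample))  # {'u-2': 6}
```

Profiles flagged this way are worth reviewing by hand before they are removed from any audience.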

Identifier Validation

Next, the report shows email & phone_number validations. This example shows that out of the ~1M total records in the input table, ~30K emails and ~80K phone numbers are considered invalid.

Invalid in this case means that the email or phone number lacks one or more properties required to be considered a real, contactable identifier. Emails with no “at symbol” are flagged here, as are phone numbers that contain letters or have fewer or more than 10 digits.

Specific examples of invalid identifiers are also provided for easy cleanup. Why spend marketing dollars sending messages to customers you can’t reach?
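
The validation rules described above are simple enough to sketch directly. The two functions below implement only the rules named in this post: an email must contain an at symbol, and a phone number must contain no letters and exactly 10 digits. Ignoring punctuation such as spaces, dashes, and parentheses is an assumption of this sketch, and the function names are hypothetical. IdentityQA’s actual rules may be stricter.

```python
def is_valid_email(value):
    """Rule from the text: the address must contain an at symbol."""
    return "@" in value

def is_valid_phone(value):
    """Rules from the text: no letters, and exactly 10 digits.
    Punctuation and spaces are ignored (an assumption of this sketch)."""
    if any(ch.isalpha() for ch in value):
        return False
    digit_count = sum(1 for ch in value if ch.isdigit())
    return digit_count == 10

emails = ["jane@example.com", "jane.example.com"]
phones = ["(212) 555-0100", "555-0100", "212-555-01OO", "12125550100"]

invalid_emails = [e for e in emails if not is_valid_email(e)]
invalid_phones = [p for p in phones if not is_valid_phone(p)]

print(invalid_emails)  # ['jane.example.com']
print(invalid_phones)  # ['555-0100', '212-555-01OO', '12125550100']
```

The 10-digit rule matches the North American format implied by the post. Data sets with international numbers or country-code prefixes would need a different rule.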

Streamlit Integration

IdentityQA uses Snowflake’s new Streamlit integration to make all configuration and reports available as a browsable web app, so you don’t have to inspect tables manually (although you can if you wish).

Conclusion

Understanding and managing a company’s identity model can seem daunting. With Simon Data’s IdentityQA, a Snowflake Native Application, it becomes a streamlined, secure, and insightful process. By performing rigorous data quality checks, offering intuitive reports, and providing actionable recommendations, IdentityQA helps you make your identity model more efficient and accurate.

About Simon Data

Simon Data empowers marketing teams with the only Customer Data Platform (CDP) purpose-built to increase campaign performance through faster, more precise segmentation and personalization.

The first CDP built on Snowflake, the Simon Data Platform enables brands to break free from outdated architecture that makes data hard to access and deploy. That’s why Tripadvisor, Equinox, JetBlue, ASOS, Venmo and many others count on Simon Data to connect with consumers. The low-code Simon Data Platform is designed for use by marketers, turning them into data scientists. Simon Data is a 2022 Built-In Best Places to Work, Great Places to Work Certified, and is an 8-time G2 Leader in the CDP space.

For more information, visit simondata.com.
