When Deterministic Identity Isn’t Good Enough

Chris Kane

Published in Jounce Media Blog, Jul 24, 2015

The Case for Probabilistic Data

Identity is the new battleground in digital marketing, and deterministic data is the gold standard. Companies like Facebook and Google have rich records of user logins that enable cross-screen ad targeting and measurement, and the prevailing wisdom is that these deterministic data sets paint a grim picture for probabilistic identity players. But deterministic data has significant blind spots, and savvy marketers are embracing a hybrid approach to identity management.

What is identity data?

Marketers dream of a world in which they can communicate one-to-one with a consumer across multiple screens, delivering consistent messaging as he closes his laptop and picks up his phone. The linchpin of this vision is the ability for marketers to construct a unified view of the consumer across all potential media touchpoints: desktop, phone, and tablet, and even wearables, television, and digital billboards. With a master record of all screens that reach an individual consumer, marketers can live the omni-channel dream.

The trouble with omni-channel marketing is that identity data is spotty, and there isn’t a perfect record of all the advertising IDs that are associated with an individual consumer. Imagine a consumer who owns just two devices: a laptop and a smartphone. On his laptop, the consumer uses two browsers: Chrome for most day-to-day web browsing, and Firefox when he needs to log into his company’s email system. On his smartphone, he spends most of his time reading his Facebook and Twitter newsfeeds, and he occasionally opens Safari to browse the web. Those two devices just became at least 6 advertising IDs.

Now expand the example to consider a person who also uses a desktop at work, carries a work-issued BlackBerry, owns an Android tablet, watches TV in two different rooms, and recently bought an Apple Watch. A single real-world person can quickly accumulate dozens of advertising IDs. It’s a messy data problem.

The deterministic gold standard

Facebook, Google, and a few other companies are well positioned to solve the identity data problem through deterministic device matching. If a user logs into his Facebook account on both his laptop and his phone, Facebook knows without a doubt that those two advertising IDs belong to the same real-world person. This deterministic match is the gold standard for online advertising, and the volume of Facebook and Google user logins is vastly larger than any other deterministic data set.
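To make the idea concrete, here is a minimal sketch of deterministic matching in Python. It assumes a hypothetical log of (account, advertising ID) pairs recorded at login; the account names and ID labels are invented for illustration and do not reflect how Facebook or Google actually store this data.

```python
from collections import defaultdict


def deterministic_graph(login_events):
    """Group advertising IDs by the account that logged in on them."""
    ids_by_account = defaultdict(set)
    for account_id, advertising_id in login_events:
        ids_by_account[account_id].add(advertising_id)
    return dict(ids_by_account)


# Hypothetical login log: (account, advertising ID seen at login).
login_events = [
    ("user_1", "laptop_chrome_cookie"),
    ("user_1", "phone_facebook_app"),
    ("user_2", "tablet_app"),
    ("user_1", "laptop_chrome_cookie"),  # repeat logins add nothing new
]

graph = deterministic_graph(login_events)
```

Any advertising ID that never appears in a login event is simply absent from the result, which is the kind of gap the rest of this post is concerned with.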

But even Facebook, with its 1 billion daily active users, has blind spots. Among the long list of advertising IDs from our initial example, how many are linked to a Facebook login? 3, maybe 4? Does the user log into Facebook on his work computer? On his Safari browser? How about his TV? Facebook has the authoritative set of deterministic device data, and it is nowhere near complete.

The case for probabilistic data

The solution to these deterministic blind spots is probabilistic modeling, which infers that two advertising IDs belong to the same person based on a variety of signals such as shared location, similar web browsing patterns, and non-overlapping time online (one person rarely uses two devices at the same moment). Individually, any of these data points uncovers limited identity information, but together, they can make a compelling case that two devices have a shared user. Probabilistic device matching can also be tested against verified deterministic data to measure and improve modeling accuracy. By augmenting deterministic data with a probabilistic model to fill blind spots, marketers can achieve a high quality identity data set with massive coverage.
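As a rough illustration, the three signals named above can be combined into a single match score and then checked against verified deterministic pairs. This is a sketch under stated assumptions, not a production model: the weights, the 0.5 threshold, the use of Jaccard similarity, and all device names and field names are hypothetical choices made for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of items two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical weights; a real model would learn these from verified matches.
WEIGHTS = {"location": 0.5, "browsing": 0.3, "time": 0.2}


def match_score(d1, d2):
    """Combine the three signals into a score between 0 and 1."""
    location = jaccard(d1["locations"], d2["locations"])
    browsing = jaccard(d1["sites"], d2["sites"])
    # Devices that are rarely active in the same hour score higher.
    non_overlap = 1.0 - jaccard(d1["active_hours"], d2["active_hours"])
    return (WEIGHTS["location"] * location
            + WEIGHTS["browsing"] * browsing
            + WEIGHTS["time"] * non_overlap)


def predict_pairs(devices, threshold=0.5):
    """Return every device pair whose score meets the (assumed) threshold."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(devices), 2)
        if match_score(devices[a], devices[b]) >= threshold
    }


def precision_recall(predicted, verified, covered_ids):
    """Score predictions only on devices the deterministic data covers."""
    scored = {pair for pair in predicted if pair <= covered_ids}
    true_pos = len(scored & verified)
    precision = true_pos / len(scored) if scored else 0.0
    recall = true_pos / len(verified) if verified else 0.0
    return precision, recall


# Invented observations for three advertising IDs.
devices = {
    "laptop_chrome": {"locations": {"home_ip", "office_ip"},
                      "sites": {"news.example", "shop.example", "mail.example"},
                      "active_hours": {9, 10, 11, 20}},
    "phone_safari": {"locations": {"home_ip", "office_ip"},
                     "sites": {"news.example", "shop.example"},
                     "active_hours": {8, 12, 21}},
    "tablet_app": {"locations": {"cafe_ip"},
                   "sites": {"games.example"},
                   "active_hours": {9, 10}},
}

predicted = predict_pairs(devices)
verified = {frozenset(("laptop_chrome", "phone_safari"))}  # known from logins
precision, recall = precision_recall(
    predicted, verified, covered_ids={"laptop_chrome", "phone_safari"}
)
```

Restricting the evaluation to devices covered by login data matters: a predicted match involving a device with no login cannot be confirmed or refuted, so counting it as an error would understate the model's accuracy.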

Deterministic accuracy coupled with probabilistic scale is a winning recipe for marketers. It isn’t one or the other. It’s both. Advertisers, embrace hybrid identity.

