When Golden Records Don’t Shine

Warwick Matthews
10 min read · Apr 18, 2024


Golden Records are the jewels of an MDM System…but not so shiny, perhaps?

This article is also available in Japanese here.

This article is the fourth in a series about MDM and Golden Records. In case you missed them, the previous three articles are:

  1. Guess What? Your Golden Record …isn’t.
  2. What is a Golden Record and how does it work?
  3. What is “Truth” in Data?

In those previous articles we’ve opined extensively on MDM data and how, for better or worse, the Golden Record is a key part of many data management regimes. Golden records give consistency and are an enabler of the MDM “one stop shop” for data fulfillment in an organisation.

The Golden Record is the heart of most MDM systems

Golden records do, however, have a number of weaknesses and drawbacks. Some are fundamental to the Golden Record paradigm; others stem from the particular way Golden Records are used in many MDM systems.

Before we begin our discussion let’s start with an operational definition of what constitutes a “Golden Record”:

For our purposes today, an MDM system based around a golden record paradigm is one where all data of a particular type (e.g. customer data) is loaded, mastered and distilled into a single view that is then used by all downstream consumers of that data from that MDM system.

Golden Records are invariably:

  • based on the principle that the Golden Record is “One Record to Rule Them All”, i.e. a single version of the truth for all use-cases simultaneously;
  • reflective of a positivistic worldview where fill rates and some concept of objective truth are used to measure “accuracy” & “completeness”, which are in turn the principal qualitative measures of the MDM system’s overall value;
  • atemporal, i.e. they have no mechanism to recombine with the passage of time, and often ignore the periodicity, age or decay rates of different sources (or even of different data within a single source), except in the initial ETL precedence rules, where a newer source record is usually preferred over an older one;
  • built to a predefined standardized data model (often industry-based) where incoming data is manipulated to “fit” the model. This leads to gaps and/or unused data that does not successfully map to the target, e.g. discarding a 4th phone number because the relational MDM system has a fixed limit of three: work, home, mobile (see the sketch after this list).
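
To make that last point concrete, here is a minimal Python sketch of how a fixed relational model silently discards data it has no slot for. The slot names and record shapes are hypothetical, not drawn from any particular MDM product:

```python
# Hypothetical fixed model: exactly three phone slots, as in many
# relational MDM schemas. Slot names and record shapes are illustrative.
FIXED_PHONE_SLOTS = ["work", "home", "mobile"]

def map_to_fixed_model(source_phones: list[dict]) -> dict:
    """Force an arbitrary list of phones into the predefined slots.

    Anything that has no free slot is silently discarded -- the
    "gaps and/or unused data" problem described above.
    """
    mapped = {slot: None for slot in FIXED_PHONE_SLOTS}
    discarded = []
    for phone in source_phones:
        slot = phone.get("type")
        if slot in mapped and mapped[slot] is None:
            mapped[slot] = phone["number"]
        else:
            discarded.append(phone)  # e.g. a 4th number, or a 2nd mobile
    return {"golden_phones": mapped, "lost": discarded}

inbound = [
    {"type": "work",   "number": "+1-212-555-0101"},
    {"type": "mobile", "number": "+1-917-555-0102"},
    {"type": "home",   "number": "+1-718-555-0103"},
    {"type": "mobile", "number": "+81-90-5555-0104"},  # no free slot: lost
]
print(map_to_fixed_model(inbound)["lost"])
```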

Proponents of Golden Record-based systems (often vendors of MDM platforms based on them) will argue that the “one size fits all” problem can be mitigated by adopting a more sophisticated (or at least bigger) entity data model in your MDM system (e.g. storing multiple email addresses for a customer). And the point is valid, but the bottom line is that our in-house data consumers will invariably need a consistent view of (e.g.) a customer which can answer important questions such as:

  • what is their correct name?
  • what are their accurate demographics (e.g. market segment)?
  • what are the correct vectors by which to reach them? (e.g. which email address?)
Our stakeholders need the Data Team to create consistent views…

These downstream use-cases are (generally) well understood, and our precedence rules executed during ETL will examine the relevant input sources and select the data that produces the best possible view to meet our business needs. This is the Golden Record.
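
Here is a minimal sketch of what such precedence (“survivorship”) rules might look like, assuming hypothetical source names and a simple per-attribute ranking. Real MDM platforms express this declaratively, but the net effect is the same: one winning value per attribute, i.e. the Golden Record.

```python
# Assumed per-attribute source rankings, highest precedence first.
# Source names ("crm", "signup_form", ...) are invented for illustration.
SOURCE_PRECEDENCE = {
    "email": ["crm", "signup_form", "purchased_list"],
    "name":  ["signup_form", "crm", "purchased_list"],
}

def build_golden_record(source_records: dict[str, dict]) -> dict:
    """Pick the first non-empty value per attribute, in precedence order."""
    golden = {}
    for attribute, ranking in SOURCE_PRECEDENCE.items():
        for source in ranking:
            value = source_records.get(source, {}).get(attribute)
            if value:
                golden[attribute] = value
                break
    return golden

records = {
    "crm":         {"name": "J. Doe", "email": "jdoe@example.com"},
    "signup_form": {"name": "Jane Doe", "email": None},
}
print(build_golden_record(records))
# {'email': 'jdoe@example.com', 'name': 'Jane Doe'}
```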

“Best” View?

What comprises the “best possible” view? Even with well-understood use cases this can be somewhat problematic. First of all — best for whom?

Best…for whom?

Our organization will likely have some combination of a C-Suite, a sales team, a marketing team, HR, an operational team and a customer care or support team. Certainly many of their data requirements will be common — but not all. So one option is to add variants in our data model for all our different departments: e.g. customized customer contact info for each department’s use. But, in the real world, customers don’t actually give us different views of themselves for each of our different internal teams… it’s usually one set of things based on the way that customer interacts with us (e.g. via a sign-up form). So — we can just go back to that single best view, right?

One (Golden Record) Ring to Rule Them All?

Actually, no, not really. Although that is exactly what most MDM systems with Golden Records will do.

The above is particularly acute in the business-to-business (B2B) space. Quite often the “best” individual at a medium or large company differs depending on which part of their business your business is dealing with. Your company may have a direct relationship with a C-level individual at the Customer company, but actually sell to someone else there. There might be another individual that you direct your services to, and yet another who actually pays the bill. The Golden Record in this case cannot act as the be-all and end-all record — it is the jumping-off point, the best record that can link all those other ones! (see our next article for more)

Some data companies in the B2B space take this one step further: they specialize in taking seemingly unrelated Customer data points from your system, then using their data, your data and IDR (identity resolution) technology to delve into the corporate linkage of the Customer company and identify all the other “missing” golden records.

Overfitting is a common issue with Golden Records, where the data is used for purposes beyond its scope. Just because the Fulfillment Database (where the Golden Records live) is the “best view” of data does not mean it should be used for absolutely everything in the organisation. It is easy for a well-meaning Data team to fall into the trap of confusing forcing all downstream systems to come to the same place for data (a good thing) with providing one single dataset for all use-cases. This is not helped where MDM systems surface only the Golden Record dataset (via the Fulfillment Database) to internal customers and treat everything else as internal plumbing not for consumption. A common example of this issue is feeding an IDR (data matching and clustering) system with golden records instead of raw source data.
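
To see why, consider this toy illustration (invented records, deliberately simplistic matcher): the raw source variants carry matching signal that the distilled Golden Record has already thrown away.

```python
# Invented raw records: what the sources actually said about one person.
raw_records = [
    {"source": "crm",    "name": "Robert Smith", "email": "bob@example.com"},
    {"source": "orders", "name": "Bob Smith",    "email": "bob@example.com"},
    {"source": "web",    "name": "R. Smith",     "email": "rsmith@example.org"},
]

# The distilled single view that survives into the Fulfillment Database.
golden_record = {"name": "Robert Smith", "email": "bob@example.com"}

def matches(candidate: dict, record: dict) -> bool:
    """Toy matcher: hit on an exact email or an exact name."""
    return (candidate["email"] == record["email"]
            or candidate["name"] == record["name"])

inbound = {"name": "R. Smith", "email": "rsmith@example.org"}

# Against the golden record alone, the inbound record looks brand new...
print(matches(inbound, golden_record))                # False
# ...but against the raw variants it matches an existing identity.
print(any(matches(inbound, r) for r in raw_records))  # True
```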

Consistency is King in MDM (in fact, consistency is a core competency of all MDM systems). Different views for different use cases create the risk of inconsistent customer views across the organisation, which is the precise problem that MDM is here to fix. So the common solution in MDM systems is to overcompensate in the other direction: the one-size-fits-all fulfillment (or “Vend”) database.

Not everyone wants to play.

Ironically, one of the oft-observed side effects of a data model not optimized for a particular use case (e.g. the Marketing team’s) is the proliferation of parallel sourcing and side-files, i.e. doing an end-run around the Data team’s MDM system and “going rogue”. And it happens a lot.

Some teams do an “end run” around MDM processes and Golden Records.

Many, many organisations will have an MDM system that is used consistently, efficiently and effectively — by one team. Other teams flat out refuse to use it, or quietly do their own thing in the background. Sound familiar?

Ideally, the MDM system would utilize all the Customer interactions to determine which “best view” applies in a given context: e.g. Customer Service contacts and their interactions would be used to determine which Customer data constitutes the Golden Record view for Customer Service. See our next article for more on this!

Time.

Time is another key limitation of Golden Records. There are of course the perennial issues of “data freshness” (how do we get fresh data into our system quickly?) and of records ageing out (when is data too old to be used?), which are universal challenges for all data systems (although please check your org’s MDM system for blind spots here!).

Time is a challenge for Golden Records

Let’s zero in today on time as a challenge for Golden Records in two specific ways:

  1. Evolution — Our organisation’s use cases will inevitably shift over time (sometimes quite abruptly, as in recent history), and replumbing the MDM flows to adjust the end product (the Fulfillment Database and Golden Record data) is usually time-consuming and complex. Golden Records are usually a fixed view resulting from entrenched (and often undocumented) precedence rules. An MDM system being outpaced by operational demands is a common problem — and unfortunately it often leads to the entire MDM system needing to be replaced with a new one.
  2. Multi-Speed — Different parts of different datasets from different sources can age at different rates, which is a real challenge when you have uni-directional data flows from source ETL to MDM destination (Golden Record). In other words, an e-mail address from Source A might be considered more effective than data from Source B, but at the same time we have also observed that Source A data tends to be more volatile and age out faster than Source B’s. Source A is better now, but in 12 months Source B’s email address is more likely to connect. Absent an update from one of the sources, very few systems are able to re-evaluate the Golden Record makeup: not only has Source A’s data already been baked into the Golden Record, but Golden Record-based systems seldom tell us which source(s) contributed which data point, and when. You can’t unmake the omelette; you can’t go back (see the sketch after this list).
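
As a thought experiment, here is a hedged sketch of the capability described above that most Golden Record systems lack: per-attribute lineage plus decay-aware re-scoring. The source names, base quality scores and half-lives are invented for illustration.

```python
import math
import time

# Invented decay profiles: Source A is better when fresh but ages out faster.
HALF_LIFE_DAYS = {"source_a": 180, "source_b": 720}
BASE_QUALITY   = {"source_a": 0.9, "source_b": 0.7}

def current_score(source: str, observed_epoch: float, now: float | None = None) -> float:
    """Exponentially decay a source's base quality by the age of its data."""
    now = now if now is not None else time.time()
    age_days = (now - observed_epoch) / 86_400
    return BASE_QUALITY[source] * math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS[source])

# Lineage: each candidate value keeps its source and observation time,
# so the "winner" can be re-evaluated at any point in the future.
email_lineage = [
    {"source": "source_a", "value": "a@example.com", "observed": time.time() - 400 * 86_400},
    {"source": "source_b", "value": "b@example.com", "observed": time.time() - 400 * 86_400},
]

best = max(email_lineage, key=lambda c: current_score(c["source"], c["observed"]))
print(best["value"])  # after ~13 months, Source B's email outranks Source A's
```

With that lineage in hand, the omelette can in fact be unmade: the system can re-run the contest whenever the clock, not just a source update, changes the answer.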

Time can be an interesting “frenemy”. If you really peel back the layers, Time is a critical component of Accuracy — however Accuracy is seldom dependent on Time to be “true”. Yes, it helps, but having “fresh” data is not a guarantee of accuracy. A bunch of data experts worldwide will go into a tizzy over that assertion, but let’s take a simple example: if I have lived at the same address for a decade, it does not become any less “accurate” just because my latest interaction with your company using it was 9 years ago.

Some time ago, a very wise colleague of ours worked on a series of data decay rate studies. Needless to say, it was fascinating work. One of the most interesting findings was that this temporal elasticity (or “stickiness”) of data points is evolving rapidly, and can often run counter to conventional MDM wisdom. For example, as the world has evolved and the vectors of contact have changed, a data element like a phone number is today much more “sticky” than it was 20 years ago thanks to number portability, which is now fairly universal in the developed world. People tend to keep their personal mobile numbers for a very long time (if not forever).

One can therefore argue that it is not “Time” that is critical but “Timing”: where in their life journey is an individual (or a company, which has a lifecycle too)? Will they decide to change mobile carriers when they leave their current job, not port their mobile number, and thus receive a new one? Will they have three mobile eSIMs for “home”, “work” and “personal”, with different routing rules on them, but all intersecting on the same device?

The “frenemy” flipside is that many of the algorithms data systems have used for decades (and that vendors continue to sell) to detect the type of number in question (landline, mobile…fax, Telex!) are being rendered useless. And then on the flipside of THAT, many households are progressively ditching landlines at their private residences — so “stickiness” is at present extremely uneven (home phones disappearing abruptly, mobiles staying forever, and the way we detect which is which breaking down).

Complicated — but fascinating — stuff!

If we think of Identity — that is, who or what someone or something is — as being less about “distance from perfection” (i.e. we need to improve our view of a person or organisation with more and better data) and instead more about outcomes (i.e. what data do we actually need to effectively achieve our goals?), we might take a different path. Focusing on constructing useful, contextual MDM outcomes from the data we actually have is what we call “fit for purpose identity” (F4P ID), and it will be discussed in our next and final article in this series.

John Nicodemo is one of America’s preeminent data leaders, with a career that has included management of data & content teams in the US, Canada and globally. He has led data management organisations in major businesses including Dun & Bradstreet and Loblaw Companies Limited (Canada’s largest retail group), and has been called upon to work with some of the World’s top companies on global data strategy and solutions. He is presently advising the U.S. National Football League as they completely reinvent their fan intelligence and data sharing ecosystem.

Warwick Matthews is a speaker, entrepreneur and experienced CDO with over 15 years of expertise in designing, building and managing complex global data, multilingual MDM, identity resolution and data supply chain systems, building new best-in-class solutions and integrating third-party platforms for major corporations in North America and Asia-Pacific. He is currently Chief Data Geek (aka CDO & CTO) of Compliance Data Lab (コンプライアンス・データラボ株式会社), headquartered in Tokyo, Japan.
