What is a Golden Record and how does it work?

by Warwick Matthews & John Nicodemo

Feb 22, 2024 · 7 min read

This article is also available in Japanese here.

In our previous article we established that the Golden Record typically is the “best view” of an entity (e.g. a customer) in your Master Data Management (MDM) system.

One record to rule them all…

Many MDM systems have Golden Records at their core: a single, comprehensive view of every entity in the system, built and maintained to fulfill all downstream use-cases.

That view is usually constructed from one or more sources using an “If-Then-Else” decision tree, often called “precedence” or “survivorship” rules (we particularly loathe the latter term as it evokes images of the Data Hunger Games playing out in our systems!).

Source: Lionsgate via thehungergames.fandom.com
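To make that concrete, here is a rough sketch in Python of what a single-field precedence rule can look like. The source names, priority order and sample values are invented for illustration, not taken from any particular MDM product.

```python
# Illustrative "if-then-else" precedence rule for one field.
# Source names, priority order and sample values are hypothetical.

SOURCE_PRIORITY = ["invoicing_system", "customer_care_crm", "purchased_marketing_list"]

def pick_field(source_records, field):
    """Return the value for `field` from the highest-priority source that has one."""
    for source in SOURCE_PRIORITY:
        record = source_records.get(source)
        if record and record.get(field):
            return record[field]
    return None  # no source could supply the field

golden_email = pick_field(
    {
        "customer_care_crm": {"email": "contact@corp.example"},
        "purchased_marketing_list": {"email": "someone@freemail.example"},
    },
    "email",
)
# -> "contact@corp.example", because the care system outranks the purchased list.
```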

As the successor to the Data Warehousing practices of the 1990s, Golden Record-based MDM approaches can be very useful:

  • they drive all data to a single, consistent place;
  • they allow for prioritisation of provenance, i.e. relative weighting of certain data sources over others so that the Golden Record is based on the “best” inputs;
  • they provide a single fountain from which all downstream systems may (or must) drink;
  • they are very convenient for commercial MDM platforms to create & manage;
  • they are a neat and simple concept to sell to the C-Suite.*

* Never underestimate senior leadership’s need for a simplistic answer when they ask the question “What is our customer data strategy?”.

Golden records can help to avoid senior leaders making this face…

The Golden Record (by any name) is still the centrepiece of the vast majority of MDM systems in use today. If a system advertises a “360° View” of data it’s a good bet it is built using a golden record paradigm.

As mentioned above, the progression from source records to the unified best view — the Golden Record — is based on execution of sets of rules. These rules will often consider factors such as:

  • Age of data (recency)
  • Quality of source (possibly via a score), based on provenance such as an invoicing system vs a web survey vs commercial marketing list
  • Corroboration, count and volume of sources (e.g. 3 out of 4 sources have the same residential address for a person). Can also be referred to as “pluralization”.
  • Potential volatility of the specific data (how likely is it to change?)
  • Is the data single-valued or multi-valued? (e.g. you can have multiple given names but only one date of birth)
  • Are sources self-referential? (e.g. Source A is actually built from Source B so is really just the same data)
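As a hedged illustration of how a few of those factors might be combined (the weights, field names and scoring scheme below are invented for this sketch, not an industry standard), field-level survivorship often boils down to scoring each candidate value and keeping the winner:

```python
from collections import Counter
from datetime import date

# Hypothetical provenance scores and rule weights, invented for this sketch.
SOURCE_QUALITY = {"invoicing_system": 0.9, "web_survey": 0.5, "marketing_list": 0.3}

def survivorship_score(candidate, all_values):
    """Score one candidate value on recency, source quality and corroboration."""
    age_days = (date.today() - candidate["as_of"]).days
    recency = max(0.0, 1.0 - age_days / 365)                 # newer data scores higher
    quality = SOURCE_QUALITY.get(candidate["source"], 0.1)   # unknown provenance scores low
    corroboration = Counter(all_values)[candidate["value"]] / len(all_values)
    return 0.4 * recency + 0.4 * quality + 0.2 * corroboration

def surviving_value(candidates):
    """Keep the candidate value with the best combined score."""
    values = [c["value"] for c in candidates]
    return max(candidates, key=lambda c: survivorship_score(c, values))["value"]

best_address = surviving_value([
    {"value": "12 Main St", "source": "invoicing_system", "as_of": date(2024, 1, 15)},
    {"value": "12 Main St", "source": "web_survey",       "as_of": date(2023, 6, 1)},
    {"value": "98 Old Rd",  "source": "marketing_list",   "as_of": date(2022, 3, 9)},
])
# Two of three sources corroborate "12 Main St" and the best source is recent, so it survives.
```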

Remember Katy Allen from the previous article? She had recently taken over as CEO of ExCorp.

Katy Allen, new CEO of ExCorp from our previous article.

We have data on “Katy Allen” from ExCorp itself via our customer care team, and we also have information on “Katherine Allen” of WhyCo (Katy Allen’s previous gig) from a marketing file we purchased last year. So we have records which our Identity Resolution (IDR) system tells us are likely describing the same person, and which each contain a different email address. We’re a B2B business and we want our marketing to go to our contact’s main work email address. One email address is KatherineAllen@ExCorp.com and the other is KAllen77@gmail.com.

Our precedence rules should steer us away from generic domains such as Gmail and thus leave us with the ExCorp.com email for our best view.

Precedence rules will favour certain sources over others.

But what if the two addresses were info@ExCorp.com and Katy@AllenFamily.me? The former is corporate but a generic email “drop box” (and so perhaps unlikely to get directly to Katy Allen), and the latter is likely personal but direct and on a custom domain so unlikely to be a “sign-up address”. Which to use — and for what? Precedence rules can be quite sophisticated if implemented properly — we will delve further into the implications in a future article.
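One way to encode that nuance (the domain lists and category names below are purely illustrative) is to classify each address before the precedence rules run, so a rule can prefer a direct corporate mailbox over a generic drop box or a free-mail account depending on the use-case:

```python
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}   # partial list, for illustration
GENERIC_MAILBOXES = {"info", "sales", "contact", "support"}     # likely shared "drop boxes"

def classify_email(address):
    """Rough email classification that a precedence rule can consume. Heuristic only."""
    local, _, domain = address.lower().partition("@")
    if domain in FREE_MAIL_DOMAINS:
        return "personal_free_mail"     # e.g. KAllen77@gmail.com
    if local in GENERIC_MAILBOXES:
        return "corporate_drop_box"     # e.g. info@ExCorp.com
    return "direct_mailbox"             # e.g. KatherineAllen@ExCorp.com or Katy@AllenFamily.me

# A B2B marketing use-case might rank direct_mailbox > corporate_drop_box > personal_free_mail,
# while a fraud-check or identity use-case could weight the same categories quite differently.
```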

Back to Golden Records: our MDM system’s precedence rules have built us a record for “Katy Allen” that we can use in all our downstream systems. This is where the real value of the Golden Record is surfaced: our entire database in standard form, available via standard interface(s) exposed by our MDM system. The system becomes much like a vending machine for data (and in fact some MDM designers will refer to this exit point as the “vend database”).

The MDM system “vends” Golden Records

It is rare (but not unheard of) for a consuming system (e.g. the Marketing team) to take an entire Golden Record for all its subjects, although to do so is probably a little wasteful and, in these times of PII protections and data minimization, also often quite unwise.

Look out for a forthcoming article “Tis the Winter of our (dis)Consent: Privacy, Consent and MDM”.

In 2024 it would be nice to be able to unstrap our personal jetpacks and talk here about sophisticated event-driven JSON-based pub/sub feeds… but frankly most of this stuff ends up as CSV text files and Excel spreadsheets, which are then dutifully uploaded into the system du jour of the downstream team.

All that work…for a CSV file.

Those legacy file-based feeds may be unsexy but the Golden Record at least makes the concept simple: select a subset of the data from the totality available in the Vend Database (where the Golden Records live), put it into a file and ship it off to the receiver.

Our “Downstream Data Buffet” is selected from the Golden Record data and put into a reusable format.
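In practice that select-format-ship step can be as mundane as the sketch below; the table, column and file names are made up for the example:

```python
import csv
import sqlite3

# Field subset for one downstream team. The table, column and file names are
# invented for this sketch of a vend-database extract.
MARKETING_FIELDS = ["golden_id", "full_name", "company", "work_email", "country"]

def vend_marketing_csv(vend_db_path, out_path):
    """Select a downstream-specific slice of the Golden Records and ship it as CSV."""
    with sqlite3.connect(vend_db_path) as conn:
        rows = conn.execute(
            "SELECT " + ", ".join(MARKETING_FIELDS) +
            " FROM golden_records WHERE marketing_consent = 1"
        )
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(MARKETING_FIELDS)   # header row for the receiving team
            writer.writerows(rows)

# vend_marketing_csv("mdm_vend.db", "marketing_extract.csv")
```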

This unsexy Golden Record has more or less delivered on the promise referred to in our first article — it has brought all the operational data together into a unifying view — and it is also now delivering value to multiple downstream use-cases — the Data team’s “customers”.

Golden Records on their way to downstream teams.

What is interesting about this “final destination” CSV/Excel file is just that — it is a final output that is useful for that downstream department / team / use-case. But there is a lot more a “Golden Record” can do, if you think expansively. Look out for more on this topic in an upcoming article (that is what is known in the biz as a “tease”).

One final aspect worth mentioning is the data stewardship & governance dimension. Standing up a Golden Record regime means that data supply and fulfillment vectors can be controlled and even enforced by the Data team, which in plain language means the CDO can ensure that all users of data get the same view at the same time: consistency, traceability and audit can be baked in. To use our previous metaphor — everyone has to eat lunch from the Data Team’s vending machine.

Golden Records can enforce consistency.
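A loose sketch of what baking traceability into the vend point might look like (the log format is our assumption for illustration, not a prescribed standard): each extract is recorded so the Data team can later show who received which fields, and when.

```python
import json
from datetime import datetime, timezone

def log_vend_event(consumer, fields, record_count, log_path="vend_audit.jsonl"):
    """Append one audit entry per extract so every downstream delivery is traceable."""
    event = {
        "consumer": consumer,
        "fields": fields,
        "record_count": record_count,
        "vended_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

# log_vend_event("marketing", ["golden_id", "full_name", "work_email"], record_count=12345)
```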

In our next piece we will take a short detour to look behind the curtain at what constitutes “Truth” in MDM data. (Spoiler: it is not as simple as it sounds!)

John Nicodemo is one of America’s preeminent data leaders, with a career that has included management of data & content teams in the US, Canada and globally. He has led data management organisations in major businesses including Dun & Bradstreet and Loblaw Companies Limited (Canada’s largest retail group), and has been called upon to work with some of the World’s top companies on global data strategy and solutions. He is presently advising the U.S. National Football League as they completely reinvent their fan intelligence and data sharing ecosystem.

Warwick Matthews has over 15 years of expertise in designing, building and managing complex global data, multilingual MDM, identity resolution and ‘data supply chain’ systems, building new best-in-class systems and integrating third-party platforms for major corporations in North America and Asia-Pacific. He is currently Chief Data Geek (aka CDO & CTO) of Compliance Data Lab in Japan.
