Guess What? Your Golden Record …isn’t.
by Warwick Matthews & John Nicodemo
This article is also available in Japanese at: https://c-datalab.com/ja/blog/idr-matching_20240126
Part 1: Entity Data & why we do MDM
If you work in Technology or have a role that deals regularly with data (and, let’s face it, that is almost everyone these days), you have almost certainly heard of “Master Data Management”, aka MDM: a contemporary data management discipline, usually supported by dedicated platforms & tools, which strives to improve the way an enterprise collects, aggregates, understands, organises and utilises its data.
There are numerous different flavours (and even multiple definitions) of MDM but one ubiquitous aspect of MDM is the creation of a unified view of data across an organisation.
This “creation of a unified view” very often translates into “manufacture of golden records”. But before we get to that, let’s check the basics: why do we even do MDM in the first place?
Data is managed in an MDM system as one or more entity types. Data entities in an MDM system are the types of records the system manages — the most common example being a “customer” but it can be almost anything: items, products, places, systems, staff, events and so on.
In most businesses Data and the systems to work with it grow organically over time; there are very few “planned cities” in the Data world. This means that MDM is one of those things that is invariably applied to an existing system, with entrenched ways of operating and competing ownerships.
What we mean by “organically” is that quite often these systems are designed to fulfil a narrow purpose — record a sale, fulfil an order, respond to a Customer. They are not inherently designed to interconnect with each other … that is where MDM comes into play.
This data can come from many different places, and we will talk more about how we treat different data sources in the next article, but for now let’s consider a typical setup:
Example: Our company EmDee Inc sells widgets to B2B customers. We have a procurement division which manages offshore manufacturing. The widgets are imported and stored in our warehouses. Most of our sales come through our web-based e-storefront.
EmDee Inc generates a huge amount of operational data, which is the traditional sweet spot for MDM. Typical operational datasets might include:
· Customer contact info
· Storefront data (e.g. ecommerce / POS)
· Transactional data (e.g. orders & shipping)
· Manufacturing logs
· Inventory & warehouse data
· Customer service data (e.g. returns & complaints)
There is also a whole world of “non-operational” data that can be very powerful too: Competitive analyses, Sales trends, Marketing & prospect data, even HR data, to name a few.
Let’s look at some typical MDM data stores:
While Master Data Management is primarily concerned with being a source of ground truth of the data it holds, another key dimension is linkage.
Note that entity linkage in MDM is not strictly the same as data relationships in an RDBMS or vertices and edges in a Graph database. Entity linkage connects entities as a business function, whereas relational/graph connections are a reflection of the formal structure of the data environment itself. These database structures obviously have significant influence over our MDM system, and in reality there can be a lot of overlap between MDM data model and database structure, but this is a discussion for another article.
So what do we mean by linkage and why is it so important to MDM?
Another simple example:
John Smith works at ExCorp, which is one of our major customers. John has been the CEO of ExCorp for many years and is our major contact at that customer. Katy Allen has recently taken over as CEO of ExCorp.
Our MDM system has an entry in “People Master” for John Smith and an entry in Customer Master for ExCorp. John’s record links him to the ExCorp entry, and we connect the ExCorp record in Customer Master to our transaction logs so we can see what ExCorp has bought from us over time.
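To make the linkage concrete, here is a minimal sketch of the setup just described. The class names, field names, IDs and transaction quantities are all invented for illustration; a real MDM platform will have its own data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str


@dataclass(frozen=True)
class Person:
    person_id: str
    name: str
    customer_id: str  # business link to an entry in Customer Master


@dataclass(frozen=True)
class Transaction:
    customer_id: str  # link from the transaction log to Customer Master
    item: str
    quantity: int


# Hypothetical master data and transaction log (all values are made up)
customer_master = {"C1": Customer("C1", "ExCorp")}
people_master = {"P1": Person("P1", "John Smith", "C1")}
transaction_log = [
    Transaction("C1", "widget", 100),
    Transaction("C2", "widget", 5),
    Transaction("C1", "widget", 250),
]


def purchases_for_person(person_id: str) -> list[Transaction]:
    """Follow Person -> Customer -> Transactions using the master links."""
    person = people_master[person_id]
    return [t for t in transaction_log if t.customer_id == person.customer_id]


excorp_purchases = purchases_for_person("P1")
total_widgets = sum(t.quantity for t in excorp_purchases)
```

Starting from John Smith's record, the links lead to ExCorp and then to everything ExCorp has bought from us, which is the kind of traversal the example relies on.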
Where this gets very cool is if we add some “extra flavour” to these master records. Let’s suppose John Smith was previously CEO of WhyCo, also a customer. We might then see data like this:
This is also where the discipline of Identity Resolution plays a major part in determining that ExCorp’s CEO John Smith is the same person as the former CEO of WhyCo. We might also choose to use the services of a third-party data provider, to add additional insights e.g. Katy Allen’s previous work history. Once we have matched the records we can link that external data to our in-house MDM framework.
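As a rough illustration of the matching step, the sketch below compares two person records by normalised name plus one corroborating attribute. It is deliberately simplistic and is not how any particular Identity Resolution product works; the `birth_year` field, the threshold and the sample records are assumptions made up for this example.

```python
import re
from difflib import SequenceMatcher


def normalise_name(name: str) -> str:
    """Lowercase, drop punctuation and sort tokens so 'Smith, John' == 'John Smith'."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return " ".join(sorted(tokens))


def name_similarity(a: str, b: str) -> float:
    """Similarity between two normalised names, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalise_name(a), normalise_name(b)).ratio()


def is_same_person(rec_a: dict, rec_b: dict, threshold: float = 0.9) -> bool:
    """Treat two records as one person only if name AND a second attribute agree."""
    names_match = name_similarity(rec_a["name"], rec_b["name"]) >= threshold
    # A name alone is weak evidence ("John Smith" is common), so this sketch
    # also requires a corroborating attribute (hypothetical: birth_year).
    year_a, year_b = rec_a.get("birth_year"), rec_b.get("birth_year")
    corroborated = year_a is not None and year_a == year_b
    return names_match and corroborated


# Invented sample records
excorp_ceo = {"name": "John Smith", "company": "ExCorp", "birth_year": 1970}
whyco_ceo = {"name": "Smith, John", "company": "WhyCo", "birth_year": 1970}
someone_else = {"name": "John Smith", "company": "OtherCo", "birth_year": 1985}

same = is_same_person(excorp_ceo, whyco_ceo)
different = is_same_person(excorp_ceo, someone_else)
```

Real systems typically weigh many more attributes and handle nicknames, transliteration and missing data, but the principle of combining several signals before linking two records is the same.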
We have the data, we’ve connected it — what’s next?
MDM is not just about bringing data in, linking, enhancing and then storing it somewhere. MDM is also about providing ways & means to use that data: how can our enterprise drive actual value from this data mastering process?
When we link the data in the sample above to our sales data (transactional history) we might uncover some very interesting signals in the resulting tapestry of our enterprise data.
These signals might help to answer questions such as:
· What has ExCorp’s purchasing behaviour been since John Smith took over as CEO? How does it compare to WhyCo’s behaviour when he ran that company?
· Where does ExCorp’s new CEO Katy Allen fit into the picture — have we seen her elsewhere previously?
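The first of these questions can be sketched as a simple join between role history and sales data. Everything below (the tenure dates, the sales amounts and the `Tenure` and `Sale` structures) is hypothetical and exists only to show the shape of the calculation.

```python
from datetime import date
from typing import NamedTuple, Optional


class Tenure(NamedTuple):
    person: str
    company: str
    start: date
    end: Optional[date]  # None means the person is still in the role


class Sale(NamedTuple):
    company: str
    when: date
    amount: float


# Invented role history and sales history
tenures = [
    Tenure("John Smith", "WhyCo", date(2010, 1, 1), date(2015, 12, 31)),
    Tenure("John Smith", "ExCorp", date(2016, 1, 1), date(2023, 12, 31)),
]
sales = [
    Sale("WhyCo", date(2012, 6, 1), 1000.0),
    Sale("WhyCo", date(2017, 3, 1), 400.0),   # after John Smith left WhyCo
    Sale("ExCorp", date(2014, 5, 1), 200.0),  # before John Smith joined ExCorp
    Sale("ExCorp", date(2018, 9, 1), 1500.0),
    Sale("ExCorp", date(2020, 2, 1), 2500.0),
]


def spend_during_tenures(person: str) -> dict[str, float]:
    """Total sales to each company while the given person held a role there."""
    totals: dict[str, float] = {}
    for tenure in tenures:
        if tenure.person != person:
            continue
        end = tenure.end or date.max
        totals[tenure.company] = sum(
            s.amount
            for s in sales
            if s.company == tenure.company and tenure.start <= s.when <= end
        )
    return totals


john_smith_spend = spend_during_tenures("John Smith")
```

With the person, company and transaction records linked, a comparison like this becomes a straightforward query rather than a manual research exercise.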
Our Sales team could potentially get a lot out of this data: perhaps our business with ExCorp increased when John Smith took the helm, or conversely, if WhyCo under John Smith was a consistently difficult customer to deal with, we can learn from that in our management of ExCorp from now on.
This is all well and good, but exactly how do we share this data with our stakeholders? What version do we show them? What data do we discard? These are fairly straightforward questions to answer if your business only has a single source of data (e.g. online sign-ups), but most businesses today need to synthesise data from multiple disparate sources.
Let’s extend the example above of the new ExCorp CEO Katy Allen: we have some data on her from our interactions with ExCorp as a customer, but we were also able to use our advanced Identity Resolution system to match her to a record we acquired last year on potential sales opportunities.
That external data has some overlap with ours (which is how we were able to match it) but it also has some differences — for example it might show a different contact number, different birthday and slightly different qualifications.
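A small sketch shows what that overlap and disagreement might look like once the two records are matched. The field names and values are invented for illustration, and the function only reports where the sources agree or conflict; it makes no attempt to decide which value is right.

```python
def compare_records(internal: dict, external: dict) -> dict:
    """Sort every attribute into: both sources agree, they conflict, or only one has it."""
    report = {"agree": {}, "conflict": {}, "only_one_source": {}}
    for key in sorted(internal.keys() | external.keys()):
        in_val, ex_val = internal.get(key), external.get(key)
        if in_val is None or ex_val is None:
            report["only_one_source"][key] = in_val if in_val is not None else ex_val
        elif in_val == ex_val:
            report["agree"][key] = in_val
        else:
            report["conflict"][key] = {"internal": in_val, "external": ex_val}
    return report


# Invented records for the same (already matched) person
internal_record = {
    "name": "Katy Allen",
    "company": "ExCorp",
    "phone": "555-0100",
    "birthday": "1975-03-02",
    "qualifications": "MBA",
}
external_record = {
    "name": "Katy Allen",
    "company": "ExCorp",
    "phone": "555-0199",
    "birthday": "1975-03-20",
    "qualifications": "MBA, CPA",
}

report = compare_records(internal_record, external_record)
conflicting_fields = sorted(report["conflict"])
```

The agreeing fields are what made the match possible; the conflicting ones are where a decision is still needed.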
So which view of Katy Allen do we give to the Sales team? Which version of the truth makes the cut?
The question of how to choose a “best view” and what form it takes will be covered in the next article in this series: What is a Golden Record and how does it work?
John Nicodemo is one of America’s preeminent data leaders, with a career that has included management of data & content teams in the US, Canada and globally. He has led data management organisations in major businesses including Dun & Bradstreet and Loblaw Companies Limited (Canada’s largest retail group), and has been called upon to work with some of the World’s top companies on global data strategy and solutions. He is presently advising the U.S. National Football League as they completely reinvent their fan intelligence and data sharing ecosystem.
Warwick Matthews has over 15 years of expertise in designing, building and managing complex global data, multilingual MDM, identity resolution and ‘data supply chain’ systems, building new best-in-class systems and integrating third-party platforms for major corporations in North America and Asia-Pacific. He is currently Chief Data Geek (aka CDO & CTO) of Compliance Data Lab in Japan.