Past, Present and Future of Master Data Management
You may be asking, what is master data? It is simply the data that describes a real-world entity, for example a company, product, contract, or person.
Why do we need to master data? Because not all data sources have the same way of identifying an entity. This is especially true when you take into consideration how an entity can change over time (e.g. name, identifier, location, owner, etc).
Each data source may identify an entity or its characteristics differently. This poses unique challenges when combining data across multiple sources. This is especially true in industries like finance, insurance, cyber-security, and consumer packaged goods.
While there are many reasons for mastering data, the primary reason is to have the most complete view of the entity. By complete I mean a view composed of many raw data sources, each with a different perspective of the entity and its characteristics. Upon merging this data during the mastering process, the ideal outcome is a complete view built from the best of the best data.
Some of these data sources provide complementary data, making for a “golden record” and leading up to a “single source of truth”. Other sources may hold more complete, correct, or consistent data that is locked up inside them. Oftentimes there are contradictory perspectives that result in conflicts, or cases where no common attributes exist, leaving the data inaccessible until a human intervenes to map it manually.
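To make that merge step concrete, here is a minimal sketch in Python of a “best of the best” merge over records already matched to the same entity. The source records, attributes, and the “most recent non-null value wins” survivorship rule are illustrative assumptions, not a prescribed method.

```python
from datetime import date

# Hypothetical perspectives of the same company from three raw sources.
source_records = [
    {"source": "crm",     "as_of": date(2018, 3, 1),
     "name": "Acme Corp", "address": "12 Main St", "phone": None},
    {"source": "billing", "as_of": date(2018, 6, 1),
     "name": "Acme Corporation", "address": None, "phone": "555-0100"},
    {"source": "vendor",  "as_of": date(2017, 1, 1),
     "name": "ACME", "address": "12 Main Street", "phone": "555-0100"},
]

def golden_record(records):
    """Merge matched records attribute by attribute, keeping the most
    recently observed non-null value (a simple survivorship rule)."""
    newest_first = sorted(records, key=lambda r: r["as_of"], reverse=True)
    merged = {}
    for attr in ("name", "address", "phone"):
        merged[attr] = next(
            (r[attr] for r in newest_first if r[attr] is not None), None)
    return merged

print(golden_record(source_records))
# {'name': 'Acme Corporation', 'address': '12 Main St', 'phone': '555-0100'}
```

Even this toy example shows where conflicts arise: two sources disagree on the name, and the rule has to pick one.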
These are just a few of the problems that “mastering” sets out to solve.
What is master data management?
Master Data Management (MDM) is the process of merging data together from different sources to create a “single source of truth” or a “golden record”.
Much of the tooling is really old, dating from the mid-2000s. Just do a few Google searches and you’ll find lots of articles dated between 2009 and 2014.
Most MDM tools are monolithic enterprise-class applications, with costly license fees, requiring a lot of human intelligence (time) to get it all working. Many organizations don’t get all the way through their master data management programs because they’re too costly, or they take so long that teams lose steam. This is why it’s not uncommon to hear about companies that have multiple MDM solutions in place.
The tooling and techniques for processing data have changed significantly, and so have the problems we observe in the field today.
Past Observations
Let’s first take a stroll back through time from 2004 to 2014:
- The emphasis was mastering to support day-to-day transactional operations like accounting.
- Mastering was done on internally produced data, like Customer or Product, or on regulator-required data, like Securities.
- Most master data problems were solved using relational database paradigm/thinking.
- Basic entity mapping techniques, like exact-matching Sally across Sources A and B, and fuzzy-matching her to Sally M in Source C (see the sketch after this list).
- The pursuit of a “Golden Record” which may be great for operations but not analytics.
- Monolithic application architectures that scale up, not out.
- Hand-crafted, explicit “business rules” painstakingly applied to hundreds of files or tables.
- Limited 3rd party enrichment of master entities due to the high cost of manual human data curation.
- Overnight batch-based processing systems, where you must wait for all data to arrive in order to process it.
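As a minimal sketch of the kind of fuzzy matching mentioned above, here is one way it could be done with Python’s standard-library difflib. The names and the 0.85 threshold are illustrative assumptions, not how any particular MDM product works.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized similarity between two names, in the range 0..1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Illustrative records from three hypothetical sources.
source_a, source_b, source_c = "Sally Smith", "Sally Smith", "Sally M Smith"

THRESHOLD = 0.85  # assumed cut-off for treating two names as the same entity

print(similarity(source_a, source_b))  # 1.0   -> exact match between A and B
print(similarity(source_a, source_c))  # ~0.92 -> fuzzy match to Sally M in C
```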
Present Observations
Then, over the past few years, from 2014 to now in 2018, we observed:
- “Plateau of productivity” reached by most for mastering of internally produced entities or those required by regulators.
- Operational MDMs matured, and new Analytical, Agile, or Modern MDM commercial products emerged.
- Massive data modernization projects focused on data pipelines addressing the 3Vs (volume, velocity, and variety).
- New data pipelines created, many with home-grown “entity mapping” for cases where a full MDM is clearly overkill.
- Greater focus on analytics, using historical time series data, and stitching together entities.
- Pre-mastered data sets, whereby 3rd-party data arrives already mastered, carrying the provider’s perspective of the real-world entity.
- Shift from monolithic application architecture to microservices architecture running in the cloud.
- New standards for entity symbology management, like LEI, FIGI, OpenPermID, and CFI.
Future Predictions
I predict that between 2018 and 2021 we will observe the following:
- The use of knowledge graphs to help deepen our insights into relationships that may not be easily seen.
- Entity extraction and reasoning performed on unstructured text that is locked up in documents, filings, contracts, etc.
- Anomaly detection that looks below the surface to identify unusual behaviors, ambiguities, and conflicts.
- Algorithmic entity mapping that uses identifying characteristics to determine the highest-confidence match (see the sketch after this list).
- Decentralization of entity, identifier, and reference data management in tamper-resistant and trustless environments.
- Learning algorithms that help pick the best anchor entity, reduce conflicts, increase connectivity and maximize extraction.
- An increase in open-source mapping and mastering services offered in the cloud, which may help reduce costs.
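As a rough illustration of the confidence-based entity mapping predicted above, here is a sketch that weights a few identifying characteristics and keeps the best candidate above a threshold. The candidate records, identifiers, weights, and threshold are all hypothetical; a production matcher would use fuzzy per-attribute scores and learned weights rather than exact comparisons.

```python
# Hypothetical candidate master entities to match against.
CANDIDATES = [
    {"id": "entity-001", "name": "acme corporation",
     "country": "US", "postal_code": "10001"},
    {"id": "entity-002", "name": "acme industries",
     "country": "DE", "postal_code": "60311"},
]

# Assumed weights for each identifying characteristic (sum to 1.0).
WEIGHTS = {"name": 0.6, "country": 0.2, "postal_code": 0.2}

def attribute_score(a, b):
    """Exact comparison per attribute; a real system would score fuzzily."""
    return 1.0 if a == b else 0.0

def best_match(record, candidates, threshold=0.7):
    """Return (candidate, confidence) for the highest-scoring candidate,
    or (None, confidence) when nothing clears the threshold."""
    scored = [
        (cand, sum(w * attribute_score(record[attr], cand[attr])
                   for attr, w in WEIGHTS.items()))
        for cand in candidates
    ]
    best, confidence = max(scored, key=lambda pair: pair[1])
    return (best, confidence) if confidence >= threshold else (None, confidence)

incoming = {"name": "acme corporation", "country": "US", "postal_code": "10001"}
match, confidence = best_match(incoming, CANDIDATES)
print(match["id"] if match else "no match", confidence)  # entity-001 1.0
```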
Conclusion
Both the supply of and demand for data have exploded over the last decade. Some have characterized this growth in terms of volume, velocity, and variety.
We witnessed how this growth sparked a revolution in data processing tooling and techniques. Now we’re in the midst of a radical reshaping of how we extract value from this data.
As more raw data, text, audio, images, and videos are produced, entities will need to be extracted and knowledge must be formed.
Mastering data into entities will evolve into mastering data into knowledge. Knowledge may be reasoned through a learning process that involves both human and machine. This is one of many ways artificial intelligence will be used to help unlock the potential value that lives in our data.
It’s an exciting time ahead for those who love data.
