Principles for Business Control of Data

Steve Jones
Collaborative Data Ecosystems
6 min read · Nov 18, 2021

The question of “who is responsible for data governance” is arguably a very simple one, with a very simple answer.

Principle: The person who has the authority to change the business process to fix the data operationally is the person who should be governing the data organizationally.

So why is such a simple principle so rarely applied? The traditional data warehouse has its many layers, but they are constrained within a single governance framework, and often further constrained within a “single canonical form”, the IT vision of truth that is never right. With data, and the digital world associated with it, becoming the primary driver of business value, a new approach is required: one that puts business ownership and business governance at the heart, and aims to reflect the operational control of the business within data governance.

Back in the day I wrote a book on how to build Business Service Architectures, and it’s an approach I continue to use successfully to this day. One of the key principles in BSA is the concept of hierarchy: a procurement department only has power because it sits within finance, and that hierarchy is crucial when modeling the business. Another part of that was to treat business processes as really just being KPIs, i.e. they are how we measure the organization, not how we run it. These domains of control are where governance can occur, and arguably the only place it can occur.

This, combined with experience across multiple programs, has helped identify an ancillary factor that impacts how people build data systems today:

There is no such thing as “the business”

Controversial? Hopefully not, but it really is how we architect a lot of data platforms today, based on a flawed assumption:

Flawed Assumption: Every data requirement is incremental

What I mean by this is that we add all the requirements for a data object together and say “these are the requirements for this data object to work”. This misses the differing business purposes, and most critically the difference between operational and organizational challenges.

If you ask the operational part of the organization what they need, the answer is “everything”; if you ask the rest of the organization, they’ll just tell you what they need to do their jobs, often just the parts required to interoperate between business areas. This means that those shared parts need to be conformed between business areas to enable collaboration. Within the operational space, however, the business already knows what the data means; they work with the data in the transactional systems all the time. They don’t need the data to be conformed, they need it to support their operational processes. This is why I’ve said that Operational Reporting should be the foundation of data governance: if we accept that operational requirements are not organizationally conformed, then we significantly reduce our work effort.

Principle: Operational data is locally conformed, not organizationally conformed.

This means giving the business low-latency data and treating conformance as a challenge at both the system and data layer. It also means that we are focusing governance into this operational space.
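To make the distinction concrete, here is a minimal sketch. The domain, field names and mapping below are purely illustrative (a hypothetical orders area, nothing from a real system); the shape is the point: the operational record stays in the local terms the business area already understands, and only the subset shared with other business areas is conformed.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class OperationalOrder:
    """Locally conformed: whatever the operational area needs, in its own terms."""
    local_order_ref: str        # e.g. the warehouse's own reference format
    picker_initials: str
    despatch_bay: int
    promised_date: date

@dataclass
class CollaborativeOrder:
    """Organizationally conformed: only the subset other business areas rely on."""
    order_id: str               # enterprise-wide identifier
    customer_id: str
    order_value_gbp: float
    promised_date: date

def publish_for_collaboration(op: OperationalOrder, customer_id: str,
                              value_gbp: float) -> CollaborativeOrder:
    """The operational area maps its local view onto the shared, conformed subset;
    everything else stays local and under its own governance."""
    return CollaborativeOrder(
        order_id=f"ORD-{op.local_order_ref}",
        customer_id=customer_id,
        order_value_gbp=value_gbp,
        promised_date=op.promised_date,
    )
```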

Principle: The business needs to govern data from both an operational system and a reporting perspective

In other words, operational data governance is not only about landed data, it is about the full lifecycle of system and data; indeed it is more about the system and process side than it is about Data Quality pipelines in the data tier. We know this reality, and we also know its challenge. We’ve always said “fix at source”, and indeed the myth of that is why I really started using CDC everywhere, as it enabled recalculation of history if DQ rules needed to change. But the real blocker has always been that the business didn’t see the value in fixing the data issues. This is because, thanks in part to the IT desire to globally conform data, data quality issues only surfaced within IT. This plays back to the first principle in this post: if you can’t fix operations, then you aren’t the person who is going to be able to fix the data.
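As a rough illustration of the CDC point (the pattern, not any particular tool, and the events and rule below are invented): if the raw change events are retained, a revised data quality rule can be replayed over the whole history, rather than only being applied to data that arrives after the rule changed.

```python
from typing import Callable

# A captured change event is represented here as a plain dict standing in
# for a CDC record from a source system.
ChangeEvent = dict
DQRule = Callable[[ChangeEvent], ChangeEvent]

def replay_history(events: list[ChangeEvent], rules: list[DQRule]) -> list[ChangeEvent]:
    """Re-derive the cleansed history by pushing every retained change event
    through the current set of data quality rules."""
    cleansed = []
    for event in events:
        for rule in rules:
            event = rule(event)
        cleansed.append(event)
    return cleansed

# Suppose a DQ rule changes: country codes must now be ISO alpha-2.
normalise_country: DQRule = lambda e: {**e, "country": {"UK": "GB"}.get(e["country"], e["country"])}

history = [
    {"order_id": 1, "country": "UK"},
    {"order_id": 2, "country": "FR"},
]
print(replay_history(history, [normalise_country]))
# [{'order_id': 1, 'country': 'GB'}, {'order_id': 2, 'country': 'FR'}]
```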

In the new world, therefore, we need to make the production of data for collaboration a key business KPI. Rather than it being a question of business requirements to IT, it needs to be a requirement on the operational area of the business, supported by IT, to create the collaborative data products for the enterprise.

Principle: Business Operational Areas are accountable for producing the transactional collaborative data products for the organization

What this means is that we are effectively saying that an operational area is accountable for its data, and that the dependencies of the organization should not be on the raw data from that area, but on the business KPIs, metrics and conformed data which that part of the organization says are correct.
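As a sketch of what that accountability can look like in code terms (the class, fields and calculations are invented for illustration): consumers depend on the KPIs and conformed data the operational area publishes and stands behind, never on the raw records behind them.

```python
class SalesDataProduct:
    """A collaborative data product owned by a hypothetical sales operational area;
    IT supports it, but the business area is accountable for its correctness."""

    def __init__(self, raw_orders: list[dict]):
        self._raw_orders = raw_orders   # private: not a dependency for consumers

    # --- the published, supported interface --------------------------------
    def conformed_orders(self) -> list[dict]:
        """Orders mapped onto the organization-wide (conformed) definitions."""
        return [
            {"order_id": o["id"],
             "value_gbp": o["net"] + o["tax"],
             "status": o["state"].upper()}
            for o in self._raw_orders
        ]

    def kpi_total_order_value(self) -> float:
        """A business KPI the operational area certifies as correct."""
        return sum(o["value_gbp"] for o in self.conformed_orders())
```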

I was very deliberate there in the use of “transactional”, and that is because Master Data is different: it represents the pivot points in reality around which the transactional data hangs. So who governs master data, when, and where?

Here I’m going to go back a decade to when I lived and breathed MDM. One of the things I really fought against back then was the idea that there was some “huge” golden record, which is where the MDM Radar came from. The point of the Radar was to help people focus on the ‘real’ challenge of MDM, namely the core and the x-ref. The core is the minimum set of data required to uniquely identify the ‘thing’; this is the data that has to be clean to ensure identification. The cross-reference (the x-ref) is then the ability to identify that ‘unique thing’ across all of the data sets in which it resides.
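As a rough sketch of those two artifacts (the entity and field names are assumptions for illustration, not a standard):

```python
from dataclasses import dataclass, field

@dataclass
class CustomerCore:
    """The core: the minimum set of attributes needed to uniquely identify
    the 'thing', and therefore the data that has to be kept clean."""
    master_id: str
    family_name: str
    date_of_birth: str
    national_id: str

@dataclass
class CrossReference:
    """The x-ref: where that uniquely identified 'thing' lives in each data set."""
    master_id: str
    local_keys: dict[str, str] = field(default_factory=dict)   # system -> local key

xref = CrossReference(
    master_id="CUST-000123",
    local_keys={"crm": "A-9981", "billing": "77-3412", "web": "u_5521"},
)
```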

Today I’d argue that the core is much less important and that the challenge is identification and the x-ref. With new techniques that leverage the power of big data and ML you’re often better off, in fact nearly always better off, using the transactional and interactional history to identify the unique ‘thing’ and, importantly, the relationships between ‘things’ (people, locations, assets, etc.). A while back I referred to this as making MDM dance around the POLE, and today, with graph-based databases, ML and big data, I’d say identification becomes a fundamentally different challenge to mastering. Mastering focuses on quality, something that is much easier after identification. Historically the two had to go together, but today we have more options.
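As a toy illustration of that shift (a simple union-find over shared signals stands in for what would really be graph databases and ML models, and every record and signal below is invented): records that share a strong transactional or interactional signal are linked, and each linked group is treated as one identified ‘thing’, before any question of mastering or quality arises.

```python
from collections import defaultdict

records = [
    {"source": "crm",     "key": "A-9981",  "signals": {"mail:sj@example.com", "card:4412"}},
    {"source": "billing", "key": "77-3412", "signals": {"card:4412"}},
    {"source": "web",     "key": "u_5521",  "signals": {"mail:sj@example.com", "device:abc"}},
    {"source": "web",     "key": "u_9004",  "signals": {"device:zzz"}},
]

# Union-find over record indices: the simplest stand-in for connected
# components in an identity graph.
parent = list(range(len(records)))

def find(i: int) -> int:
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def union(i: int, j: int) -> None:
    parent[find(i)] = find(j)

# Link every pair of records that share at least one signal.
by_signal = defaultdict(list)
for idx, rec in enumerate(records):
    for sig in rec["signals"]:
        by_signal[sig].append(idx)
for idxs in by_signal.values():
    for other in idxs[1:]:
        union(idxs[0], other)

# Each group of linked records is one identified entity.
groups = defaultdict(list)
for idx, rec in enumerate(records):
    groups[find(idx)].append(f'{rec["source"]}:{rec["key"]}')
print(list(groups.values()))
# [['crm:A-9981', 'billing:77-3412', 'web:u_5521'], ['web:u_9004']]
```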

Principle: In modern data estates master data identification is more important than the golden record

What this means is that as long as I can accurately associate a ‘thing’ between different data sets, the actual definition of that thing is secondary. So if in system A it is “Steve Jones”, in system B it is “Steven Jones”, and in system C it is “Moses Jones” (nickname at Uni, and my Twitter handle), then I’m less worried about the name being conformed across those sources than about being able to identify that these three records are in fact the same person. The x-ref is effectively an invisible artifact, one that includes the key relationships between different stores. This also means that for stores I cannot conform (e.g. external stores) I can still correctly associate data, because I focus on identification. Validating those associations remains a master data challenge, and again the smart approach with master data is:

Principle: Align master data governance to the operational processes that can validate the reality of an entity

Again, this comes down to that concept of operational control. If I can’t identify a customer properly, I need to look at what I need to change operationally to make that possible. If I can identify them uniquely but have some data differences, well, that becomes a data quality issue, not a master data issue.
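To tie the earlier example together, here is a minimal sketch (all values invented) of that split: once the x-ref resolves the three records to one person, the differing names become a conformance question for whoever operationally owns that attribute, not an identification problem.

```python
# The 'invisible artifact': (system, local key) -> master id
xref = {
    ("system_a", "SJ-1"):  "PERSON-42",
    ("system_b", "0087"):  "PERSON-42",
    ("system_c", "mj-77"): "PERSON-42",
}

names = {
    ("system_a", "SJ-1"):  "Steve Jones",
    ("system_b", "0087"):  "Steven Jones",
    ("system_c", "mj-77"): "Moses Jones",
}

# Identification: all three records resolve to the same person.
assert len({xref[k] for k in names}) == 1

# Data quality: the names still differ, which is now a conformance question
# for the operational owner of the attribute, not a master data problem.
variants = sorted(set(names.values()))
print(f"One person, {len(variants)} name variants: {variants}")
```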

These are a few principles on how we establish business governance, and that then starts helping us understand how we need to construct the technical architecture, the technical delivery and organizational structure to become data driven and in control of data.
