Some shades of grey regarding data & analytics

Introduction

In my years as an Enterprise Architect for Data & Analytics I have often encountered an absolute, black-and-white way of thinking about data governance and data management, data architecture, and the use of data through BI and analytics to enable quality decision making. For an organization, the polarizing character of such an absolute approach results in undesired opposition that can be avoided in numerous cases. To provide some shades of grey, I am writing three short essays on topics that bother me and need some clarity. These topics are:

  1. Making data available by data providers within large organisations
    When business domains want to provide their data only once, the most extensive requirements among the use cases to be supported determine how much work that takes. What is the root cause behind this behaviour, and how could it be tackled in different ways?
  2. Discussions on data architecture are often dominated by the hype of the day, instead of what is actually possible given the organizational capabilities. Today, when I talk about data architecture, it is often about the data mesh and/or data fabric. What do we do when an organization is not ready for these new approaches to data architecture? How can it develop the organizational capabilities to become mature enough?
  3. The lack of attention for logical data modelling when talking about data
    Within the organizations I have worked for in the past decade, attention to proper data modelling has been decreasing. This impacts the ability to create value out of data, because insight into the meaning of data and into how datasets relate to each other is missing. How should large organizations approach this topic?

These essays reflect my personal opinion. I am writing these down to contribute to the growth of the data & analytics community from a true enterprise architecture perspective.

1. Making data available by data providers within large organisations

In the last twenty-five years I have worked for many large corporates in the financial industry. Finding the right data for specific use cases, and having the data provider make that data available, is not new for these organizations. Over the past two decades, business domains within large corporates have worked on making their data available for use in data warehouses and data marts to support analytics and BI.

It is important to recognize that making data available is more than sending a backup of an application to the requesting business domain. To enable the consuming domain to do its work properly, it needs to receive the functionally relevant data, including proper metadata that describes the contents: the semantics, the relations to other datasets, the right identifiers to support joining the data with other datasets, and so on.
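
To make this concrete, the metadata that travels with a dataset can be thought of as a small descriptor. The sketch below is only an illustration under my own assumptions: the class name, the field names and the example values are hypothetical and do not come from any specific tool or standard.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetDescriptor:
    """Hypothetical metadata a data provider ships together with a dataset."""
    name: str
    description: str                # semantics: what one record means
    identifiers: list[str]          # keys that identify a record
    # relations to other datasets: dataset name -> field to join on
    related_datasets: dict[str, str] = field(default_factory=dict)

    def join_key_for(self, other: str) -> Optional[str]:
        """Return the field used to join with another dataset, if documented."""
        return self.related_datasets.get(other)


# Illustrative values only
loans = DatasetDescriptor(
    name="loans",
    description="One record per loan contract in the loan administration",
    identifiers=["loan_id"],
    related_datasets={"customers": "customer_id"},
)
```

A consumer can then ask the descriptor how to join, for example loans.join_key_for("customers"), instead of having to guess from column names.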

It is also essential that business domains keep their data consistent over time when applications change. The distance between the creation of data in a loan administration and the use of that data in an aggregated way, e.g. for risk modelling within a bank, can be quite big. The risk modelling team requires data that is consistent over time. When an application changes, the data provider must therefore take care to provide a mapping from the old dataset to the new, changed dataset of that application.
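
In its simplest form, such a mapping is a documented translation from old field names to new ones. The following is a minimal sketch assuming a pure rename without changes in meaning; the field names and values are hypothetical, and a real mapping would usually also need to cover changed code lists, units and data types.

```python
# Hypothetical mapping from the old application layout to the new one.
OLD_TO_NEW_FIELDS = {
    "loan_no": "loan_id",
    "cust_no": "customer_id",
    "outstanding": "outstanding_amount",
}


def to_new_layout(old_record: dict) -> dict:
    """Translate one record from the old layout to the new layout."""
    missing = set(OLD_TO_NEW_FIELDS) - set(old_record)
    if missing:
        # Fail loudly instead of silently delivering incomplete history.
        raise KeyError(f"old record lacks mapped fields: {sorted(missing)}")
    return {new: old_record[old] for old, new in OLD_TO_NEW_FIELDS.items()}


# Illustrative record only
old_record = {"loan_no": "L-001", "cust_no": "C-042", "outstanding": 125000.0}
new_record = to_new_layout(old_record)
```

With a mapping like this published by the provider, a consumer such as the risk modelling team can bring historical records into the new layout and keep its time series intact.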

To summarize, it is not easy to make data available within large corporates.

The desire to deliver data only once…

Business domains within large corporates work together to realize the corporate strategy. In practice, however, the KPIs used to steer the business domains are not always aligned. This can result in conflicts between business domains about the need to make data available from the data provider to the data consumer. These conflicts tend to lead to data being made available at a minimum level, with the responsible business domain not wanting to be bothered with any other topics. It wants to deliver the data once and then go on with its own business as usual. The problem is that the data made available this way is not always fit-for-purpose for all use cases in the different consuming domains that require it. When a data provider wants to deliver the data only once for all use cases, the data delivery needs to comply with the requirements of the biggest, most complex use case.

The result is that the work of data provisioning is complex, expensive and takes up a lot of time. Use cases that need to be tackled today and require data cannot be served. When a rigid approach is chosen here, business domains tend to take compensating measures, including the development of shadow IT and other forms of data integration that are undesirable from an architecture perspective. What is needed are more flexible capabilities offered by the data architecture.

Different approaches to make data available

The situation that I described above is caused primarily by a lack of flexibility and capabilities in making data available. An enterprise data architecture needs to include capabilities that make data available in multiple ways that align with each other but exist next to each other. What capabilities am I talking about?

  • Direct query access to a data source through data virtualization
    Although numerous people talk about virtualizing data access, large corporates lack an independent data virtualization solution, and a clear positioning of it within the data architecture, that enables business domains to provide access to data quickly. The aim is not to replace the current streams or batches in which data is provided, but to add flexibility.
  • Supporting multiple versions of data delivery through concepts like the medallion architecture pattern
    In today's data architecture, data provisioning can be implemented by supporting multiple versions of data, similar to the layers of the medallion architecture. Datasets can be classified as bronze, silver or gold and exist side by side to enable fit-for-purpose data delivery.
  • Tackle the attitude of people within the organization by addressing cultural aspects / data literacy
    Making data available, including the proper metadata, is a lot of work when a business domain has failed to execute proper data management over the lifecycle of the data within its applications. When logical models of an application are available, including information on the context of the data, the semantics that enable interpretation and the identifying characteristics, making data available can be done quite fast.
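
The second capability above, offering the same data at several quality levels side by side, can be illustrated with a small sketch. It assumes the common reading of the medallion layers (bronze as raw, silver as cleaned, gold as curated); the catalogue contents and the function name are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    BRONZE = 1  # raw data as delivered by the source application
    SILVER = 2  # cleaned and conformed data
    GOLD = 3    # curated data, ready for business use


# Hypothetical catalogue: dataset name -> tiers currently published
CATALOGUE = {
    "loans": {Tier.BRONZE, Tier.SILVER},
    "customers": {Tier.BRONZE, Tier.SILVER, Tier.GOLD},
}


def pick_tier(dataset: str, minimum: Tier) -> Optional[Tier]:
    """Return the lowest published tier that meets the minimum, or None."""
    candidates = [t for t in CATALOGUE.get(dataset, set()) if t >= minimum]
    return min(candidates) if candidates else None
```

A use case that can live with raw data is served from bronze today, while a use case that needs curated data waits for gold. The provider no longer has to satisfy the most demanding use case before anyone gets data.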

Concluding remarks

There is no silver bullet to tackle this topic. However, the main lesson for me is that we must always be aware of the risk of being too clear-cut and of not giving enough attention to the shades of grey that do exist.
