Towards Universal Data Supply

An introduction to my work that might also help you better understand our job as data engineers and architects

Bernd Wessely
6 min read · Sep 4, 2024

I started my Medium membership in April 2024 and have since written articles that may look like random reports on partial aspects of our data engineering world.

But I actually had the main topic of enabling the universal data supply in mind. I wanted to begin with detailed information about particular subtopics that seemed interesting on their own, but also contribute to the understanding of universal data supply.

In my professional career as a data architect, engineer, consultant and entrepreneur, I’ve had to explain time and again the complex architectures we build to fulfill what looks so simple at a high level. We need to enable all applications in the company to access available information from all other applications whenever it is needed, for whatever purpose and at whatever level of detail.

That is why there is this common thread of universal data supply that runs through all the articles. It’s therefore also an introduction to my work to date.

Data and Logic

The fundamental distinction between data and application logic seems obvious and helps to structure a system at the highest level. Even if we had only a single application in the enterprise, our challenge of universal data supply wouldn’t be much simpler. However, a system can quickly become overly complex if we distribute and scale out but forget about simplicity and composability. If we start from straightforward setups and always think twice before moving away from them, we can build simpler systems.

This distinction can also help us understand which concerns a data engineer should focus on and how to cope with the plethora of tools that the data engineering industry is constantly throwing onto the market.

It helps us to recognize that all systems need to be developed by skilled software engineers, regardless of whether application logic or data is concerned.

I’m convinced that the separation of data and logic is a good concept. However, when we separate parts, we should also think about how best to manage the dependencies between those parts. Unfortunately, we still don’t have a good common system for this, and we need to clearly delineate the responsibilities between data and application engineering.

Architecture

It’s essential to find the right overall architecture to best counter the complexity devil, which begins with the division into ever smaller system components. There is an undeniable advantage to splitting huge tasks into distributed and self-sufficient components. But without the right architecture, we will move further and further away from the theoretical ideal of having a single, clean and modular application that covers all business requirements in a coherent way.

An overall strategy based on the fundamental distinction between application and data concerns helps to prevent us from getting side-tracked. Applications need to exchange data intensively and therefore participate as producers and consumers in an enterprise-wide data mesh. Such a data mesh interconnects all applications via a common infrastructure, allowing a truly distributed implementation within the enterprise. It aims to treat data as a first-class citizen that can be seamlessly integrated into business processes.
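To make the producer and consumer roles concrete, here is a minimal in-memory sketch. The `DataMesh` class, the `orders` topic and the two consuming lists are hypothetical names chosen for illustration. A real mesh would rely on durable, distributed infrastructure rather than a Python dictionary.

```python
from collections import defaultdict
from typing import Any, Callable

Record = dict[str, Any]
Handler = Callable[[Record], None]


class DataMesh:
    """Toy stand-in for a common data infrastructure (illustrative only)."""

    def __init__(self) -> None:
        # Maps a topic name to the handlers of all consuming applications.
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a consuming application for a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: Record) -> None:
        """Deliver a record from a producing application to every consumer."""
        for handler in self._subscribers[topic]:
            handler(record)


mesh = DataMesh()

# Two hypothetical consuming applications that simply collect what they receive.
received_by_billing: list[Record] = []
received_by_analytics: list[Record] = []
mesh.subscribe("orders", received_by_billing.append)
mesh.subscribe("orders", received_by_analytics.append)

# A hypothetical producing application publishes one record.
mesh.publish("orders", {"order_id": 1, "amount": 42.0})
```

The producer does not know which applications consume its data, and operational and analytical consumers receive the same record through the same channel.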


Traditional approaches such as the data warehouse, which focus one-sidedly on the analytical use of data, must therefore be redefined to truly support the seamless flow of data in the enterprise.

In a world of highly distributed applications, any centralized and rigidly layered data approach seems to be doomed to fail.

We need to recognize that an enterprise architecture to enable universal data supply cannot be based on the approach of product vendors who design their platforms as sellable products.

Thinking naively in terms of specialized platforms has led us into isolated disciplines. I firmly believe it’s time to rediscover the great common ground that we have lost through overspecialization.

Isolated disciplines created isolated data realms. We now face the great divide of data and need to close it again.

The management and modeling of data have their own challenges that differ significantly from the challenges of managing and modeling applications.

Modeling data in the enterprise is paramount to prevent data chaos. We need to align the top-down high-level views with application-specific details to provide a coherent and up-to-date view of our data across the enterprise.

Data modeling is not only used as a blueprint and structure for data to be stored. It also helps to derive important insights from data using analytical applications.

To ensure that data is accessible to every application in the enterprise, regardless of how much delay or latency the application can tolerate, it’s crucial to unify the two data processing styles, batch and streaming.
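One way to picture this unification is to write the processing logic once against a generic sequence of events and then feed it either a finite batch or an open-ended stream. The following is a minimal sketch under that assumption. The function `running_totals`, the sample events and the generator `event_stream` are hypothetical and only illustrate the idea.

```python
from typing import Iterable, Iterator

Event = tuple[str, float]  # (key, amount)


def running_totals(events: Iterable[Event]) -> Iterator[Event]:
    """Yield the updated total per key after each event.

    The logic only assumes an iterable, so it does not care whether the
    events come from a finite batch or from a stream.
    """
    totals: dict[str, float] = {}
    for key, amount in events:
        totals[key] = totals.get(key, 0.0) + amount
        yield key, totals[key]


# Batch style: all events are available up front.
batch: list[Event] = [("a", 10.0), ("b", 5.0), ("a", 2.5)]
batch_result = list(running_totals(batch))


# Streaming style: events arrive one at a time. A generator stands in for
# a real event source, which would typically wait for new events.
def event_stream() -> Iterator[Event]:
    for event in batch:
        yield event


stream_result = list(running_totals(event_stream()))
```

Both runs produce the same results. The only difference is when each result becomes available, which is exactly the latency an application has to tolerate.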

Business

Data management and the modern data stack (MDS) are not ends in themselves. They are just tools to enable business ideas and processes.

The single most important task data engineering has to solve is to supply every idea and every process in the enterprise with the best possible data available.

We should let go of the belief that we only need to mine the big data collections in our company (the data warehouse or data lakehouse) to be more successful. We should instead focus on the business ideas and processes and supply them with the right information. We need to empower the business with universal data supply.

What do you think about our challenge of developing data systems that are reliable, scalable and easy to change? Maybe it’s not even a good idea to separate application logic and data the way we do?

I’d love to hear your opinions.
