Tackling Complexity at the Heart of Data Platforms: Connecting the Operational and Analytical Data Planes

Martin Chesbrough
Published in EverestEngineering
10 min read · Jun 11, 2024

Introduction

Data platforms are designed to fulfil the analytical needs of the organisation as a whole. These needs derive from the operational processes and systems, so the analytical data plane (which is often represented by a number of data platforms) needs to both represent and cover the full spectrum of the operational data plane. If the operational data plane is made up of multiple business domains, with seams where the domains intersect, then those domains are a good starting point for the analytical data plane.

This approach leads to a need to re-think the humble data warehouse and re-imagine it through a DDD (Domain-Driven Design) lens as a platform designed to connect operational and analytical data planes, across multiple domains and negotiating the boundaries of these domains.

Note: the use of the term data plane is explained further down in this article.

The Problem

Many data projects have challenges with understanding the operational world in order to represent it through analytical data, so as to derive insights and help organisations improve. The challenges tend to start with trying to reconcile data from different source systems and make sense of all the data collected, much of which has been “dumped” into data warehouses as a copy of the source data. I call this the Data Exhaust Pipe (partially in reference to the desire to collect everything but also in reference to the fact that understanding it means going back to the ERP, SaaS application or website from where it was collected to understand its structure and meaning). I believe treating data as a Data Exhaust Pipe is the source of many of our problems in the data world.

This post aims to dive into this problem from the perspective of the operational systems that provide the data and I aim to provide some thinking and direction for building better data and analytical projects that harness a more natural connection from the operational world to the analytical world.

From Data Exhaust Pipe to Data Planes

When a new data team is formed or a new data initiative starts, this may be the first time that you tap into the data exhaust pipe. Until then, it was just copies of source database tables sitting in a database waiting to be queried. You made those copies because you suspected/knew/hoped/dreamt that there would be insights in those tables that would help you make better decisions or feed better actions or deliver better operational outcomes. You never knew you were creating a data exhaust pipe. How could creating a copy of your SAP tables be seen as waste?

I make this point because I want to assert right from the beginning that the operational data and analytical data planes are connected right from day one and they deserve to be connected.

But what’s this operational and analytical data plane terminology?

More geek speak?

Well yes. The term “data plane” is derived from network management, where a control plane decides how traffic is routed and a data plane carries the traffic itself. In the context of analytical data it is useful because you can think of there being two types of data floating around your company:

  • the transactional data is what I refer to as the operational data plane and this is the data that ensures your systems work and you can serve customers; and
  • the analytical data plane is the information that circulates around the organisation that provides analytical value.

It is often similar, or even the same, data but used for different purposes and often having slightly different formats because of its use case. There is no control plane in this analogy and it is not a perfect analogy but I hope that it serves to envisage separate, but interrelated, data planes that can be linked at the domain level as per my initial sketch above.

  1. The interrelated data planes model divides up a single data exhaust pipe (like the ETL or ELT data pipelines that feed a data warehouse) into separate connections that tend to be grouped around domains. This is, in part, a mental trick to separate the data processing along domain lines rather than extract-load-transform lines. It is kind of like the vertical slicing approach in application development, as opposed to the horizontal slicing across front-end and back-end.
  2. It helps in aligning the whole organisation towards a domain-oriented view. The data team often works across domains, where the problem can be that they do not recognise the domain boundaries and the terminology may confuse them. Any misalignment between operational and analytical data now becomes easier to spot. We will explore this more in the next section.
  3. It also promotes a more active and direct feedback loop between the analytical and operational data. Direct feedback loops are often shorter and provide more timely and useful help to teams when they are in charge of their own destiny (the ideal of a self-managing organisation). There is a parallel here in the advancement of management approaches from more centralised towards more distributed management.
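
As a rough illustration of the first point, here is a minimal Python sketch of slicing one pipeline's table list along domain lines rather than extract-load-transform lines. The table names and the table-to-domain mapping are hypothetical, invented purely for illustration; a real mapping would come out of conversations with the domain teams.

```python
from collections import defaultdict

# Hypothetical tables landed by a single "exhaust pipe" pipeline.
SOURCE_TABLES = [
    "erp.sales_orders",
    "erp.invoices",
    "crm.customers",
    "web.page_views",
    "erp.shipments",
]

# Hypothetical mapping from table to owning domain (an assumption for this sketch).
DOMAIN_OF = {
    "erp.sales_orders": "sales",
    "erp.invoices": "finance",
    "crm.customers": "customer",
    "web.page_views": "marketing",
    "erp.shipments": "logistics",
}


def slice_by_domain(tables, domain_of):
    """Group tables into per-domain connections.

    Tables with no known domain are collected under "unassigned" so that
    the gap is visible rather than silently dropped.
    """
    connections = defaultdict(list)
    for table in tables:
        connections[domain_of.get(table, "unassigned")].append(table)
    return dict(connections)


connections = slice_by_domain(SOURCE_TABLES, DOMAIN_OF)
for domain, tables in sorted(connections.items()):
    print(f"{domain}: {tables}")
```

Anything that ends up under "unassigned" is a prompt for a conversation about ownership, which is arguably the real value of the exercise.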

So, if the idea of connected operational and analytical data planes is a good idea then how do they connect?

Data Domains and DDD

I think the key to connection is aligned data domains using DDD. These may not be the exact same domain boundaries but DDD provides a clear understanding of where there is alignment and where there is not.

Domain-Driven Design, known as DDD, is a software development technique that emerged almost in parallel with agile development. Eric Evans published his DDD book in 2003 but it emerged from the 4 previous years of collaborating with Agile and XP developers. The core of DDD is about model-driven design and development, i.e. building software from models. This core is important to our discussion, not because DDD domain models and data models are necessarily the same (they are not) but because the thinking behind data model driven analytical development and DDD is similar.

DDD continues today as one of the core technical concepts that Agile and XP enthusiasts use to help tackle complex programs of work. To me, DDD emerges as a unifying concept that brings together Agile Development, Complex Software Landscapes and Analytical Data. This is why it is important for data engineers to learn DDD.

From the early 1990s, through the 2000s and until recently, data warehousing, data lakes, big data, Hadoop and the associated move to cloud grew in a different direction. A knowledge base focused more on infrastructure and technology developed separately from agile. I understand that there are many books on Agile Data Warehouse Design, Agile BI and Agile Data but, in general direction, I will argue that the two ideas (agile and data) have stayed resolutely separate for the past 20 years.

Interestingly though, both schools of thought reference the idea of a domain.

In data warehousing, domains are associated with different areas of the business from a data analysis perspective. It is common for areas like sales, finance, logistics, product, marketing and customer service to be the data domains of interest, most typically when data engineers are thinking of building subject-oriented data marts to handle analytics for these business areas.

DDD does not naturally seem like a fit with data domains. The work of a DDD expert in identifying aggregates, practising event sourcing and implementing CQRS feels like activity at a much lower level than that of the data architect designing data marts to implement BI dashboards for end-user analytics.

Surely the domains and sub-domains in DDD are finer grained than data domains?

But they are not. Let’s see how we can build the connection …

Using Event Storming to build a connection

The connection is Domain Events.

The concept of an Event is key to DDD. So much so that the term has spawned techniques like Event Sourcing and Event Storming.

It is also key to data analytics although we sometimes struggle to see the significance.

Ralph Kimball first alerted us to their importance when he designed the Bus Matrix: effectively the steps in a business process that would be reported on in the analytical data layer.

Lawrence Corr then formalised the language for us when he introduced BEAM, Business Event Analysis and Modeling.

Data science and machine learning often deliver business value through their ability to predict certain events.

So, if events are critical to both our worlds is this the place to start?

I argue that event-storming is the best technique that has surfaced from the world of DDD that is a natural fit to the data space. And it will help bridge the gap to the application developers and architects.
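
To make the link between events and Kimball's matrix a little more concrete, here is a minimal Python sketch. It assumes the commonly described layout of that matrix (business events as rows, shared dimensions as columns). The event names and dimensions are hypothetical examples, not taken from any real workshop.

```python
# Hypothetical business events and the dimensions that describe each one.
EVENT_DIMENSIONS = {
    "Order Placed": {"customer", "product", "date"},
    "Payment Received": {"customer", "date", "payment_method"},
    "Order Shipped": {"customer", "product", "date", "carrier"},
}


def bus_matrix(event_dimensions):
    """Return (sorted dimension names, {event: [True/False per dimension]})."""
    dimensions = sorted(set().union(*event_dimensions.values()))
    rows = {
        event: [dim in dims for dim in dimensions]
        for event, dims in event_dimensions.items()
    }
    return dimensions, rows


def shared_dimensions(event_dimensions):
    """Dimensions used by every event: candidates for conformed dimensions."""
    if not event_dimensions:
        return set()
    return set.intersection(*event_dimensions.values())


dimensions, rows = bus_matrix(EVENT_DIMENSIONS)
for event, marks in rows.items():
    used = [dim for dim, mark in zip(dimensions, marks) if mark]
    print(f"{event}: {used}")
print("shared by all events:", sorted(shared_dimensions(EVENT_DIMENSIONS)))
```

The rows are exactly the kind of events that an event-storming session surfaces, which is why I see the two techniques as natural partners.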

What is Event Storming?

As far as I know it is the brainchild of Alberto Brandolini and the image I have is of Alberto with a massive long (15 to 20 metres) roll of paper and loads of sticky notes turning up to run a workshop where everyone gathers to collaborate on creating a shared understanding of the business or system they are working on.

I won’t go into details (you can read Alberto’s book) but one of the valuable outputs is a list of Domain Events together with all the different actions that happen to those domain events plus the boundaries that form the domain model. An example is shown below:

Personally, I am a big fan of the end-to-end process workshop where everyone collaborates to deliver a shared understanding of what we are all working on (the business, the organisation, or just a complex process). I have done it in various forms, starting with Value Stream Mapping in my Lean Six Sigma consulting days, through to my more recent exposure to Event-Storming.
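
For readers who think in code, the workshop output described above (domain events, the actions around them and the boundaries between them) could be captured in a structure like the following. This is only a sketch under my own assumptions: the class names, the fields and the example events are hypothetical and are not part of the Event Storming method itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DomainEvent:
    name: str          # a fact stated in the past tense, e.g. "Order Placed"
    triggered_by: str  # the action or command that causes the event
    context: str       # the boundary (bounded context) the event sits in


@dataclass
class EventStormBoard:
    """A digital stand-in for the roll of paper and sticky notes."""

    events: list = field(default_factory=list)

    def add(self, name, triggered_by, context):
        self.events.append(DomainEvent(name, triggered_by, context))

    def by_context(self):
        """Group event names by the boundary they belong to."""
        grouped = {}
        for event in self.events:
            grouped.setdefault(event.context, []).append(event.name)
        return grouped


board = EventStormBoard()
board.add("Order Placed", "Place Order", "sales")
board.add("Payment Received", "Pay Invoice", "finance")
board.add("Order Shipped", "Dispatch Order", "logistics")
board.add("Invoice Issued", "Issue Invoice", "finance")

for context, names in board.by_context().items():
    print(f"{context}: {names}")
```

Grouping by context gives the data team a first cut of the domains that the analytical data plane would need to cover.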

Why Event-Storming?

My argument for Event-Storming as the workshop to create the connection point between operational data and analytical data comes from 3 recent experiences.

In the first one there was a previous event-storming exercise done but many people had left and the expertise had left the organisation. You could see the paper and stickies on the wall as a left-over relic from a previous world (this was in the midst of the pandemic) and you could see the connections between what people worked on in the operational world and what was happening in data analytics. Confusion was already setting in at the organisational level and the analytical data plane was getting more and more disconnected from the operational plane as time moved on.

Contrast this with the second example: an organisation that saw the opportunity to rethink operations and analytics and conducted an event-storming workshop. You could see how the software developers in the application space and the data team could now collaborate better. It felt like the organisation had a basis for collaboration.

My final example is an organisation where operations and analytics are badly disconnected. The dashboards show that 50% of the website traffic data is “unknown”, the reason being that the marketing team and the analytics team did not collaborate on changes made over the years. The effort to fix this “bad data” and to generate useful reports and dashboards is painful. Lots of meetings are held where small segments of the end-to-end flow are discussed. Incremental improvements are made. But the process is slow and painful.

Distributed Domain Ownership of Data

One of the side-effects of an event-storming workshop where the analytical folks are present is the collaboration that ensues. Suddenly the application developers, the operational folks and the analytical users are in the same room as the data team. In a well-run, psychologically safe workshop magic happens and the awareness builds that high quality data is needed for analytics.

Discussions ensue on how to allow for rigorous testing of a payment gateway, while at the same time clearly identifying the test data that needs to be propagated into the analytical data plane for their purposes.
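
One way such a discussion might land is sketched below in Python. It assumes the operational system sets an explicit flag on every payment event so that the analytical side can recognise test traffic. The `PaymentEvent` type and its `is_test` field are hypothetical names chosen for this illustration, not any real gateway's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentEvent:
    payment_id: str
    amount_cents: int
    is_test: bool  # hypothetical flag set by the operational system


def split_for_analytics(events):
    """Separate live payments from clearly identified test payments."""
    live = [event for event in events if not event.is_test]
    test = [event for event in events if event.is_test]
    return live, test


events = [
    PaymentEvent("p-1", 2500, is_test=False),
    PaymentEvent("p-2", 100, is_test=True),   # gateway test transaction
    PaymentEvent("p-3", 4000, is_test=False),
]

live, test = split_for_analytics(events)
revenue_cents = sum(event.amount_cents for event in live)
print(f"live payments: {len(live)}, test payments: {len(test)}")
print(f"reported revenue (cents): {revenue_cents}")
```

The point is less the code than the agreement behind it: the operational team owns the flag, and the analytical team can rely on it.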

It has been my experience in organisations of all sizes that a well-run, psychologically safe and productive workshop has a force multiplier effect on cross-organisational collaboration. In this case it is event-storming and the result can be better connected operational and analytical data planes, an increased awareness of domain data ownership and a more productive data analytics environment.

Isn’t this Data Mesh?

Well yes it is — if you have read this far then I’d like to acknowledge Zhamak Dehghani as the person who opened my eyes to this concept. I am a big big fan of her work.

So why am I only mentioning the term Data Mesh at the end of this post?

The reason, in my mind, is that I don’t see Data Mesh as the purpose of this post. I used the term data platform to refer to the concept of a functional building block that delivers data services to a data product layer. I used data warehouse to refer to the most likely types of platform that exist in your organisation. I think the inevitable direction that a DDD-oriented view of operational and analytical data planes takes you is towards a Data Mesh but my intent is not to prescribe the end state, merely to offer you suggestions to help you proceed in the right direction.

For many of the organisations I work with the biggest challenge is connecting operational and analytical data together so, if this is also your problem, then I hope my suggestion of a small step in the right direction will help.

Conclusion

This post has explored the integration of operational and analytical data planes within data platforms, emphasising the importance of aligning these planes to effectively derive insights and improve organisational performance.

The “data exhaust pipe” exists when data from various sources is dumped into data warehouses without proper understanding or integration. I am suggesting a transition from this approach to a framework where operational and analytical data planes are interconnected from the outset, using concepts from Domain-Driven Design (DDD) to align data domains and Event Storming to bridge the gap between operational and analytical teams, fostering collaboration and domain ownership of data.

Martin Chesbrough
EverestEngineering

Currently on internship with Everest Engineering where he is learning about software product development and socio-technical systems thinking