Domain-Driven Design: Translating it to Data Products in Real Life

Kim Thies
ProfitOptics
8 min read · Sep 18, 2023

TL;DR: Domain-Driven Design (DDD) is a software engineering approach that focuses on understanding the business domain and modeling it in software. It can be applied to data architecture by decomposing data sets into domains and tying each one to a specific business purpose. This saves data scientists and engineers time, while improving the quality, relevance, and usability of data products as assets.

When we build data products using a common language (like the standardization of Legos here) and bound ‘like’ work (e.g. data products) to those who know that specialized work best, we are better able to scale rapidly, reuse data, gain more insights and collaborate on new levels. Photo by Amélie Mourichon on Unsplash

It was the early 2010s, and I had become a bit obsessed with YouTube on my first iPhone. I still remember sitting and watching Eric Evans talk about “Domain-Driven Design” (DDD), which prompted me to get the book. As a person who came into technology roles from a business consulting background, it made total sense to me that understanding and capturing the domain knowledge, or the business context, is the heart of any successful software project. It created a way of thinking about the building blocks of software as usable, interconnected products.

In the world of data architecture, however, thinking about the smallest, domain-based components is not the norm. In the land of data lakes, warehouses, lake houses and marts, embracing domain-driven design and microservices approaches remains rare, though it is gaining traction. And so, when we had the opportunity to test out domain-based data products at a global fintech via a data mesh implementation, we did just that.

Here are two major ways we learned that DDD can be applied to the Data as a Product concept.

1) Your business data users know the data best.

2) Bounding business context reduces complexity.

Let’s look at those ideas in detail.

1) Business Data Users Know the Data Best

DDD: Aligning to the Users through Domain Models

DDD says that the domain model should be the heart of the software development process. This means that everyone involved in the project, from developers to domain experts to business users, should use a common (or “ubiquitous” as defined by Evans) language to talk about the domain. This language should be as specific and unambiguous as possible, so that everyone is on the same page and there’s no confusion.

DDD also teaches engineers that the software system should be designed to reflect the complexity and richness of the business domain. This means that the system should be able to handle all the different ways that the domain can be used. It should also be easy to understand and maintain, so that it can be updated as the domain evolves.

DDD is a powerful approach to software development that can help you create systems that are more successful and impactful. It’s a bit more complex than other approaches, but it’s worth the effort if you want to create a system that meets the needs of your users.

Data Application: Organizing by Domains

In data mesh design, we can apply the same concept of bounded contexts from DDD. A data domain is like a self-contained unit that’s responsible for storing, processing, and governing analytic data sets. Each data domain is aligned to a specific business capability, like customer segmentation or finance. It has its own dedicated team that owns and is accountable for the data.
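To make the idea of a data domain a little more concrete, here is a minimal Python sketch. Everything in it (the DataDomain class, the owner_of helper, the team and data set names) is a hypothetical illustration and not the model used in our implementation. It simply shows a domain as a unit with a business capability, an accountable team, and the data sets it governs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataDomain:
    """A self-contained unit aligned to one business capability (hypothetical model)."""
    name: str                 # e.g. "customer_segmentation"
    business_capability: str  # the capability the domain serves
    owning_team: str          # the team that owns and is accountable for the data
    datasets: list[str] = field(default_factory=list)

    def register_dataset(self, dataset: str) -> None:
        """Record an analytic data set that this domain stores and governs."""
        if dataset not in self.datasets:
            self.datasets.append(dataset)


def owner_of(dataset: str, domains: list[DataDomain]) -> str:
    """Return the team accountable for a data set, or raise if no domain claims it."""
    for domain in domains:
        if dataset in domain.datasets:
            return domain.owning_team
    raise LookupError(f"No domain owns {dataset!r}")


# Two illustrative domains, matching the capabilities named in the text.
segmentation = DataDomain("customer_segmentation", "Customer segmentation", "segmentation-analytics")
finance = DataDomain("finance", "Finance", "finance-analytics")
segmentation.register_dataset("customer_segments")
finance.register_dataset("monthly_revenue")

print(owner_of("monthly_revenue", [segmentation, finance]))  # finance-analytics
```

The useful property is that every data set resolves to exactly one accountable team, so a question about the data always has an owner.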

This means that the people who are using the data to do things like forecast, build models, or understand profitability levers are the ones who know the data best. They’re the ones who should be responsible for managing it. This is in contrast to the traditional approach, where data is often managed by a central team of engineers.

The data mesh approach gives the people who are using the data more control over it. This can lead to better data quality, more agility, and faster time to insights.

Real-Life Scenario

When we build a domain-based data solution, we don’t just focus on one domain at a time. We also look at how the different domains in an organization interact with each other. For example, a holistic customer record might include data from several different domains, like segmentation, historic buying patterns, and ID across products.

We work with the data users to understand how they think about data and how they want to use it. Then, we build data domains that meet their needs. We also create interoperability between domains so that data can flow freely between them. This allows our customers to get never-before-seen insights into their data.

In one implementation, we had a very complex business problem. The data was spread across multiple business functions, and each function had a different definition of a key metric. We wanted to create a holistic view of the data that would show the total customer view.

We achieved this by focusing on the interoperable product. This meant creating a data domain and its associated metrics that could be understood and used by all of the business functions. We also worked closely with the business to understand their needs and to empower our data experts to organize the data in a meaningful way.
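One way to picture a shared metric is as a single definition that each business function's local term resolves to. The sketch below is only an assumption-laden illustration: the net_revenue metric, its formula, the function names and the local terms are all invented for the example and are not the metric from our engagement.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MetricDefinition:
    """One agreed definition of a metric, stated in business language."""
    name: str
    description: str
    compute: Callable[[dict], float]


# Hypothetical shared definition, agreed with all business functions.
NET_REVENUE = MetricDefinition(
    name="net_revenue",
    description="Gross revenue minus refunds and discounts, per customer.",
    compute=lambda row: row["gross_revenue"] - row["refunds"] - row["discounts"],
)

# Each function keeps its local term, but every term points at the same definition.
LOCAL_TERMS = {
    ("sales", "revenue"): NET_REVENUE,
    ("finance", "net sales"): NET_REVENUE,
    ("risk", "customer income"): NET_REVENUE,
}


def resolve(function: str, term: str) -> MetricDefinition:
    """Look up the shared definition behind a function's local term."""
    return LOCAL_TERMS[(function.lower(), term.lower())]


row = {"gross_revenue": 1200.0, "refunds": 150.0, "discounts": 50.0}
print(resolve("Finance", "Net Sales").compute(row))  # 1000.0
print(resolve("risk", "customer income") is resolve("sales", "revenue"))  # True
```

Whatever a team calls the number, the calculation behind it is the same, which is what makes the resulting view comparable across functions.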

The result is a holistic view of the data that allows the customer to make better decisions. Executives and data scientists alike are able to see trends and patterns that they had never seen before, and use them to improve their products and services.

2) Bounded Business Context Reduces Complexity

In Domain-Driven Design (DDD), a bounded context is a conceptual area of focus within a software system. It’s like an imaginary line that divides the system into smaller, more manageable parts. This helps teams work independently on different parts of the system while still keeping everything consistent and integrated.

Each bounded context has its own vocabulary, concepts, and rules. This helps to ensure that everyone on the team is working with the same understanding of the domain. It also helps to avoid confusion and errors.

Bounded contexts are often aligned with business capabilities. For example, a customer management bounded context might focus on the concepts of customers, orders, and invoices. A product management bounded context might focus on the concepts of products, features, and pricing.

The boundaries between bounded contexts are not always clear-cut. There may be overlap between them, and there may be a need for communication and coordination between them. However, by clearly defining the boundaries between bounded contexts, we can make it easier to develop and maintain a complex software system.
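Here is a small sketch of what two bounded contexts and an explicit boundary between them might look like in code. The Product and OrderLine classes and the to_order_line function are hypothetical names chosen to echo the customer management and product management examples above. They are not taken from Evans's book or from any particular system.

```python
from __future__ import annotations

from dataclasses import dataclass


# --- Product management context: products, features, pricing ---
@dataclass(frozen=True)
class Product:
    sku: str
    features: tuple[str, ...]
    list_price: float


# --- Customer management context: only what orders and invoices need ---
@dataclass(frozen=True)
class OrderLine:
    sku: str
    quantity: int
    unit_price: float

    @property
    def total(self) -> float:
        return self.quantity * self.unit_price


def to_order_line(product: Product, quantity: int) -> OrderLine:
    """Translate at the boundary; feature details stay in product management."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return OrderLine(sku=product.sku, quantity=quantity, unit_price=product.list_price)


widget = Product(sku="W-100", features=("waterproof",), list_price=25.0)
line = to_order_line(widget, 3)
print(line.total)  # 75.0
```

Each context keeps its own vocabulary, and the one translation function is the only place where the two need to agree.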

Data Application: Organize Data as Assets

Data products are similar to bounded contexts, but they’re focused on data. A data product is a well-defined and self-describing data asset that serves a specific business need. It has business context and relevance, and it’s used by data scientists to build machine learning and AI models.

Data products are the building blocks of modern data solutions. To create robust, maintainable, and domain-centric data solutions, we need to model data products to capture the required attributes and business context, establish quality expectations, and ensure access controls and governance are in place.
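As a rough sketch of what modeling a data product could involve, the Python below bundles the pieces named above: attributes with their business meaning, quality expectations, and access control. The DataProduct class, its field names, the example checks and the roles are all assumptions made for illustration. A real implementation would sit on a data platform rather than on in-memory lists.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A self-describing data asset (hypothetical, in-memory sketch)."""
    name: str
    domain: str
    description: str   # business context, in business language
    attributes: dict   # column name -> business meaning
    quality_checks: dict = field(default_factory=dict)  # check name -> function(rows) -> bool
    allowed_roles: set = field(default_factory=set)

    def failed_checks(self, rows: list) -> list:
        """Return the names of quality expectations the rows do not meet."""
        return [name for name, check in self.quality_checks.items() if not check(rows)]

    def read(self, rows: list, role: str) -> list:
        """Serve only the documented attributes, after access and quality gates."""
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {self.name!r}")
        failures = self.failed_checks(rows)
        if failures:
            raise ValueError(f"quality checks failed: {failures}")
        return [{col: row[col] for col in self.attributes} for row in rows]


customer_profitability = DataProduct(
    name="customer_profitability",
    domain="finance",
    description="Profit per customer per month, as agreed with finance.",
    attributes={
        "customer_id": "Enterprise-wide customer identifier",
        "monthly_profit": "Revenue minus cost to serve, in USD",
    },
    quality_checks={
        "no_missing_customer_id": lambda rows: all(r.get("customer_id") for r in rows),
        "profit_is_numeric": lambda rows: all(
            isinstance(r.get("monthly_profit"), (int, float)) for r in rows
        ),
    },
    allowed_roles={"data_scientist", "data_engineer"},
)

rows = [
    {"customer_id": "C1", "monthly_profit": 340.0, "load_batch": 17},
    {"customer_id": "C2", "monthly_profit": 100.0, "load_batch": 17},
]
print(customer_profitability.read(rows, role="data_scientist"))
```

Note that the undocumented load_batch column never reaches the consumer. Only attributes with a stated business meaning are part of the product.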

Without a data product view, data scientists are left to fend for themselves, spending hours finding data, often by simply asking around the organization. They have to search through data sets that don’t have any business context, and they have to guess which data attributes are relevant. They also can’t give feedback to the data providers about the validity or use of the data without going through a lot of red tape.

This means that data scientists are wasting a lot of time transforming and preparing data for single-use cases. An often-cited industry estimate is that 80% of analytic time is lost to finding, cleaning and validating data. Imagine how many more insights they could derive if they had a data product view that gave them the context they need and the ability to collaborate with the data providers.

A data product view is a way of organizing data around a specific business need. It’s like a map that shows the data scientist where to find the information they need. It also includes the business context for the data, so the data scientist doesn’t have to guess what it means. They can give feedback on the data product in a continuous improvement loop, which helps keep the data, its metrics, and their definitions accurate and usable as the business constantly evolves.

Applying the concept of bounded contexts to data products is a powerful way to create data solutions that are easy to find, understand, maintain, and evolve. By decomposing data assets into manageable data sets with relevant business context for specific business use, we are able to get greater use out of the data asset. These data products are more valuable and relevant, and they also feature observability functions to verify quality. The benefits of creating data products include:

  • Improved data quality: By decomposing data assets into smaller, more manageable units, it is easier to ensure the quality of the data.
  • Increased data relevance: By aligning data assets with specific business capabilities, it is easier to ensure that the data is relevant to the needs of the business.
  • Improved data usability: By providing self-service tools for accessing and using data, it is easier for users to get the data they need.
  • Increased agility: By decentralizing the ownership of data, it is easier for teams to innovate and experiment with new data-driven solutions.

Real-Life Scenario

One example of the application of data products comes from leading a team to create a solution to redefine how risk was viewed across an enterprise. We wanted to move away from a single-lens, loss-focused model to a more holistic profitability view that took into account all the different domains (risk, finance, sales, product, etc.).

As we built the solution, we needed to make sure that the context for the analytic data was captured and interoperable. This meant that we had to create a common language for talking about risk that could be understood by everyone involved. We also had to make sure that the data was structured in a way that could be easily shared and used by the different domain teams.

In our case, we chose to establish a data mesh as a proof of concept to see if we could separately define data products unique to each domain that could then be brought together for a holistic view of profitability. The solution allowed us to create reusable analytic views of data that could be used interchangeably, connecting and moving different products kind of like Legos, each time creating a new view or lens of profitability. We made the data products easily accessible through self-service tools for data scientists and data engineers alike.
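The Lego idea can be sketched in a few lines of Python. This is a toy illustration only: the combine and profitability functions, the customer_id key, the sample figures and the profit formula are all assumptions for the example, not the data or the calculation from our engagement. Each data product is a mapping from a shared customer_id to that domain's attributes, and different combinations give different lenses.

```python
def combine(*products: dict) -> dict:
    """Join data products on the shared customer_id key (inner join)."""
    shared_ids = set.intersection(*(set(p) for p in products))
    view = {}
    for customer_id in sorted(shared_ids):
        merged = {}
        for product in products:
            merged.update(product[customer_id])
        view[customer_id] = merged
    return view


def profitability(view: dict) -> dict:
    """Hypothetical formula: revenue minus cost to serve minus expected loss."""
    return {
        customer_id: row["revenue"] - row["cost_to_serve"] - row["expected_loss"]
        for customer_id, row in view.items()
    }


# One illustrative data product per domain, keyed by customer_id.
risk = {"C1": {"expected_loss": 40.0}, "C2": {"expected_loss": 10.0}}
finance = {
    "C1": {"revenue": 500.0, "cost_to_serve": 120.0},
    "C2": {"revenue": 200.0, "cost_to_serve": 90.0},
}
sales = {"C1": {"region": "EMEA"}}

print(profitability(combine(risk, finance)))  # {'C1': 340.0, 'C2': 100.0}
print(combine(risk, finance, sales))          # only C1 appears in all three products
```

Because every product shares the same key and vocabulary, snapping in the sales product changes the lens without changing the risk or finance products themselves.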

Data mesh is just one example of bringing product thinking to the data field. It may not be the ‘bull’s-eye’ solution for every company (that depends on complexity, data culture and executive buy-in), but it’s a great place to start as we embrace experimentation and exploration of these concepts.

Kim Thies
ProfitOptics

Entrepreneur, CEO, Data Leader, Mom, Travel Junky