HOLISTIC VIEW FOR DATA-DRIVEN DECISION MAKING PROJECTS:
Using the Exploded View to Manage the Data Science Lifecycle

David Tiefenthaler
8 min read · Apr 13, 2023

In today’s data-driven world, businesses and organisations rely on data science to make critical decisions. However, given the many aspects involved in a data science project, it can be challenging to know where to start and how to make sure everything required for a successful project is covered. In this article, we explore how to approach projects along the data science lifecycle with the Exploded View model, from defining the problem to deploying the solution, while keeping a holistic, organisation-wide view. By following these guidelines, you can ensure that your data science projects succeed and deliver value to your business.

The Exploded View model ensures that the deeper contexts and challenges of digitalisation are made visible and comprehensible for all stakeholders in the company. It allows us to discuss, address, and agree on specific tasks within our data science project. Other tools are then used to create a more detailed view, e.g. the Data Value Proposition Canvas, an ROI Assessment, or the Standard Process for the Data Science Lifecycle. The Exploded View is a great tool that helps us model a holistic view of our data-driven use cases across all aspects of the company in six layers:

Exploded View

If you are interested in the Exploded View, you will find more information in the white paper.

While the first three layers are unique to each use case and organisation, the other three layers contain general aspects (entities) of a data science project that provide guidance for approaching such use cases. We will explore these aspects in the following. The figure below gives an overview; the individual entities are explained in more detail afterwards.

Holistic View for Data-Driven Decision Making Projects

Customer, Experience & Organisational Layers

Within an organisation it is important to drive projects based on the underlying business model to ensure innovation is enabled where it is needed. To do so, we make use of the magic triangle of business models, a variation of the St. Gallen Business Model Navigator, to describe and balance the business constraints and ensure a high-quality outcome of the project. The first dimension describes the client (stakeholders) to understand their goals and objectives, while the other three dimensions show the fundamental value aspects of the business model for our stakeholders.

The Magic Triangle: Business Model

1. Who is our client?
In the context of a project within a company the clients are the stakeholders. The goal is to understand and define their needs and issues.
2. What is the effect on the customers?
Since one key aspect of our approach is customer-centric thinking, we need to understand the effect of the use case on the customers. Customer here refers to the actual customer of the company, not to the client of the project (the stakeholders). By doing so, we identify the value proposition right at its source.
3. Which factors drive the value?
The delivery of the value proposition is realised by certain factors (processes and activities). Identifying these aspects within the processes of our company enables us to understand how the innovation of the use case drives our business.
4. What is the effect on the financial performance?
Finally, cost structure and revenue streams are described to explain how the financial performance is affected by the business model. In essence, it points to the elementary question of any use case, namely how money is generated.

Performance Layer

As described, the performance layer gives an overview of all relevant processes that we need to consider for a data science use case. In the following, we will look at the three most important ones (Data Value Story, ROI Assessment, Data Science Lifecycle standard process), knowing that the list is not limited to those.

Elevator Pitch Data Value Story [Jens Linden]

Approaching any given use case from a business perspective, the most important question to answer is what the expected value is. We can use Data Value Story tools such as the Data Value Proposition Canvas to clarify the following questions:
· What are the gains and pains?
· What is the target, what are the results?
· What is the desired data product/service?
· How does the data product/service relieve the pains or create the gains?

Besides the general value of a use case, assessing its Return on Investment (ROI) is crucial from an economic perspective. We need to consider three main areas: desirability, feasibility, and viability. Desirability involves determining the potential impact of the idea, including cost savings, revenue generation, new business opportunities, or improved customer experience, as well as the strategic importance and urgency of the use case. Feasibility entails evaluating the scale of the use case, along with the level of effort required for implementation (including monetary costs and time investment), system readiness, required expertise, data availability, and legal considerations. Finally, viability involves weighing the impact value in EUR against the effort required to implement and maintain the product. By considering these three areas, businesses can better understand the potential value and feasibility of a use case and make informed decisions about whether or not to pursue it.
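To make this assessment tangible, below is a minimal sketch of how such an ROI assessment could be captured in code. The scoring scales, thresholds, and example figures are purely illustrative assumptions, not part of any standard ROI formula.

```python
from dataclasses import dataclass

@dataclass
class UseCaseAssessment:
    """Illustrative ROI assessment along desirability, feasibility, and viability."""
    desirability: int          # strategic impact and urgency, scored 1 (low) to 5 (high)
    feasibility: int           # effort, data availability, system readiness, 1 (hard) to 5 (easy)
    impact_value_eur: float    # estimated yearly impact in EUR
    effort_eur: float          # estimated implementation and maintenance cost in EUR

    def viability(self) -> float:
        """Viability as the ratio of expected impact to required effort."""
        return self.impact_value_eur / self.effort_eur

    def pursue(self) -> bool:
        """Simple illustrative decision rule: all three areas must clear a threshold."""
        return self.desirability >= 3 and self.feasibility >= 3 and self.viability() > 1.0

# Example with made-up numbers: a hypothetical churn-prediction use case.
churn_use_case = UseCaseAssessment(desirability=4, feasibility=3,
                                   impact_value_eur=250_000, effort_eur=120_000)
print(churn_use_case.viability(), churn_use_case.pursue())
```

In practice, the thresholds and weighting would be agreed with the stakeholders from the organisational layer rather than hard-coded.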

Return on Investment (ROI) Assessment

The standard process for the Data Science Lifecycle is a method-based and goal-oriented approach for logical reasoning, traceability, and reproducibility. The process defines a step-by-step guide for approaching any given problem with a data-driven decision making (DDDM) methodology. While the process provides a comprehensive collection of aspects, it is not limited to the ones listed. It covers general aspects such as framing the problem from a business perspective, as well as statistical aspects.
Data science use cases are distinct from other software solutions. One reason is the difficulty of assessing upfront whether the data-driven solution meets the desired goal requirements, because the capability of the solution depends on the given data and the ability to extract patterns using statistical methods. To address this, a proof of concept (PoC) as a minimum viable product (MVP) is recommended to assess the solution’s capabilities with minimal effort and time. The PoC scope is typically limited to evaluating whether the required patterns can be extracted from the given data to support a robust decision. For some use cases, additional topics such as legal aspects or infrastructure should be included. With a positive outcome of the PoC, the use case can be deployed and set up for productionisation to deliver value.

Standard Process for the Data Science Lifecycle

The first six steps of the Data Science Lifecycle can be divided into two parts (see the violet-coloured boxes in the figure above) that build upon each other:
Part A: Understanding the problem from a business perspective and from a statistical (data-analytical) perspective.
Part B: Deriving and evaluating use-case-specific statistical methods to solve the problem.
Based on the insights gained in part A, a method-based transition from understanding the problem to solving it is achieved in part B.

The following eight steps provide a full picture for realising a data science project (a minimal code sketch of steps 4–6 follows the list):
1. Business understanding: Define the project’s goals and objectives from a business perspective, capture ROI potential, identify risks, and formulate a concrete question that can be answered with data. We already learned about some tools like the Data Value Story and the Data Value Proposition Canvas that support this process.
2. Data collection: Obtain the right data by considering aspects such as data availability, accessibility, legal obligations, and technical considerations. This determines if we can even tackle the project with a data-driven approach.
3. Data understanding: Gain a better understanding of the data to determine whether it fits the business requirements. Explore the data through exploratory data analysis (EDA) and identify suitable statistical methods to address the problem. These findings enable the method-based transition from understanding the problem to solving it.
4. Data pre-processing: Pre-process the data to support data understanding, clean the data, or prepare it for applying analytical methods or statistical models. Proper data structures are crucial to improve efficiency and quality.
5. Modeling: Select appropriate modeling techniques, build and validate models, and assess their effectiveness. Shortlist the best candidates from multiple models with individual strengths.
6. Evaluation & benchmarking: Evaluate the model’s performance, select a suitable evaluation metric, and analyse how robust the results are.
7. Deployment: Develop a plan to integrate the solution into the organisation’s operations, define what productionisation means for the use case, establish risk tiers, validate code quality and security standards, and apply general software engineering concepts to data science projects like DevOps, MLOps, and software testing.
8. Reporting & monitoring: Monitor the deployed system’s performance to ensure it continues to meet the project’s goals and objectives. Adjust the model if necessary to maintain its effectiveness over time.
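As referenced above, here is a minimal sketch of how steps 4–6 could look in code, using scikit-learn and a synthetic dataset as stand-ins for a real use case; the chosen pre-processing, candidate models, and metric are illustrative assumptions, not a prescribed setup.

```python
# Minimal sketch of steps 4-6: pre-processing, modeling, evaluation & benchmarking.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared use-case data (step 4 would normally clean real data).
X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)

# Step 5: shortlist candidate models with individual strengths, wrapped in a shared pipeline
# so that pre-processing (scaling) is applied consistently inside each validation fold.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1_000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Step 6: evaluate with a suitable metric (here F1) and check robustness via cross-validation.
for name, model in candidates.items():
    pipeline = Pipeline([("scale", StandardScaler()), ("model", model)])
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Keeping the pre-processing inside the pipeline avoids leaking information from the validation folds into the fitted scaler, which is one of the quality aspects step 4 is meant to safeguard.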

Asset Layer & Data Layer

The asset layer describes all resources that need to be considered for a use case. For our data product, we are mainly talking about IT infrastructure, systems, and architectures. In contrast, the data layer covers the sum of all data and data structures. Below we look at those two layers combined from a data science perspective.

There are multiple ways to approach data science use cases, which are grouped here into three types, each with a different purpose and level of maturity. The figure below gives an overview of the main aspects of those types.

  • The Data Science Lab is great for PoCs or single use cases. It is mostly used for fast experimentation or productionisation of single use cases with high flexibility.
  • A Data Science Hub provides standards and processes to enable teams to work together efficiently and to integrate the solutions into the existing IT landscape.
  • While a Data Science Platform provides great capabilities for multiple use cases, especially across multiple departments, it focuses on building a cross-application platform with a core-service-provider approach. Best practices, standardised tools, services, and templates enable high productivity.
Data & Data Science Architecture

When talking about the different data and data science architectures for those three groups, we mean the system’s components, their properties and how those interact with each other. Data science components are software tools and frameworks like Docker, Scikit-Learn or Spark. Data components refer to infrastructure like a data warehouse or lakehouse to store and provision data.
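As a rough illustration of how such data and data science components can interact, the sketch below reads data from a lakehouse table with Spark and hands a small feature set over to scikit-learn; the table path, column names, and model choice are hypothetical assumptions made for the example.

```python
# Illustrative interplay of data components (lakehouse via Spark) and
# data science components (scikit-learn).
from pyspark.sql import SparkSession, functions as F
from sklearn.ensemble import GradientBoostingClassifier

spark = SparkSession.builder.appName("churn-features").getOrCreate()

# Data layer: read a (hypothetical) transactions table stored as Parquet in the lakehouse.
transactions = spark.read.parquet("/lakehouse/sales/transactions")

# Aggregate the raw data into per-customer features using Spark.
features = (
    transactions.groupBy("customer_id", "churned")
    .agg(F.count("*").alias("n_orders"), F.sum("amount").alias("total_spend"))
)

# Hand a manageable feature table over to the data science tooling for modeling.
pdf = features.toPandas()
model = GradientBoostingClassifier().fit(pdf[["n_orders", "total_spend"]], pdf["churned"])
```

Which parts of such a flow are standardised (shared templates, managed clusters, feature stores) is exactly what distinguishes a Lab, a Hub, and a Platform.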

Summary

In this post, we discussed the importance of the data science lifecycle for successfully approaching data-driven decision making projects in businesses and organisations. We introduced the Exploded View model as a holistic framework that guides data practitioners through the entire process, from defining the problem to deploying the solution.
