Creating Measurable Value with Data Science

Francis Gichere
Coinmonks
10 min read · Jan 11, 2024

I’m currently reading a book that explores the most challenging aspects of data science. As I progress through the material, I’ll be taking notes to document and consolidate my learning on navigating the complexities inherent in the journey of data science.

Data Mesh Architecture by Data Mesh Architecture (datamesh-architecture.com)

Over the past two decades, data science (DS) has grown from a niche field employed only by top Silicon Valley tech companies into a widespread capability present across organizations and industries globally. However, many teams still struggle to demonstrate the tangible value data science provides their business. This raises the question: what is the real value of data science to an organization? Data scientists at all levels wrestle with articulating their impact, so it’s unsurprising that businesses do as well. To become better data scientists and maximize our value, it’s important that we delineate some core principles of how data science creates organizational value. By internalizing these principles, we can better communicate our worth and focus our efforts on the highest-impact work.

What Is Value?

Companies exist to create value for shareholders, customers, and employees (and hopefully society as a whole). Naturally, shareholders expect to gain a return on their investment, relative to other alternatives. Customers derive value from the consumption of the product and expect this to be at least as large as the price they paid. In principle, all teams and functions ought to contribute in some measurable way to the process of value creation, but in many cases quantifying this is far from obvious.

DS is no stranger to this lack of measurability. A general approach to value creation with data is simple: data by itself creates no value. The value is derived from the quality of the decisions that are made with it. At a first level, you describe the current and past state of the company. This is usually done with traditional business intelligence (BI) tools such as dashboards and reports. With machine learning (ML), you can make predictions about the future state and attempt to circumvent the uncertainty that makes the decision process considerably harder. The summit is reached if you can automate and optimize some part of the decision process.

It boils down to the same principle: incremental value comes from improving an organization’s decision-making capabilities. For this, you really need to understand the business problem at hand (what), think hard about the levers (so what), and be proactive about it (now what).

What: Understanding the Business

Data scientists ought to be as knowledgeable about the business as their stakeholders. And by business, I mean everything, from the operational stuff, like understanding and proposing new metrics and levers that their stakeholders can pull to impact them, to the underlying economic and psychological factors that drive the business (e.g., what drives the consumer to purchase or use your organization’s product).

That sounds like a lot to learn for a data scientist, especially since you need to keep updating your knowledge of the ever-evolving technical toolkit. Do you really have to do it? Can’t you just specialize in the technical (and fun) part of the algorithms, tech stack, and data, and let the stakeholders specialize in their (less fun) thing?

The first claim is that the business is fun! But even if you don’t find it exhilarating, if data scientists want to get their voices heard by the actual decision-makers, it is absolutely necessary to gain their stakeholders’ respect. Before moving on, let me emphasize that data scientists are rarely the actual decision-makers on business strategy and tactics: it’s the stakeholders, be it the sales team, production, marketing, finance, product, or any other team in the company.

How do you get to understand the business? Here is how:

  • Attend non-technical meetings.

No textbook will teach you the nuts and bolts of the business; you really have to be there and learn from the collective knowledge in your organization.

  • Get a seat with the decision-makers.

Ensure that you’re in the meetings where decisions are made. For example, how can you come up with great features for your models if you don’t understand the intricacies of the business?

  • Learn the Key Performance Indicators (KPIs).

Data scientists have one advantage over the rest of the organization: they own the data and are constantly asked to calculate and present the key metrics of the team. So, you must learn the key metrics. This sounds obvious, but many data scientists think it is boring and, since they don’t own the metric (they’re most likely not responsible for attaining a target), they are happy to delegate this to their stakeholders. Moreover, data scientists ought to be experts at metrics design.

  • Be curious and open about it.

Data scientists ought to embrace curiosity. By this I mean not being shy about asking questions and challenging the set of accepted facts in the organization. Many data scientists lack this overall sense of curiosity. The good thing is that this can be learned.

  • Decentralized structures.

This may not be up to you (or your manager or your manager’s manager), but companies where data science is embedded into teams allow for business specialization (and trust and other positive externalities). Organizations with a decentralized data science structure have teams with people from different backgrounds (data scientists, business analysts, engineers, product, and the like) and are great at making everyone an expert on their topic. By contrast, centralized organizations where a group of “experts” act as consultants to the whole company also have advantages, but gaining the necessary level of business expertise is not one of them.

So What: The Gist of Value Creation in DS

Why is your project important to the company? Why should anyone care about your analysis or model? More importantly, what actions are derived from it? This is the crux of the problem covered here, and it is considered one of those seniority-defining attributes in DS.

A common mistake is that a data scientist spends a lot of time running their model or analysis, and when it’s time to deliver the presentation, they just read out the nice graphs and data visualizations they have. Literally. Don’t get me wrong, explaining your figures is super important because stakeholders aren’t usually data or data visualization savvy (especially with the more technical stuff). But you shouldn’t stop there. Learn the art of storytelling.

Some general guidelines on how to develop a storytelling skill:

  • Think about the so what from the outset.

Whenever you decide to start a new project, always solve the problem backwards: how can the decision-maker use the results of my analysis or model? What are the levers that they have? Is it even actionable? Never start without the answers to these questions.

  • Write it down.

Once you have figured out the so what, it’s a great practice to write it down. Don’t let it play a secondary role by focusing only on the technical stuff. Many times, you are so deeply immersed in the technical nitty-gritty that you get lost. If you write it down, the so what will act as your North Star in times of despair.

  • Understand the levers.

The so what is all about what can be actioned. The KPIs you care about are generally not directly actionable, so you or someone at the organization needs to pull some levers to try to impact these metrics (e.g., pricing, marketing campaigns, sales incentives, and so on). It’s critical that you think hard about the set of possible actions. Also, feel free to think outside the box.

  • Think about your audience.

Do they care about the fancy deep neural network you used in your prediction model, or do they care about how they can use your model to improve their metrics? The guess is the latter: you will be successful if you help them be successful.

Now What: Be a Go-Getter

As mentioned, data scientists are usually not the decision-makers. There’s a symbiotic relationship between data scientists and their stakeholders: you need them to put your recommendations into practice, and they need you to improve the business.

The best data scientists are go-getters who own the project end to end: they ensure that every team plays its part. They develop the necessary stakeholder management and other so-called soft skills to ensure that this happens.

Unfortunately, many data scientists lie on the other side of the spectrum. They think their job starts and ends with the technical part. They have internalized the functional specialization that should be avoided.

Measuring Value

Your aim is to create measurable value. How do you do that? Here’s one trick that applies more generally.

A data scientist does X to impact a metric M with the hope it will improve on the current baseline. You can think of M as a function of X:

Impact of X = M(X) − M(baseline)

Let’s put this principle into practice with a churn prediction model:

X ~ Churn prediction model

M ~ Churn rate, i.e., the percentage of active users in period t − 1 that are inactive in period t

Baseline ~ Segmentation strategy

Notice that M is not a function of X! The churn rate is the same with or without a prediction model. The metric only changes if you do something with the output of the model. Do you see how value is derived from actions and not from data or a model?
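To make the definition of M concrete, here is a minimal sketch of the churn rate computed from two sets of user IDs. The function name and the toy user IDs are my own illustration, not something from the book.

```python
from typing import Set


def churn_rate(active_prev: Set[str], active_now: Set[str]) -> float:
    """Share of users active in period t-1 that are inactive in period t."""
    if not active_prev:
        raise ValueError("no active users in the previous period")
    # Users who were active before but are not active now.
    churned = active_prev - active_now
    return len(churned) / len(active_prev)


# Hypothetical toy data: five users active last month, four this month.
prev_month = {"u1", "u2", "u3", "u4", "u5"}
this_month = {"u1", "u2", "u4", "u6"}  # u6 is new, so it does not affect churn

rate = churn_rate(prev_month, this_month)
print(f"Churn rate: {rate:.0%}")
```

Nothing in this calculation takes a model as input, which is exactly the point: the metric moves only when somebody acts on the users.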

So, let’s adjust the principle to make it absolutely clear that actions (A) affect the metric:

Impact of X = M(A(X)) − M(A(baseline))

What levers are at your disposal? In a typical scenario, you launch a retention campaign targeting only those users with a high probability of becoming inactive the next month. For instance, you can give a discount or launch a communication campaign.
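The adjusted principle can be sketched in a few lines. Everything here is a toy setup of my own: the user IDs, who would churn, and the two target lists are made up, and I assume (very optimistically) that the campaign retains every targeted user who would otherwise have churned.

```python
from typing import Dict, Set

# Hypothetical ground truth: would each user churn if nobody intervened?
WOULD_CHURN: Dict[str, bool] = {
    "u1": True, "u2": False, "u3": True, "u4": False, "u5": True,
    "u6": False, "u7": False, "u8": True, "u9": False, "u10": False,
}


def churn_after_action(targeted: Set[str]) -> float:
    """M(A(.)): churn rate after running the retention campaign on `targeted`.

    Simplifying assumption: the campaign retains every targeted user who
    would otherwise have churned and has no effect on anyone else.
    """
    churned = [u for u, churns in WOULD_CHURN.items() if churns and u not in targeted]
    return len(churned) / len(WOULD_CHURN)


# Hypothetical target lists of equal size (same campaign budget).
baseline_targets = {"u1", "u2", "u4", "u6"}  # picked by the segmentation strategy
model_targets = {"u1", "u3", "u5", "u6"}     # picked by the churn model

m_baseline = churn_after_action(baseline_targets)  # M(A(baseline))
m_model = churn_after_action(model_targets)        # M(A(X))
impact = m_model - m_baseline                      # negative means churn went down

print(f"Churn with no action:      {churn_after_action(set()):.0%}")
print(f"Churn with baseline picks: {m_baseline:.0%}")
print(f"Churn with model picks:    {m_model:.0%}")
print(f"Impact of X:               {impact:+.0%}")
```

Note that the model only shows up through the list of users the campaign acts on. With an empty target list the churn rate is the same no matter how good the model is.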

Let’s also apply the what, so what, and now what framework:

The what:

How is churn measured at your company? Is this the best way to do it? What is the team that owns the metric doing to reduce it (the baseline)? Why are the users becoming inactive? What drives churn? What is the impact on the profit and loss?

So what:

How will the probability score be used? Can you help them find alternative levers to be tested? Are price discounts available? What about a loyalty program?

Now what:

What do you need from anyone at the company involved in the decision-making and operational process? Do you need approval from Legal or Finance? Is Product okay with the proposed change? When is the campaign going live? Is Marketing ready to launch it?

Let's highlight the importance of the so what and now what parts.

You can have a great ML model that is predictive and hopefully interpretable. But if the actions taken by the actual decision-makers don’t impact the metric, the value of your team will be zero (so what). In a proactive approach, you actually help them come up with alternatives (this is the importance of the what and becoming experts on the problem).

But you also need to ensure that those actions are actually carried out (now what).

Using the notation above, you must own M(A(X)), not only X.

Once you quantify the incrementality of your model, it’s time to translate this to value. Some teams are happy to state that churn decreased by some amount and stop there. But even in these cases it is useful to come up with a monetary figure. It’s easier to get more resources for the team if you can show how much incremental value you’ve brought to the company.

In the example this can be done in several ways. The simplest one is to be literal about the value.

Let’s say that the monthly average revenue per user is R and that the company has a base of active users B:

Cost of Churn(A,X) = B × Churn(A(X)) × R

If you have 100 users, each one bringing in KES 1,000 per month, and a monthly churn rate of 10%, the company loses KES 10,000 per month.
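The arithmetic is easy to check in code. This is just the formula above written as a function; the function and argument names are mine.

```python
def cost_of_churn(active_users: int, churn_rate: float, revenue_per_user: float) -> float:
    """Monthly revenue lost to churn: B x Churn x R."""
    return active_users * churn_rate * revenue_per_user


# The example from the text: 100 users, KES 1,000 each, 10% monthly churn.
monthly_loss = cost_of_churn(active_users=100, churn_rate=0.10, revenue_per_user=1_000)
print(f"KES {monthly_loss:,.0f}")  # KES 10,000
```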

The incremental monetary value is the difference in the costs with and without the model. After factoring out common terms, we get:

ΔCost of Churn(A,baseline,X) = B × ΔChurn(A;X,baseline) × R

If the previously used segmentation strategy saved KES 10,000 per month, and the now laser-focused ML model creates KES 13,000 in savings, the incremental value for the organization is KES 3,000 per month.
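Here is a small sketch of the incremental calculation. The churn rates below are hypothetical numbers I picked so that the result lands on KES 3,000; the text only gives the savings figures, not the underlying rates. I also define the change in churn as baseline minus model, so a positive result means money saved.

```python
def incremental_value(
    active_users: int,
    churn_baseline: float,
    churn_model: float,
    revenue_per_user: float,
) -> float:
    """B x delta-churn x R, with delta-churn = churn under baseline - churn under model."""
    delta_churn = churn_baseline - churn_model
    return active_users * delta_churn * revenue_per_user


# Hypothetical inputs: churn falls from 10% (segmentation) to 7% (model-driven campaign).
value = incremental_value(
    active_users=100, churn_baseline=0.10, churn_model=0.07, revenue_per_user=1_000
)
print(f"Incremental value: KES {value:,.0f} per month")

# Equivalent view using the savings figures quoted in the text.
savings_baseline, savings_model = 10_000, 13_000
print(f"Difference in savings: KES {savings_model - savings_baseline:,}")
```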

A more sophisticated approach would also include other changes that affect value, for instance, the cost of false positives and false negatives:

False positive

It’s common to target users with costly levers, but some of them were never going to churn anyway. You can measure the cost of these levers. For instance, if you give 100 users a 10% discount on the price P, but of these only 95 were actually going to churn, you are giving away 5 × 0.1 × P in false positives.

False negative

The opportunity cost from having bad predictions is the revenue from those users that end up churning but were not detected by the baseline method. The cost from these can be calculated with the equations we covered above.
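Both costs can be sketched as simple functions. The price, the revenue per user, and the number of missed churners below are hypothetical values of mine, and treating the false negative cost as missed churners times R is my reading of "the equations we covered above".

```python
def false_positive_cost(n_targeted: int, n_would_churn: int, discount: float, price: float) -> float:
    """Discount handed to targeted users who were never going to churn."""
    n_false_positives = n_targeted - n_would_churn
    return n_false_positives * discount * price


def false_negative_cost(n_missed_churners: int, revenue_per_user: float) -> float:
    """Revenue lost from churners who were never targeted."""
    return n_missed_churners * revenue_per_user


# The example from the text (100 targeted, 95 true churners, 10% discount),
# with a hypothetical price P of KES 1,000.
fp_cost = false_positive_cost(n_targeted=100, n_would_churn=95, discount=0.10, price=1_000)

# Hypothetical: 3 churners slipped through, each worth KES 1,000 per month.
fn_cost = false_negative_cost(n_missed_churners=3, revenue_per_user=1_000)

print(f"False positive cost: KES {fp_cost:,.0f}")
print(f"False negative cost: KES {fn_cost:,.0f}")
```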

Summary

Companies exist to create value. Hence, DS teams ought to create value.

A data science team that doesn’t create value is a luxury for a company. The DS hype bought you some leeway, but to survive you need to ensure that the business case for DS is positive for the company.

Value is created by making decisions.

DS value comes from improving the company’s decision-making capabilities through the data-driven, evidence-based toolkit that you know and love.

The gist of value creation is the so what.

Stop at the outset if your model or analysis can’t create actionable insights. Think hard about the levers and become an expert on your business.

Work on your soft skills.

Once you have your model or analysis and have made actionable recommendations, it’s time to ensure the end-to-end delivery. Stakeholder management is key, but so is being likeable. If you know your business inside out, don’t be shy about your recommendations.

Francis Gichere
Coinmonks

I hold a BSc in Statistics & Computer Science and am currently pursuing an MSc in Data Science. LinkedIn: https://www.linkedin.com/in/gichere/