Data Science: Business understanding and defining a problem

Sourabh Potnis
Feb 17, 2022

In the previous post of this series, we discussed the end-to-end Data Science workflow. In this post, we will discuss how to define a business problem and convert it into a Data Science problem. For each Data Science problem you are solving, use the following problem-solving framework: Context, Actors, Problem, Data, Hypothesis tree, Feedback, Solution.

Create a solution document by defining, brainstorming, documenting and revising each aspect. This documentation is an important facet of problem solving because it helps you track, streamline and push your thinking.

Context — Describe the business unit/client and the work/process they do at a high level. Understand and add detailed information about the process you are trying to improve/automate/optimize.

Actors — Identify all the actors involved in the problem-solving and solution-usage workflow, and define their roles. This includes end users and their needs, domain experts, technology/IT teams and management (both business and IT). You can first define multiple problems from the perspective of each stakeholder/actor, and then come up with a final problem statement that satisfies all or most of them. For example, an analyst may see the problem as having to do some tasks manually, while a manager feels the issue is with data quality. To surface these views, you need to set up workshops, do research and interview each stakeholder. Using the expertise of the end users and business/domain experts is important for defining and solving the problem, as you will never know all the topics in detail yourself.

Different stakeholders will often face different problems, each with a different solution. In that case, you will have to prioritise the problems by plotting each one on a graph of feasibility of the solution vs. value from the solution.

Feasibility of the solution can be defined in terms of data and infrastructure availability, ease of implementation, audit and regulations, while value can be defined in terms of the number of problems/users it impacts, user actionability, and $/time/labour savings.

Then prioritize problems that are simple to implement and have high impact, and avoid complex, low-impact problems. This helps to deliver achievable solutions that the client can use.
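The feasibility vs. value prioritisation described above can be sketched in a few lines of Python. This is only an illustration: the `CandidateProblem` class, the 0 to 1 scoring scale, the 0.5 threshold, the quadrant labels and the example scores are all assumptions made for this sketch, not part of any standard method. In practice the scores would come out of the stakeholder workshops.

```python
from dataclasses import dataclass


@dataclass
class CandidateProblem:
    name: str
    feasibility: float  # assumed scale: 0 (hard) to 1 (easy to implement)
    value: float        # assumed scale: 0 (low impact) to 1 (high impact)


def quadrant(problem: CandidateProblem, threshold: float = 0.5) -> str:
    """Place a problem in one quadrant of the feasibility vs. value graph."""
    feasible = problem.feasibility >= threshold
    valuable = problem.value >= threshold
    if feasible and valuable:
        return "prioritize"             # simple to implement, high impact
    if valuable:
        return "plan for later"         # high impact but hard to implement
    if feasible:
        return "quick win, low impact"  # easy, but limited value
    return "avoid"                      # complex and low impact


def prioritize(problems: list[CandidateProblem]) -> list[CandidateProblem]:
    """Sort problems so that feasible, high-value ones come first."""
    return sorted(problems, key=lambda p: p.feasibility * p.value, reverse=True)


# Hypothetical scores, e.g. agreed with stakeholders in a workshop.
problems = [
    CandidateProblem("Automate manual report preparation", feasibility=0.8, value=0.7),
    CandidateProblem("Fix upstream data quality", feasibility=0.3, value=0.9),
    CandidateProblem("Rebuild legacy archive search", feasibility=0.2, value=0.2),
]

for p in prioritize(problems):
    print(f"{p.name}: {quadrant(p)}")
```

Multiplying the two scores is just one simple way to rank; a weighted sum or a manual review of the plotted graph would work equally well.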

Problem — The business problem should be defined in such a way that it describes why you are trying to solve the problem. Describe the business problem in two steps:

Current state — Define the needs/challenges/pain points that the business is facing today.

Desired future state — This may not be accurate at the start, but it should give an idea of the solution based on inputs from end users and the root hypothesis.

  • Define what the solution will look like once the problem is solved.
  • Define the success criteria for the solution with quantifiable parameters like accuracy, $ gain/$ saving, efficiency gains in days, etc.
  • Define the scope of the solution, e.g. global or limited to a particular set of users/regions/applications.
  • Define how the solution will be used by end users (new dashboard/integration into an existing application/data in email).
  • Define what advantages users will get from the solution. These can be in terms of operational/productive/maintenance efficiency, dollar gains and/or savings, automation, optimization of labor, capital and technology, or developing new capabilities through research.
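To make the success criteria quantifiable, it can help to write each one down as a metric with a target value. The snippet below is a minimal sketch of that idea; the `SuccessCriterion` class, the metric names, the targets and the measured values are all hypothetical and only serve as an example.

```python
from dataclasses import dataclass


@dataclass
class SuccessCriterion:
    metric: str
    target: float
    higher_is_better: bool = True  # False for metrics like time or cost

    def is_met(self, measured: float) -> bool:
        """Compare a measured value against the agreed target."""
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


# Hypothetical criteria agreed with the end users.
criteria = [
    SuccessCriterion("model accuracy", target=0.90),
    SuccessCriterion("report preparation time (days)", target=2, higher_is_better=False),
]

# Hypothetical values measured after a pilot.
measured = {"model accuracy": 0.93, "report preparation time (days)": 3}

unmet = [c.metric for c in criteria if not c.is_met(measured[c.metric])]
print("Criteria not yet met:", unmet)
```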

Scope — Identify and define the problem scope.

You need to identify the scope within which you will be solving the problem, i.e. global scope or a limited scope such as a particular region, sub-business unit, etc. You can also start with a limited scope and, once the solution is baselined and accepted by end users, move to global scope.

Data — Identify the data needs.

  • Data sources (internal/external) and their purpose
  • Where the data resides (files/data mart/data warehouse/data lake)
  • Who (team/application) owns it
  • Feasibility and accessibility of data w.r.t. compliance, availability, regulations, security, etc.
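The data needs listed above can be captured as a small inventory in the solution document. The following is a rough sketch, assuming a hypothetical `DataSource` record with one field per bullet; the source names, owners and blockers are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    name: str
    purpose: str
    origin: str    # "internal" or "external"
    location: str  # e.g. files, data mart, data warehouse, data lake
    owner: str     # team or application that owns the data
    # Open compliance/availability/regulatory/security issues, if any.
    blockers: list[str] = field(default_factory=list)

    def is_accessible(self) -> bool:
        """A source is usable once no blockers remain open."""
        return not self.blockers


# Hypothetical inventory entries.
inventory = [
    DataSource("customer_transactions", "model features", "internal",
               "data warehouse", "Finance IT"),
    DataSource("market_indices", "external benchmark", "external",
               "files", "Vendor feed", blockers=["licence review pending"]),
]

blocked = [s.name for s in inventory if not s.is_accessible()]
print("Sources needing follow-up:", blocked)
```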

Hypothesis tree

Defining a root hypothesis (the solution at its highest level) is the most important step of problem solving. You can then divide the root hypothesis into sub-hypotheses and divide those further to create a hypothesis tree.

Each leaf of the tree, i.e. a hypothesis, then needs to be analysed and either rejected or not rejected, so it should be actionable. The hypotheses need to be mutually exclusive and collectively exhaustive.
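A hypothesis tree maps naturally onto a simple recursive data structure. The sketch below is one possible way to track it; the `Hypothesis` class, the `pending` helper and the example statements are hypothetical. Note that code can track which leaves have been analysed, but whether the hypotheses are mutually exclusive and collectively exhaustive still has to be judged by people who know the domain.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Hypothesis:
    statement: str
    children: list["Hypothesis"] = field(default_factory=list)
    rejected: Optional[bool] = None  # None means "not analysed yet"

    def leaves(self) -> list["Hypothesis"]:
        """Return the leaf hypotheses, i.e. the ones that are tested directly."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def pending(root: Hypothesis) -> list[str]:
    """List leaf hypotheses that still need to be analysed."""
    return [h.statement for h in root.leaves() if h.rejected is None]


# Hypothetical tree: a root hypothesis divided into sub-hypotheses.
root = Hypothesis(
    "Report preparation time can be reduced",
    children=[
        Hypothesis("Time is lost in manual data collection", children=[
            Hypothesis("Most collection time is spent copying data between files"),
            Hypothesis("Most collection time is spent waiting for access approvals"),
        ]),
        Hypothesis("Time is lost in rework caused by data quality issues"),
    ],
)

# Record the outcome of one analysis: this leaf was not rejected.
root.children[1].rejected = False

print("Still to analyse:", pending(root))
```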

Feedback

Once all of the above points are defined, present them to the client/end users. Then take user feedback, integrate it and finalize the problem and the solution. Integrating feedback is important because it ensures that we deliver the business value the end user wants rather than just what we as data scientists want. As a data scientist you always have the freedom to redefine the problem, but for that you need to understand and quantify the value you will get by solving a single problem or multiple interrelated problems.

Solution

Once the business problem is defined and finalized, the next step is to convert it into one or more Machine Learning problems and describe the solution(s).

Multiple problems may be identified, and each problem may have multiple solutions. You need to choose a high-impact problem with a feasible solution.

There can be one or more solutions to each hypothesis. A solution can be descriptive, predictive or prescriptive in nature, using supervised, unsupervised or reinforcement learning.

To summarise, a data science problem needs to be defined along the following axes:

  • Business problem — Current state, desired future state, scope, success criteria
  • User problem — User challenges, pain points at a personal level, usage and advantages of the ML solution
  • Machine Learning problem — What type of ML problem is it? What are the features and the output?
  • Software problem — What inputs will the user/system provide and what output will the model provide?
  • Data problem — What data needs to be collected, understood and analyzed?
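The five axes above can double as a checklist for the solution document. Here is a minimal sketch, assuming a hypothetical `ProblemDefinition` record with one free-text field per axis; the example draft is invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class ProblemDefinition:
    business_problem: str = ""  # current state, desired future state, scope, success criteria
    user_problem: str = ""      # user challenges, pain points, usage and advantages
    ml_problem: str = ""        # type of ML problem, features and output
    software_problem: str = ""  # inputs from user/system, output from the model
    data_problem: str = ""      # data to be collected, understood and analyzed

    def missing_axes(self) -> list[str]:
        """Return the axes that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical early draft with only two axes defined.
draft = ProblemDefinition(
    business_problem="Reduce report preparation time from 5 days to 2 days",
    ml_problem="Supervised classification of transactions needing manual review",
)

print("Axes still to define:", draft.missing_axes())
```

A draft with missing axes is a signal to go back to the relevant stakeholders before finalizing the problem statement.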

In the next post, we will discuss data understanding and exploration.
