Business Problem Solving with Data Science — Scope your Solution — Post 2 of 3

Jan 22, 2020

This post is part of a series on Business Problem Solving with Data Science. It continues from the previous post: https://medium.com/@shwedoshi/business-problem-solving-with-data-science-a76cf3f0fc7

Identify the milestones

Let us revisit the case study in terms of the problem we framed in the
previous post.

In consultation with the management of YumEats!, we had identified the
business problem to be solved:

Identify good restaurants to be onboarded for the app.

The stakeholder for this business problem would be the Director of Sales, B2B. The impact would be that more active restaurant partners with high ratings would improve the brand, drive more active users and increase daily orders. More restaurants on board bring greater choice and more users to the app.

If well-known but not so visible restaurants are on the app, users get a
stronger value proposition.

The business problem is now more or less clearly defined and framed. This is
the first step toward the data science problem.

The business stakeholders have helped you define what the business needs.

It’s now your job to translate that business need into analytic needs. That means there are a lot of questions that still need to be answered:

- What analytic goals do you need to accomplish in order to justifiably claim
you’ve found a solution to the business problem?
- What are the options for reaching those goals, and which options are most
cost-effective (in terms of both time and resources)?
- How will you be able to measure the extent to which your proposed
solution addresses the business problem?

In most cases, you cannot and should not wait for other people to tell you
what steps you need to take to solve a problem. Part of your job as a data scientist is to define the path to a solution, not just take the path others have
laid down. Do not think about methods and algorithms yet. Your task right now is to plan out what a viable solution will look like. Later, you will consider how to turn that plan into a reality. For the very first step, we would need to define the milestones.

The goal of “identify good restaurants to be onboarded” is clear enough from
a business perspective, but in terms of running an actual analysis we need to
break it down further into smaller milestones. It often helps to re-frame the business goal as a question. In the case of the scenario above, we might re-frame “identify the right restaurants to onboard on the app” as “how can we identify which restaurants are the right restaurants to onboard on the app?”
That question is still largely unanswerable because it is still too vague, but it paves the way for a few smaller questions, such as:

- How do we define the various buckets of restaurants (excellent, decent,
bad)?
- How do we estimate the popularity of the restaurants?
- What are the features that differentiate good and bad restaurants?

If you answer all of your milestone questions, you answer your large business
question as a matter of course. If you answer your large business question,
you’ve addressed your business problem. The three milestone questions are
not necessarily the “right” questions to solve the business problem. It’s less
about finding the right milestones and more about making sure you have
milestones.

For example, given the case study above, you have a reasonably clear analytic
goal: identify good restaurants that are likely to improve the brand.

Here are some guidelines for creating good analytic milestones for the project:

Eliminate possibilities. It’s easy to jump into using all available data to try
to solve a problem, but it is often wiser to first decide which data points
are important for the analysis. Further, you can bucket them as Critical, Good to
Have, etc. to prioritize which features need to be collected first. This kind
of feature and data selection does not require any particular method. It
relies upon domain knowledge, which is something you can get from your
stakeholders. For example, critical features for a restaurant are its location and year of establishment. A Good to Have feature is the average number of daily visitors to
the restaurant.
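
As a rough illustration of this bucketing, here is a minimal Python sketch. The feature names and priority labels are assumptions made up for this example, not a list agreed with the YumEats! stakeholders. It simply sorts a feature shortlist so that Critical features come before Good to Have ones.

```python
# Hypothetical feature shortlist; names and priorities are illustrative only.
PRIORITY_ORDER = {"critical": 0, "good_to_have": 1}

features = [
    {"name": "average_daily_visitors", "priority": "good_to_have"},
    {"name": "location", "priority": "critical"},
    {"name": "year_of_establishment", "priority": "critical"},
]


def collection_order(feature_list):
    """Return feature names sorted so that critical features come first."""
    ranked = sorted(
        feature_list,
        key=lambda f: (PRIORITY_ORDER[f["priority"]], f["name"]),
    )
    return [f["name"] for f in ranked]


ordered = collection_order(features)
print(ordered)
```

In practice the priorities would come out of conversations with stakeholders rather than a hard-coded list.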

Think about dependencies. If you can identify one thing that you think
you need to accomplish, ask yourself, “Is there anything I need to get done
before I can do this?” and “once I do this, what will I then be able to do?”
You don’t have to plan all milestones in order: identify just one and then
work backward and forwards from that point to identify the rest. For
example, first, we need to decide how to classify a restaurant as good or
bad. Next, we can think of the metric that can help satisfy the objective.
Then, we can start thinking of features and how to collect data.
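
One way to make such dependencies explicit is to write them down as a small graph and let a topological sort produce a valid order. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The milestone names are hypothetical labels for the three steps in the example above.

```python
from graphlib import TopologicalSorter

# Each milestone maps to the set of milestones that must be finished first.
# The names are illustrative labels, not an official project plan.
dependencies = {
    "define_good_vs_bad": set(),
    "choose_metric": {"define_good_vs_bad"},
    "select_features_and_collect_data": {"choose_metric"},
}

# static_order() yields milestones so that prerequisites always come first.
milestone_order = list(TopologicalSorter(dependencies).static_order())
print(milestone_order)
```

For three milestones this is overkill, but the same structure scales to larger plans and will raise an error if the dependencies are circular.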

Group milestone activities by the entity. When an analysis involves multiple
entities, it often makes sense to create at least one milestone per entity,
and then at least one milestone to tie the entities together in a way that
solves the business problem.

For example, business stakeholders have already mentioned that they want good restaurants to be identified. So part of the analysis should be a comparison of good and bad restaurants that finds what separates them.

Defining milestones serves several purposes:

1. It helps you anticipate difficulties you may confront as you develop your
analysis. If you can see these difficulties before they actually occur, you
can prepare for them or sometimes avoid them entirely.

2. It helps communicate your work to other stakeholders and provides
visibility. If a project has five milestones, two are completed and one is
set for completion next week, that helps stakeholders plan around the data
science work and therefore support it better.

3. It imposes order. In large projects especially, it is easy to get lost in all the
details of data cleaning and exploratory analysis. Relatively inexperienced
data scientists will often see their original timelines balloon as they
discover new aspects of the data which then require additional
investigations. By setting out milestones ahead of time, it is easier to stay
on track.

Now with the milestones identified, the next step is to build out a minimum
viable product.

Exercise — Milestones for the problem

By now, you should have identified the problem you wish to solve to help achieve customer retention on the travel management app. If you haven’t yet identified the problem, go back and formulate it using the steps in the previous chapter. Now, with the problem identified, drill down into the milestones of the one problem you have set out to solve on the app.

Go through the steps mentioned about eliminating possibilities and identifying dependencies. The output of this exercise is to identify milestones for your chosen problem.

Design minimum viable products

Look at this visual

https://blog.fastmonkeys.com/2014/06/18/minimum-viable-product-your-ultimate-guide-to-mvp-great-examples

To use the imagery from the above graphic, data scientists are often asked to
deliver cars. Inexperienced data scientists will then try to figure out how to
build the specific car they were asked for. Experienced data scientists will try
to figure out how to build a skateboard, and then figure out how to turn that
skateboard into a scooter, and then turn the scooter into a bicycle, and so on
until they finally have built a car. Even if they never build the car, they’ve still
delivered enough skateboards and bicycles and other means of helping their
customers do what they want to do.

Consider the following questions:

- What is the smallest benefit stakeholders could get from the analysis and
still consider it valuable?
- When do stakeholders need results by? Do they need all the results at
once, or do some results have a more pressing deadline than others?
- What is the simplest way to meet a benchmark, regardless of whether you
consider it the “best” way?

A minimum viable product (MVP) allows you to provide value to your stakeholders in smaller increments, which makes them happy. It also reduces the risk of having to throw away months of work because of misunderstood or
miscommunicated requirements, which makes you happy.

In our case, a minimum viable product could be just an analytic dashboard
showing a visualization of the restaurants already on the app. Group the good
and bad restaurants separately and visualize the features in comparison with
each other. Use a naive rule-based approach to identify good or bad
restaurants and set up the benchmarks. The typical journey of a data science
product is:

1. Analytic solution — look at existing data to analyze patterns
2. Diagnostic solution — look at data to explain the past
3. Prescriptive solution — look at data to provide insights and decisions.
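
To make the naive rule-based benchmark concrete, here is a minimal Python sketch. It assumes a single rule, "a restaurant that has been open long enough is good". The restaurant names, the reference year and the 10-year cutoff are all made up for illustration and would need to be agreed with stakeholders.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    name: str
    year_established: int


def rule_based_label(restaurant, current_year=2020, min_years_open=10):
    """Naive baseline: label a restaurant 'good' if it has been open long enough.

    Both current_year and min_years_open are illustrative assumptions.
    """
    years_open = current_year - restaurant.year_established
    return "good" if years_open >= min_years_open else "bad"


# Hypothetical restaurants, not real YumEats! data.
sample = [
    Restaurant("Hypothetical Diner", 1998),
    Restaurant("Hypothetical Cafe", 2018),
]
baseline_labels = {r.name: rule_based_label(r) for r in sample}
print(baseline_labels)
```

A baseline like this is not expected to be accurate. Its job is to give later models something to beat and to give stakeholders an early result to react to.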

Instead of simply trying to order the analytic work, here are some ways you
could deliver some minimum viable products over the course of producing
your full results:

Plan in sprints. Set an arbitrary amount of time (typically 2 or 3 weeks)
and ask yourself: “what would I deliver if I had to deliver a solution by the end of that period?” Your answer to that question is probably a good
“skateboard” approach. For example: what is the most you could hope to
accomplish in two weeks? Maybe you feel it would be realistic to just
show the general trends in good and bad restaurants (analytic solution).

Think modularly. Once you have a general idea of something you want to
deliver, pause and ask yourself if there is a way to split that deliverable
into smaller deliverables that are useful all by themselves.

Get feedback. At every step of the product, get feedback from
stakeholders. For example: make a simple dashboard out of your
restaurant analysis. Showcase the simple rule-based implementation (for
instance, older restaurants might be good) and check whether the results are
in the right form, even though they might not yet be accurate.

Creating minimum viable products serves several purposes:

1. Data scientists usually find it easy to think about analytic details and
relatively difficult to think about value delivered to the business. Building
minimum viable products forces you, as the data scientist, to think more
about the considerations that are easier for you to overlook.

2. It helps interested stakeholders make a case for ongoing support of your
work. They will be more patient since you regularly show incremental
value.

3. Business needs change constantly. If you take six months to finish a
project, chances are, half of the needs that motivated the project in the
first place will no longer exist, and the other half will have substantially
changed. By building incrementally, you minimize the chance that your work
will be outdated before it is even deployed.

Exercise — Identify the minimum viable product you will build

In the previous exercise, you identified the milestones for the app. Now,
from these milestones, come up with a minimum viable first deliverable that
will satisfy your stakeholder. In the example above, it was a simple dashboard
built upon the existing data in the app. Along similar lines, can you think of what the
minimum viable product would be? What would you build in 1 week and in 1
month from now? What could be the skateboard version of the problem you
have set out to solve?

Identify target metrics

As we plan the roadmap for our data science project, one thing we must keep
in mind is how we will measure the success of the project. One obvious
success metric is the actual business outcome stakeholders want to achieve: if
they use your data science solution and onboard the suggested restaurants,
they should see higher daily active users (DAUs) in a shorter time compared to manual checks and onboarding.

Also think in terms of:

- Why should anyone trust the results of this analysis?
- What is the confidence in the prediction for each restaurant? Can they
go blindly with the suggestion, or are other checks needed?
- Where does the bulk of the value come from? Are there parts of the
analysis that are more valuable than others?
- Along with the suggested restaurants, can you solve other problems,
such as suggesting dishes to be highlighted for new restaurants?

You may have extremely high confidence in the quality of your analysis, and
yet the results of the analysis might not be cost-effective for the business to
implement.

Coming back to the case study, we need to identify the target metric that we
would use to measure the success of the solution. The problem statement is
to identify good restaurants to be onboarded on the app. Although this is a well-defined business problem, it is still subjective. For the right metric to be
applied, a slight reframing of the problem is needed.
Currently, the ‘goodness’ of the restaurant is subjective, and to define the metric it must be objective. An objective measure of the goodness of a restaurant is the rating of the restaurant given by users. So the problem changes to identifying restaurants with a high rating. A metric must be measurable, so the data science problem becomes: predict the rating of a restaurant to decide whether it can be onboarded on the app.

Now let’s skip ahead in our thinking to consider what proof of value we want
or need to be able to deliver to the stakeholders in our case study. Here are
some guidelines for selecting good metrics:

Think explicitly about trade-offs. Almost any metric will involve a trade-off.
For example, in a classification problem, “precision” focuses on minimizing false positives, while “recall” focuses on minimizing false
negatives. False positives might be more important to the business than
false negatives, or the reverse could be true. For example: out of a rating
of 5, consider restaurants rated 4 and 5 as good and the rest as
bad. Which is more harmful: identifying a good restaurant as
bad, or identifying a bad restaurant as good? The stakeholders are
conscious of brand image and don’t want to onboard a bad restaurant.
Hence the metric to optimize could be ‘precision’, which rewards reducing
false positives.
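
The trade-off can be seen on a toy example. The sketch below is plain Python with made-up ratings and predictions. It treats ratings of 4 and above as "good", as in the example above, and computes precision and recall for a deliberately cautious model that flags very few restaurants as good.

```python
GOOD_THRESHOLD = 4  # ratings of 4 and 5 count as "good", as in the example above


def precision_recall(actual_ratings, predicted_good):
    """Compute precision and recall for the 'good restaurant' class."""
    actual_good = [rating >= GOOD_THRESHOLD for rating in actual_ratings]
    pairs = list(zip(actual_good, predicted_good))
    tp = sum(1 for a, p in pairs if a and p)        # good, predicted good
    fp = sum(1 for a, p in pairs if not a and p)    # bad, predicted good
    fn = sum(1 for a, p in pairs if a and not p)    # good, predicted bad
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up data: six restaurants and a cautious model that flags only one.
actual_ratings = [5, 4, 3, 2, 4, 1]
predicted_good = [True, False, False, False, False, False]

precision, recall = precision_recall(actual_ratings, predicted_good)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Here the model never onboards a bad restaurant (precision is 1.0) but finds only one of the three good ones (recall is about 0.33). Whether that is acceptable is a business decision, not a modelling one.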

Figure out the business’s “value” units. Business stakeholders practically
never think about value in terms of root mean squared error or precision.
Maybe they think about customers served, or revenue generated, or hours
saved. Find out what unit of value your stakeholders think in, and estimate
the value of your analysis using that unit. For example, stakeholders have
said that they want to get good restaurants, but upon further investigation,
you might find that what they really want is increased orders, which in turn
impact revenue and brand image.

Subset all metrics. An analysis should almost never have only one set of
metrics. All metrics used for the analysis as a whole should be repeated for
any relevant subsets: restaurant categories, cuisine, regions, etc. An
analysis may perform very well on average but abjectly fail for certain
subsets. That is relevant information that your stakeholders should have
when making decisions.
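
A minimal sketch of this idea in plain Python follows. The records and the `cuisine` field are hypothetical. The point is only that an overall number can hide a weak subset.

```python
from collections import defaultdict


def precision_by_group(records, group_key):
    """Compute precision separately for each value of group_key.

    Groups with no positive predictions are left out of the result.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0})
    for rec in records:
        if rec["predicted_good"]:
            bucket = counts[rec[group_key]]
            if rec["actual_good"]:
                bucket["tp"] += 1
            else:
                bucket["fp"] += 1
    return {g: c["tp"] / (c["tp"] + c["fp"]) for g, c in counts.items()}


# Made-up predictions for four restaurants in two cuisines.
records = [
    {"cuisine": "italian", "predicted_good": True, "actual_good": True},
    {"cuisine": "italian", "predicted_good": True, "actual_good": True},
    {"cuisine": "thai", "predicted_good": True, "actual_good": False},
    {"cuisine": "thai", "predicted_good": True, "actual_good": True},
]

per_cuisine = precision_by_group(records, "cuisine")
print(per_cuisine)
```

Overall precision on this toy data is 0.75, yet it is 1.0 for one cuisine and only 0.5 for the other.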

Keep it as explainable as possible. A good metric does not always have
to be easy for non-technical stakeholders to understand at first glance, but
they do need to be able to understand whatever metrics you use.
If you choose a metric that is hard to explain, then you will need to make
the extra effort to help stakeholders understand it. If you can find an
easily explainable metric that is still appropriate, you can focus your time
on other things. For example: assess how comfortable stakeholders are
with technical metrics. Consider re-framing technical concepts such as
“false-positive rate” and “false-negative rate” as “wrongly identified
restaurants” and “missed opportunities”.
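
As a small illustration of that re-framing, the sketch below reports errors as counts under the stakeholder-friendly names rather than as rates. The input data is made up, and the choice to show counts instead of rates is an assumption about what stakeholders find easier to read.

```python
def stakeholder_summary(actual_good, predicted_good):
    """Report model errors as counts, using business-friendly wording."""
    pairs = list(zip(actual_good, predicted_good))
    false_positives = sum(1 for a, p in pairs if not a and p)
    false_negatives = sum(1 for a, p in pairs if a and not p)
    return {
        "wrongly identified restaurants": false_positives,
        "missed opportunities": false_negatives,
    }


# Made-up labels for five restaurants.
actual_good = [True, True, False, False, True]
predicted_good = [True, False, True, False, False]

summary = stakeholder_summary(actual_good, predicted_good)
print(summary)
```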

Identifying target metrics serves several purposes:

1. It makes you clarify your and your stakeholders’ thinking about what value
the analysis is really meant to achieve.

2. It can keep you from pursuing interesting analytic questions that don’t
ultimately lead to value for the business. If a question won’t help you
produce one of your target metrics, it is probably out of scope of the
project.

3. It keeps you focused on explaining and justifying your work, which helps
those around you support you better. If other people understand what you
are doing and understand what value you are providing, they can help get
you the attention and resources you need to continue your work.

We can frame the data science problem either as a regression problem, where we
predict the rating of the restaurant, or as a classification problem, where we
bucket restaurants into good or bad based on the rating. For the first attempt
at solving the business problem, let’s go with the simpler problem of
classification, i.e. identifying whether a restaurant is good or bad.
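
The two framings differ only in the target the model is asked to predict. The sketch below derives both targets from the same ratings. The restaurant names and ratings are hypothetical, and the cutoff of 4 follows the "rating 4 and 5 as good" example used earlier in this post.

```python
# Hypothetical user ratings on a 5-point scale.
ratings = {
    "Hypothetical Diner": 4.5,
    "Hypothetical Cafe": 3.2,
    "Hypothetical Grill": 4.0,
}

# Regression framing: the target is the rating itself.
regression_target = dict(ratings)

# Classification framing: the target is a good/bad bucket derived from the rating.
classification_target = {
    name: ("good" if rating >= 4 else "bad") for name, rating in ratings.items()
}

print(regression_target)
print(classification_target)
```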

Exercise — Target metrics
If you have solved the exercises so far, great work! You are really close to a
properly defined data science problem that is distilled out of the identified
business problem. Now, for the problem and the minimum viable product that
you have identified, figure out the target metric for the data science model.
Remember to think through whether the target metric is in line with the business
metric you are setting out to achieve.

You will also need to decide whether the problem is going to be a regression problem or a classification problem.

Written by Shweta Doshi