Ask These 4 Questions Before Starting Any Data Science Project

4 questions you should ask before starting any data science project and why they are so important.

Duan Cleypaul
5 min read · Feb 9, 2022

Starting things is hard. Just like inertia in physics, it’s very hard for us data scientists to take a project from scratch, leave that initial “frozen” state, and actually know what the first steps are. In this article I’ll share four questions that I ask before starting any data science project to get a sense of how I should conduct my analysis. I hope they prove as useful to you as they are to me.

A blackboard with 4 drawings of question marks that slowly become a lightbulb colored in yellow.
Source: https://i0.wp.com/leadingwithtrust.com/wp-content/uploads/2017/04/questions.jpeg?resize=640%2C410&ssl=1

In order to have more context, let’s consider the following scenario:

  1. You’re the only data scientist at a retail company that has a bunch of stores;
  2. The head of the Sales department tells you they want to predict sales for the next 6 weeks.

And before we go into the four questions, I want to share with you the first fundamental rule I try to follow as a data scientist:

NEVER START DOING THINGS BEFORE ASKING QUESTIONS

Was that big enough? No? Let me try this way…

An animated gif with the phrase “never start doing things before asking questions” in big colored letters.

Take note, write it down, tattoo it on your forearm if necessary, but don’t ever forget it. Asking questions is the “scientist” part of your job. Now let’s continue.

1. What is the motivation?

The words CONTEXT MATTERS written one underneath the other with white letters and dark grey and orange background.
Source: https://miro.medium.com/max/1400/1*qJ75Nislpn2RisuwY4hCbg@2x.jpeg

When someone gives you a challenge to solve, try to understand the motivation behind it by asking your team leader (or yourself) questions like:

  • What is the context / motivation behind this request?

Knowing the answer to that question is the first step toward understanding how your efforts as a data scientist will benefit the company.

Let’s say the answer is that there was a staff meeting a week ago and the CFO asked the Sales team for more predictability of future sales. Now that you know that, it’s clearer that this demand came from above the Sales team, and that there may be a root cause behind it that will help you understand more about the challenge and its opportunities.

2. What is the root cause?

A question mark drawn as if it rose from the ground up.
Source: https://media.istockphoto.com/vectors/the-question-mark-vector-id934903676?k=20&m=934903676&s=170667a&w=0&h=e8_8i-UZOsdnC7k5QboevKkFCvyCAOO80I7LDjGPROA=

OK, we know that there was an important meeting a few days ago, the CFO was there, and they requested a prediction of future sales. But why?

I wrote in a previous article that people don’t usually tell you what the problem is; they tell you what they think the solution to their problem would be. Because of that, it is wise to ask the following questions:

  • What is the root cause?
  • Why does the CFO want a sales prediction for the next 6 weeks?

Following our hypothetical scenario, the CFO’s answer is that they want to expand the infrastructure of the stores, and knowing how much money the stores will make in the future will help them plan the necessary investments.

An animated gif of a person smacking their hands together and saying “Oh, I get it!”.
Source: https://c.tenor.com/1JcnBncB_IEAAAAM/alonzo-lerone-youtuber.gif

Very good! Things are starting to take form now. You now have a better understanding of the context and the root cause of the demand and how you can add value to the company with your data science skills.

But what if you have more questions ahead? Who should you direct your next questions to?

3. Who are the stakeholders?

A drawing of 6 people with orange skin on a white background, with different hairstyles, wearing suits.
Source: https://3back.com/wp-content/uploads/2009/03/Stakeholder-Swamp-300x300.png

Every demand has one person or a group of people who care most about the solution to the problem. That person or group owns the demand and knows the business side of the problem better than anyone. So a great question to ask here is:

  • Who owns the problem?

That person or group will be the source of motivation, context, and opportunities to be discussed/explored during the development of the solution. Know their names, contact information, and availability. You’ll need their assistance eventually.

Now that you have context, root cause, and stakeholders listed, it’s time to minimally align expectations for the solution.

4. What is the format of the solution?

The word SOLUTION in white letters and yellow background with dark red and green drawings of light bulb, documents, notebooks, and other figures.
Source: https://eudi.eu/wp-content/uploads/2017/12/4.Solution-Suggestion.jpg

When someone asks you to come up with a solution to their problem, there’s usually (if not always) an expectation behind it. To make sure you will be able to meet those expectations, try to answer the following questions:

  • How granular should the solution be? (For our example, should the prediction be by store? By product? By region?)
  • What type of problem are you facing? (Is it a forecasting/regression problem, as in our sales example? Classification?)
  • What are the potential methods you could use to address the issue? (Neural networks? Deep learning? Regression? Time series?)
  • How should the solution be delivered? (API? Through email? As a report? In a dashboard?)
  • Is there a major question that needs to be answered?
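
To make the granularity question concrete, here is a minimal sketch in Python. The records, store names, and the `total_sales_by` helper are all made up for illustration; the point is only that the same data gives a different target depending on the level your stakeholders expect.

```python
from collections import defaultdict

# Hypothetical toy records: (store, region, product, weekly_sales)
SALES = [
    ("store_1", "north", "shoes", 120.0),
    ("store_1", "north", "shirts", 80.0),
    ("store_2", "north", "shoes", 200.0),
    ("store_3", "south", "shoes", 50.0),
    ("store_3", "south", "shirts", 150.0),
]

# Position of each grouping field inside a record
FIELDS = {"store": 0, "region": 1, "product": 2}


def total_sales_by(level, records=SALES):
    """Sum sales at the requested granularity: 'store', 'region' or 'product'."""
    index = FIELDS[level]
    totals = defaultdict(float)
    for record in records:
        totals[record[index]] += record[3]  # record[3] is the sales amount
    return dict(totals)


if __name__ == "__main__":
    for level in FIELDS:
        print(level, total_sales_by(level))
```

Each level produces a different series to predict, so it pays to agree on one before any modeling starts.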

The answers to these questions will not only help you better understand your challenges but will also give you relevant information to plan your next steps (and actually solve the problem).
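
If you like to keep track of this in code, the four questions can be written down as a small checklist. This is just a sketch under my own assumptions: the `ProjectBrief` class and the `open_questions` helper are hypothetical names, and the answers come from the retail scenario in this article.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProjectBrief:
    """One field per question; None means it has not been answered yet."""
    motivation: Optional[str] = None
    root_cause: Optional[str] = None
    stakeholders: Optional[str] = None
    solution_format: Optional[str] = None


def open_questions(brief):
    """Return the names of the questions that still have no answer."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name)]


# Answers gathered so far in the retail scenario
brief = ProjectBrief(
    motivation="CFO asked the Sales team for more predictability of future sales",
    root_cause="Planning investments to expand the infrastructure of the stores",
)

print(open_questions(brief))  # ['stakeholders', 'solution_format']
```

An empty list means all four questions have an answer and you are ready to start.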

A gif of a person applauding and showing expression of amusement.
Source: https://i.pinimg.com/originals/48/af/d0/48afd0510b98ad1202daaee5bf28bc4c.gif

Wow! That’s my reaction when I see that I actually left the foggy part of the early stages of a data science project and that it only took me 4 main questions to do so.

If this article helped make the initial phase of your DS projects a bit lighter, I invite you to check out this article to discover more tools for conducting a full data science project. As always, show me some support by clapping for this post, feel free to leave a comment, and stay tuned for more knowledge on Data Science.
