
Business and Data Understanding in Data Science Lifecycle

“Give me data and I can do wonders”… A common fallacy among data scientists today. Many data scientists skip an important step: understanding the business, and how the underlying data was generated by the business process, before they start playing around with the data.

Jumping straight into the data is not always bad, but in most cases we end up with insights that never get integrated into real-world use. Not understanding the business and the underlying data leads to several pitfalls. The key ones are:

  • Solving a business objective other than the one that could impact the business significantly
  • Business and data scientists working in complete isolation from each other, resulting in a skew between what is built and what the business can deploy
  • Insights that do not generate any significant value even after they get into production

To give an example, I know a marketing team that had a problem with low conversion rates (< 14%) when acquiring new prospects. They were also paying a high cost to acquire prospects, and it took a long time for that spend to turn positive after the prospects became customers. The data science team got internal CIO funding and the related datasets to solve the problem. Business stakeholders were consulted only minimally, to clarify data elements, rather than to understand the business process or the real challenges in the business environment.

The resulting model did slightly better than the current conversion rate in the validation phase. The excited CIO and data science team presented the finding to the business and explained what they had done. It turned out that the marketing team's challenge was not the low conversion rate. While they would love to increase the conversion rate, their immediate problem was that they could not prioritize prospects to understand where to spend more and where to spend less. There was also a lack of customization to individual prospects in terms of products or offers. Finally, they lacked the research capability to decide which channels to prioritize and for which segments to increase spending, which would in turn lead to a better conversion rate.

In some industries prospect conversion rates are low, but the business still wants a wider audience to target rather than a narrow focus on high-conversion prospects. This way it can reach more prospects, increase conversion coverage, and acquire new customer segments, not only look-alikes of today's customer base.

Coming to the point

Without a clear business objective or agreed success criteria, a data science project has failed before it has even started.

Take any data science lifecycle process, such as CRISP-DM or TDSP: business understanding and data understanding are the starting point, before we even begin working on the underlying data for insights.

Let us quickly look at the activities we perform in the business understanding and data understanding phases.

Business Understanding

In the business understanding phase we basically:

  • Understand the business process
  • Define and frame the business problem
  • Define the business objective
  • Agree on success criteria
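
One way to make agreed success criteria concrete is to write them down as explicit, checkable targets before any modeling starts. The sketch below is only an illustration: the `SuccessCriterion` class, the metric names, and every number in it are hypothetical and not taken from the marketing example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriterion:
    """A single target agreed with the business (hypothetical structure)."""
    name: str
    baseline: float          # where the business is today
    target: float            # what was agreed as "success"
    higher_is_better: bool = True

    def is_met(self, observed: float) -> bool:
        # Compare against the agreed target, not against the baseline.
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


# Illustrative values only; real targets come from the business stakeholders.
criteria = [
    SuccessCriterion("conversion_rate", baseline=0.14, target=0.16),
    SuccessCriterion("cost_per_acquisition", baseline=120.0, target=100.0,
                     higher_is_better=False),
]


def evaluate(criteria, observed):
    """Return {criterion name: True/False} for the observed metric values."""
    return {c.name: c.is_met(observed[c.name]) for c in criteria}


results = evaluate(criteria, {"conversion_rate": 0.15,
                              "cost_per_acquisition": 95.0})
print(results)
```

In this made-up run the conversion rate beats the baseline but still misses the agreed target, which is the kind of gap that is much cheaper to discover before the project starts than after the model is built.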

To understand this phase in detail, you can watch my video below. In it I take a real-world use case (credit underwriting) and walk you through this phase.

You can also subscribe to my YouTube channel, AIEngineering, to get alerts as I post new videos on this and other topics.

Data Understanding

In the data understanding phase we typically:

  • Understand the data touch points in the context of the business process
  • Gather knowledge on where the data originates, how it gets processed, what decisions are being made, where it is stored, and how it flows downstream
  • Dive deep into the business meaning of the data being used, as well as the knowledge already present in existing systems in the form of rules
  • Check whether it is appropriate to use additional, well-known external industry data sources that can enhance the decision boundary
  • Check for target label availability, and for late-arriving labels
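
The last check in the list can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the record layout (`decided`, `labeled_on`), the 90-day maturity window, and the sample rows are all hypothetical, and a real project would read these from the systems identified in the earlier steps.

```python
from datetime import date, timedelta
from statistics import median


def rec(rec_id, decided, delay_days):
    """Build a hypothetical record; delay_days=None means no label arrived."""
    labeled_on = None if delay_days is None else decided + timedelta(days=delay_days)
    return {"id": rec_id, "decided": decided, "labeled_on": labeled_on}


records = [
    rec(1, date(2020, 1, 1), 90),
    rec(2, date(2020, 2, 1), 120),
    rec(3, date(2020, 3, 1), 30),
    rec(4, date(2020, 6, 1), None),   # recent decision, label not expected yet
    rec(5, date(2020, 1, 15), None),  # old decision, label never arrived
]


def label_summary(records, as_of, maturity_days):
    """Count labels that are available, still pending, or overdue as of a date."""
    available, pending, overdue, delays = 0, 0, 0, []
    for r in records:
        if r["labeled_on"] is not None and r["labeled_on"] <= as_of:
            available += 1
            delays.append((r["labeled_on"] - r["decided"]).days)
        elif (as_of - r["decided"]).days < maturity_days:
            pending += 1   # too recent for the outcome to be known
        else:
            overdue += 1   # outcome should be known but is missing
    return {
        "available": available,
        "pending": pending,
        "overdue": overdue,
        "coverage": available / len(records) if records else 0.0,
        "median_delay_days": median(delays) if delays else None,
    }


summary = label_summary(records, as_of=date(2020, 6, 30), maturity_days=90)
print(summary)
```

Separating pending from overdue labels matters because pending records are simply too young to train on, while overdue ones usually point to a gap in how outcomes are captured or stored.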

You can watch my video below on the data understanding phase. I walk you through the same example used in the business understanding phase, overlaying the data touch points on the process.


Written by Srivatsan Srinivasan

Data Scientist | Data Engineer
