Part-1 Data Science Methodology- From Problem to Approach

Ashish Patel
Published in ML Research Lab
6 min read · Aug 9, 2019

From Problem to Approach…!!!

Sources : Coursera.org

In the overview article on Data Science Methodology, I described the basics along with the important questions to ask at each stage. If you haven’t read that article yet, start there. Here I explain the same course module by module.

Article Series:

  1. Overview of Data Science Methodology
  2. Part-1 Data Science Methodology- From Problem to Approach
  3. Part-2 Data Science Methodology From Requirement to Collection
  4. Part-3 Data Science Methodology From Understanding to Preparation
  5. Part-4 Data Science Methodology From Modelling to Evaluation
  6. Part-5 Data Science Methodology From Deployment to Feedback

Has this ever happened to you? Your boss calls you into a meeting and tells you about an important task that must be completed within a very tight deadline. The two of you go back and forth to make sure every aspect of the task has been considered, and the meeting ends with both of you confident that things are on track. Later that afternoon, after spending some time examining the various issues involved, you realize that you need to ask several more questions to truly accomplish the task.

Unfortunately, the boss is not available until tomorrow morning. Now, with the tight deadline still ringing in your ears, you begin to feel uneasy. So what do you do? Do you risk moving ahead, or do you stop and ask for clarification?

Outline of the Article :

#1) Business Understanding

Data science methodology begins with seeking clarification in order to reach what can be called business understanding. This stage comes first because clarifying the problem to be solved lets you determine which data are needed to answer the central question.

  • Too often, much effort is spent answering what people think the question is. The methods used to address that question may be sound, but they do not solve the actual problem. Establishing a clearly defined question starts with understanding the goal of the person asking it.
  • For example, if a business owner asks, “How can we lower the cost of an activity?”, we need to understand whether the goal is to improve the efficiency of the activity or to increase the profitability of the business. Once the goal is clear, the next piece of the puzzle is to identify the objectives that support it. Breaking down the objectives allows structured discussions in which priorities can be set, which helps to organize and plan how to tackle the problem.

Depending on the problem, different stakeholders should participate in the discussion to identify the requirements and clarify the problems.

Case Study :

Let’s take a look at the case study on applying business understanding. The case study asks the following question:

  • How can a limited healthcare budget best be allocated to maximize its use in providing quality care? This question had become a hot topic for an American health insurer.

As public funding for readmissions declined, the insurance company risked having to make up the cost difference itself, which could lead to rate increases for its customers.

  • Knowing that raising insurance rates would not be popular, the insurance company sat down with the local health authorities and brought in data science experts to see how data science could be applied to the question. Before any data could be collected, the goals and objectives had to be defined. After spending time setting them, the team prioritized “patient readmissions” as an effective area for review.
  • Looking at the goals and objectives, it was found that approximately 30% of those who completed rehabilitation treatment would be readmitted to a rehabilitation center within one year, and that 50% would be readmitted within five years. After reviewing some records, it was found that patients with heart failure were at the top of the readmission list.
  • It was further determined that a decision tree model could be applied to this scenario to find out why this was happening. To gain the business understanding that would guide the analytics team in formulating and carrying out its first project, the data scientists proposed and ran an on-site workshop.
  • The involvement of the key business sponsors throughout the project was critical, in that the sponsor set the overall direction, remained engaged and provided guidance, and ensured the necessary support where needed. Finally, four business requirements were identified for whatever model would be built.

Namely:

  • Predict readmission outcomes for patients with heart failure.
  • Predict the risk of readmission.
  • Understand the combination of events that led to the predicted outcome.
  • Apply an easy-to-understand process to new patients to assess their risk of readmission.

#2) Analytical Approach

  • Choosing the right analytical approach depends on the question being asked. It involves seeking clarification from the person asking the question so that the most appropriate path or approach can be picked. Once the problem to be addressed is defined, the appropriate analytical approach is selected in the context of the business requirements. This is the second stage of the data science methodology.
  • Once a solid understanding of the question is established, the analytical approach can be selected. This means identifying what type of patterns will be needed to address the question most effectively.
  • If the question is to determine the probabilities of an action, a predictive model might be used.
  • If the question is to show relationships, a descriptive approach may be required. This would be one that looks at clusters of similar activities based on events and preferences.
  • Statistical analysis applies to problems that require counts. If the question requires a yes/no answer, a classification approach to predicting a response is appropriate. Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. It can be used to identify relationships and trends in data that might otherwise not be accessible or identified.
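The mapping from the kind of question to the analytical approach described above can be sketched as a small lookup. This is only an illustration: the question labels and the function name `suggest_approach` are hypothetical and are not part of the course material.

```python
# Hypothetical lookup from the kind of question being asked to the
# analytical approach suggested in the list above.
# The keys are illustrative labels, not terms from the course.
APPROACH_BY_QUESTION = {
    "probability_of_action": "predictive model",
    "relationships": "descriptive approach",
    "counts": "statistical analysis",
    "yes_no_answer": "classification",
}


def suggest_approach(question_type: str) -> str:
    """Return the suggested approach, or ask for clarification if unknown."""
    try:
        return APPROACH_BY_QUESTION[question_type]
    except KeyError:
        # An unclear question goes back to the person who asked it.
        return "seek clarification"


if __name__ == "__main__":
    for q in ("yes_no_answer", "relationships", "something_else"):
        print(q, "->", suggest_approach(q))
```

The fallback mirrors the point made at the start of this section: when the question does not clearly fit a known type, the next step is to go back to the person asking it rather than to pick a model.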

Case Study :

  • If the question is to learn about human behavior, an appropriate response would be to use clustering approaches. Let us now look at the case study on applying the analytical approach. For the case study, a decision tree classification model was used to identify the combination of conditions leading to each patient’s outcome.
  • In this approach, examining the variables in each of the nodes along each path to a leaf yields a threshold value for each variable. The decision tree classifier provides both the predicted outcome and the likelihood of that outcome, based on the proportion of the dominant outcome, yes or no, in each group. From this information, analysts can derive the risk of readmission, that is, the likelihood of a yes, for each patient.
  • If the dominant outcome is yes, the risk is simply the proportion of yes patients in the leaf. If it is no, the risk is 1 minus the proportion of no patients in the leaf. A decision tree classification model is easy for non-data scientists to understand and apply when scoring new patients for their risk of readmission.
  • Clinicians can readily see which conditions cause a patient to be scored as high-risk, and multiple models can be built and applied at various points during the hospital stay.
  • This gives a moving picture of the patient’s risk and how it evolves with the various treatments being applied. For these reasons, the decision tree classification approach was chosen to build the heart failure readmission model.
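The leaf-based risk calculation described above can be sketched in a few lines of Python. The tree below is a toy example: the variable names (`prior_admissions`, `age`), the thresholds, and the leaf proportions are invented for illustration and do not come from the case study.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    dominant: str       # dominant outcome in the leaf: "yes" or "no"
    proportion: float   # share of the leaf's patients with that outcome


@dataclass
class Node:
    variable: str       # variable tested at this node
    threshold: float    # go left if value <= threshold, otherwise right
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def find_leaf(tree, patient):
    """Follow the thresholds from the root until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        if patient[node.variable] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node


def readmission_risk(leaf):
    """Likelihood of a 'yes' (readmission) for patients in this leaf."""
    if leaf.dominant == "yes":
        return leaf.proportion
    # Dominant outcome is "no": the risk is the remaining share.
    return 1.0 - leaf.proportion


# Toy tree with made-up thresholds and proportions (assumptions).
TOY_TREE = Node(
    variable="prior_admissions",
    threshold=1,
    left=Leaf(dominant="no", proportion=0.9),
    right=Node(
        variable="age",
        threshold=65,
        left=Leaf(dominant="no", proportion=0.6),
        right=Leaf(dominant="yes", proportion=0.75),
    ),
)

if __name__ == "__main__":
    patient = {"prior_admissions": 3, "age": 72}
    leaf = find_leaf(TOY_TREE, patient)
    print(leaf.dominant, readmission_risk(leaf))
```

For the sample patient, the path runs through both nodes to a leaf whose dominant outcome is yes, so the risk is that leaf's proportion, 0.75. A patient who lands in the first leaf (dominant outcome no, proportion 0.9) would instead get a risk of 1 minus 0.9.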

Thanks for Reading…!!! Happy Learning…!!!

References :

  1. https://www.coursera.org/learn/data-science-methodology
