Scientific Method for Data Analysis

Prashant Sihag
Published in Analytics Vidhya
3 min read · Sep 4, 2019

Data analysis is the process of manipulating data to discover information that can be used in decision making. In this blog, I discuss a method whose steps, when performed in a specific order, can make the data analysis process smooth and efficient. This method is called the “Scientific Method”. Let's understand what this method offers in the context of data analysis.


Question

This method starts with a question, framed in the context of the problem your company has given you. You will work on this question for the rest of the analysis, and the whole process is based on it. It can be a “what”, “why”, “how”, “which”, “when”, “who” or “where” question about something you can measure, preferably with a number. For example, if your company gives you a data set along with a problem that you are expected to solve using data analysis, the problem is generally the question, such as “What is the relationship between two products?” or “Which product is most liked by the customers?”. The questions or problems you are given in practice will not look exactly like these examples and will usually be more specific and meaningful.
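As a minimal sketch of what "something you can measure" can mean, the snippet below turns the question “Which product is most liked by the customers?” into a number. The product names and ratings are made up for illustration, and using the mean rating as the measure of "most liked" is an assumption, not the only reasonable choice.

```python
from statistics import mean

# Hypothetical customer ratings (1 to 5 stars); illustrative data only.
ratings = {
    "product_a": [4, 5, 3, 4, 5, 4],
    "product_b": [3, 2, 4, 3, 3, 2],
    "product_c": [5, 4, 4, 5, 3, 3],
}

# Assumption: "most liked" is measured as the highest mean rating.
mean_rating = {name: mean(values) for name, values in ratings.items()}
most_liked = max(mean_rating, key=mean_rating.get)

print(most_liked)
```

A different measure, such as the share of 5-star ratings or the number of repeat purchases, could give a different answer, which is why the question should state what is being measured.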

Background Research

As a follow-up to the question, you should do background research. Check the literature, existing studies and existing results to find out whether there is any code, established good practice or example of an analysis that has or has not worked in the past. This lets you start your analysis from a good point, avoid potential mistakes and get meaningful answers from the rest of the analysis.

Construct a hypothesis

Once we have the question and have performed thorough background research, we translate the question, which is a real-life problem, into a research hypothesis. A hypothesis is a proposed explanation for a phenomenon. According to Wikipedia, “Scientists generally base scientific hypotheses on previous observations that cannot satisfactorily be explained with the available scientific theories.” In simpler terms, a hypothesis is a supposition that is provisionally accepted in order to interpret certain events or phenomena. In statistics, it is an educated statement about what you believe to be true in terms of your original question. The hypothesis restates the real-life question in terms of variables and quantities that we can measure, so that we can make specific statistical statements about them.
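As a sketch of what such a statistical statement can look like, assume the question is whether two products, A and B, receive different average customer ratings. The symbols \mu_A and \mu_B (the true mean ratings of each product) are introduced here for illustration and do not come from the original example.

```latex
H_0 : \mu_A = \mu_B \qquad \text{versus} \qquad H_1 : \mu_A \neq \mu_B
```

Here H_0 is the "no difference" statement and H_1 is the statement that the two products are rated differently on average. Both are written in terms of quantities that can be estimated from data.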

Test the hypothesis

Once we have constructed the hypothesis, we test it. To test the hypothesis, we need data that is collected according to the nature of our experiment or question. Every hypothesis test requires the analyst to state a null hypothesis, which usually says that there is no effect or no difference, and an alternative hypothesis, which usually corresponds to the hypothesis constructed in the previous step and is favored when the data provide enough evidence to reject the null hypothesis. Various statistical analysis steps are then performed to test the hypothesis.
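One possible way to carry out such a test is a permutation test, sketched below using only the Python standard library. The article does not name a specific test, so this choice, the function name `permutation_test` and the rating data are all assumptions made for illustration. The null hypothesis is that both products have the same mean rating; the alternative is that the means differ.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so we shuffle the pooled data and split it again.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical ratings for two products; illustrative data only.
ratings_a = [4, 5, 3, 4, 5, 4, 5, 4]
ratings_b = [3, 2, 4, 3, 3, 2, 3, 4]

observed_diff, p_value = permutation_test(ratings_a, ratings_b)
print(f"observed difference: {observed_diff:.2f}, p-value: {p_value:.4f}")
```

The p-value estimates how often a difference at least as large as the observed one would appear if the product labels carried no information. A classical alternative, when its assumptions hold, would be a two-sample t-test from a statistics package.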

Analyze results and draw conclusion

After testing the hypothesis, the results need to be analyzed in order to draw conclusions. This is where statistical software typically comes into the picture: it helps us weigh the evidence we have collected to see whether it supports the hypothesis we formulated to answer the question of interest. Here we decide whether to reject the null hypothesis in favor of the alternative hypothesis or to fail to reject it.
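The decision step can be sketched as a comparison between a p-value and a significance level chosen before looking at the data. The function name `decide` and the default level of 0.05 are assumptions for illustration; the appropriate level depends on the problem.

```python
def decide(p_value, alpha=0.05):
    """Return the decision about the null hypothesis at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.01))  # evidence against the null at the 0.05 level
print(decide(0.30))  # not enough evidence against the null
```

Note that failing to reject the null hypothesis is not the same as showing it is true; it only means the data did not provide enough evidence against it.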

Reporting results

After reaching a conclusion on our question of interest, we must write a report that presents the results and describes what we found, whether our hypothesis was supported and what the practical implications of the results are. The report should not only interpret what you found in the context of your specific hypothesis or question but also draw a broader conclusion that connects the results to the real-life situation of the problem at hand.
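A report can take many forms. As a small sketch, the hypothetical helper `format_report` below gathers the question, the size of the observed effect, the p-value and the decision into a plain-text summary. The numbers passed in are placeholder values for illustration, not results from a real analysis.

```python
def format_report(question, observed_diff, p_value, alpha=0.05):
    """Build a short plain-text summary of a hypothesis test."""
    decision = "rejected" if p_value < alpha else "not rejected"
    lines = [
        f"Question: {question}",
        f"Observed difference in mean rating: {observed_diff:.2f}",
        f"p-value: {p_value:.3f} (significance level {alpha})",
        f"Null hypothesis: {decision}",
    ]
    return "\n".join(lines)


# Placeholder inputs; replace with the values from your own analysis.
report = format_report(
    "Do products A and B receive different average ratings?",
    observed_diff=1.25,
    p_value=0.013,
)
print(report)
```

A summary like this covers only the statistical part. The practical implications and the connection to the real-life problem still need to be written by the analyst.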

These are the basic steps followed in the analysis of any data. Everyone has their own way of doing analysis, but the steps explained above are needed to reach a sound conclusion. As technology keeps evolving and new software and tools are constantly released, some of the steps explained in this blog might be taken over by a tool, but the process remains the same no matter how it is performed.

Thank you for reading. Enjoy statistics.
