Data Analysis Series C1 W1

Harshit Jain
Published in Analytics Vidhya
Jun 5, 2020

Course 1: Week 1

Choosing a research question, building a codebook, and other basics

Here’s a series of tutorials on how to do data analysis from scratch. We’ll start with descriptive analysis and later move on to predictive analysis.

Here are the steps for starting a project from scratch. First, you need to decide whether you are going to produce or collect the data yourself or use existing data. Here, I am going to use an existing dataset.

Terms to know
codebook: a document that describes how the data is arranged in a computer file and how each variable is measured (e.g., its units).
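As a rough illustration of this definition, a codebook can be kept in code as a list of entries, each recording a column name, what it measures, and its unit. A small check can then confirm that every documented variable actually appears in the data file's header. This is a minimal sketch: the `CodebookEntry` class and `missing_columns` helper are hypothetical, and the descriptions and units are assumed wording, not copied from the official GapMinder codebook.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class CodebookEntry:
    """One codebook entry: the column name, what it measures, and its unit."""
    name: str
    description: str
    unit: str


# Hypothetical entries; descriptions and units are assumptions,
# not copied from the official GapMinder codebook.
CODEBOOK = [
    CodebookEntry("country", "Country name", "text"),
    CodebookEntry("employrate", "Employed people aged 15+", "% of population"),
]


def missing_columns(codebook, csv_text):
    """Return the codebook variables that are absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    present = {column.strip() for column in header}
    return [entry.name for entry in codebook if entry.name not in present]
```

An empty list from `missing_columns` means every documented column is present in the file; any names it returns point to a mismatch between the codebook and the data.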

Step 1: Choosing a dataset to work with
After reviewing the five codebooks provided, I chose GapMinder. The other datasets were interesting as well, but I found GapMinder the most fascinating for exploring new insights.

Step 2: Identifying the topic of interest
I have seen people troubled by alcohol consumption, and I wonder whether there is any correlation between alcohol consumption and the suicide rate. Fortunately, the GapMinder dataset is something I can use to research this further.

Step 3: Prepare a codebook of your own (a subset of the dataset’s main codebook, if one exists)
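This step can be sketched in code as taking a subset of the main codebook and then keeping only those columns of the data. This is a minimal, hypothetical sketch: the helper names are made up, and apart from “employrate” the variable names, descriptions and units in `MAIN_CODEBOOK` are assumptions to be checked against the actual GapMinder codebook.

```python
# Hypothetical excerpt of a main codebook: variable name -> (description, unit).
# Only "employrate" is named in this post; the other names, descriptions and
# units are assumptions, not taken from the real GapMinder codebook.
MAIN_CODEBOOK = {
    "incomeperperson": ("GDP per capita", "US$"),
    "alcconsumption": ("Alcohol consumption per adult aged 15+", "litres"),
    "suicideper100th": ("Age-adjusted suicide rate", "per 100,000 people"),
    "employrate": ("Employed people aged 15+", "% of population"),
}


def personal_codebook(main, wanted):
    """Return the entries of `main` for the variables in `wanted`, in that order.

    A KeyError is raised for any name missing from `main`, which surfaces
    typos in variable names early.
    """
    return {name: main[name] for name in wanted}


def select_columns(rows, codebook):
    """Project each data row (a dict of column -> value) onto the codebook's variables."""
    return [{name: row[name] for name in codebook} for row in rows]


MY_CODEBOOK = personal_codebook(
    MAIN_CODEBOOK, ["alcconsumption", "suicideper100th", "employrate"]
)
```

Building the personal codebook from the main one, rather than typing it from scratch, keeps descriptions and units consistent with the source documentation.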

Step 4: Identify a second topic that you would like to explore in association with your original topic
I have also included the “employrate” variable in my codebook, because I would like to explore whether alcohol consumption is correlated with the employment rate.

Step 5: Add variables/items/questions documenting this second topic to your personal codebook
I had already done this in the previous steps, when I added “employrate” to my codebook.

Step 6: Conduct a literature review to see what research has already been done on this topic
I searched Google Scholar. Although most papers were behind paywalls, I read most of the abstracts, and the findings reported by other researchers appeared to support my hypothesis.

Many studies relate to my hypothesis. For example, the abstract of reference [1] (https://onlinelibrary.wiley.com/doi/abs/10.1046/j.1360-0443.1995.90810534.x) states: “An increase in per capita alcohol consumption of one liter is accompanied by a simultaneous increase in the male suicide rate of 1.9 percent.”

Step 7: Based on your literature review, develop a hypothesis about what you believe the association might be between these topics. Be sure to integrate the specific variables you selected into the hypothesis.
Those studies also point to a possible direct relationship between alcohol consumption and the suicide rate.

Primary Hypothesis:
A country’s level of alcohol consumption might be directly (positively) related to its suicide rate.

Secondary Hypothesis:
A country’s level of alcohol consumption might be directly (positively) related to its employment rate (“employrate”).
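Looking ahead to later weeks, both hypotheses could be checked with a correlation between paired country values. The sketch below computes a Pearson correlation coefficient using only the standard library, skipping countries with a missing value in either variable. It assumes the data sits in a CSV file with one row per country; the file name `gapminder.csv` and the column names `alcconsumption` and `suicideper100th` are assumptions (only `employrate` appears in this post). Pearson’s r measures only linear association, and a correlation alone would not show that alcohol consumption causes either outcome.

```python
import csv
import math


def paired_values(rows, x_name, y_name):
    """Return matching x and y float lists, skipping rows where either value is blank."""
    xs, ys = [], []
    for row in rows:
        x = (row.get(x_name) or "").strip()
        y = (row.get(y_name) or "").strip()
        if x and y:
            xs.append(float(x))
            ys.append(float(y))
    return xs, ys


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


def hypothesis_correlations(path="gapminder.csv"):
    """Compute r for both hypotheses; the file and column names are assumptions."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return {
        "primary": pearson_r(*paired_values(rows, "alcconsumption", "suicideper100th")),
        "secondary": pearson_r(*paired_values(rows, "alcconsumption", "employrate")),
    }
```

A positive r for a hypothesis would be consistent with a direct relationship; a significance test would still be needed before drawing any conclusion.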

For the continuation of this series (Week 2 of this course), refer here.

References:
[1] Alcohol and suicide — the Portuguese experience
https://doi.org/10.1046/j.1360-0443.1995.90810534.x

[2] Male suicides and alcohol consumption in the former USSR
https://doi.org/10.1111/j.1600-0447.1994.tb01520.x

[3] Suicides and alcohol consumption in Russia, 1965–1999
https://www.tandfonline.com/doi/abs/10.1080/09687630801931804
