Testing a Potential Moderator

INTRODUCTION
The aim of this assignment was to compute a correlation coefficient while testing for a potential moderator. As in my previous assignment, I tested the association between income per person and life expectancy. This time, however, I split the countries into two groups: those with high alcohol consumption and those with low alcohol consumption. The outcome of my investigation can be seen below. For the Python script, please scroll down to the end.
INVESTIGATION
The countries were split by the following criterion. Those that consumed 8 litres of alcohol or less per person per year were defined as “low alcohol consuming countries”; those that consumed more were defined as “high alcohol consuming countries”. The first group contains 109 countries and the second contains 62. The correlation coefficient (r) and p-value for each group are as follows:
association between income per person and life expectancy for LOW alcohol consuming countries
(0.5085799406548519, 1.64151689069347e-08)
association between income per person and life expectancy for HIGH alcohol consuming countries
(0.6093640344213327, 1.4716677770817285e-07)
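In both groups there is a moderate to strong positive linear correlation between the two variables (r ≈ 0.51 and r ≈ 0.61), and both p-values are far below 0.05, so the null hypothesis of no association is rejected in each group.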
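To put these coefficients in perspective, squaring r gives the share of variance in one variable that a straight-line relationship with the other accounts for. The short sketch below only does that arithmetic on the two coefficients reported above; the variable names are mine and are not part of the original script.

```python
# Pearson coefficients copied from the output reported above.
r_low = 0.5085799406548519   # LOW alcohol consuming countries
r_high = 0.6093640344213327  # HIGH alcohol consuming countries

# r squared: fraction of variance accounted for by a linear fit.
r2_low = r_low ** 2
r2_high = r_high ** 2

print(f"LOW group:  r^2 = {r2_low:.2f}")
print(f"HIGH group: r^2 = {r2_high:.2f}")
```

This works out to roughly 26% for the LOW group and 37% for the HIGH group.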
GRAPHS

And the combined graph comparing the two groups:

To conclude, I would say that alcohol consumption does act as a moderator in this case. In the combined graph, the orange dots (high alcohol consuming countries) lie to the right of the blue ones (low alcohol consuming countries). This suggests that people in countries from the second group need a higher income to reach the same life expectancy as people from the first group. Put another way, people with the SAME income tend to have a shorter life expectancy in high alcohol consuming countries than in low alcohol consuming countries.
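The visual comparison can be complemented by a numerical check of whether the two correlation coefficients differ in strength. One common approach is Fisher's r-to-z transformation. The sketch below is not part of the original analysis: the function name `fisher_z_test` is hypothetical, and the inputs are the coefficients and group sizes (109 and 62) reported earlier. It addresses only the strength of the association, not the horizontal shift between the groups.

```python
import math


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided z-test for the difference between two independent Pearson r values."""
    # Fisher transformation: atanh(r) is approximately normal with variance 1 / (n - 3).
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Values taken from the results reported in the INVESTIGATION section.
z_stat, p_value = fisher_z_test(0.5085799406548519, 109, 0.6093640344213327, 62)
print("z =", z_stat, "p =", p_value)
```

A small p-value here would indicate that the strength of the association differs between the two groups.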

PYTHON SCRIPT

import pandas
import numpy
import scipy.stats
import seaborn
import matplotlib.pyplot as plt

data = pandas.read_csv('gapminder.csv', low_memory=False)

# convert_objects() is no longer available in current pandas;
# to_numeric() with errors='coerce' converts the columns and turns blanks into NaN
data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')
data['incomeperperson'] = pandas.to_numeric(data['incomeperperson'], errors='coerce')
data['lifeexpectancy'] = pandas.to_numeric(data['lifeexpectancy'], errors='coerce')

# drop rows with missing values; copy() so a new column can be added safely later
data_clean = data.dropna().copy()

# correlation for all countries together
print(scipy.stats.pearsonr(data_clean['incomeperperson'], data_clean['lifeexpectancy']))

# 1 = low alcohol consumption (<= 8 litres), 2 = high alcohol consumption (> 8 litres)
def alcohol(row):
    if row['alcconsumption'] <= 8:
        return 1
    elif row['alcconsumption'] > 8:
        return 2

data_clean['alcohol'] = data_clean.apply(lambda row: alcohol(row), axis=1)

# check the size of each group
chk1 = data_clean['alcohol'].value_counts(sort=False, dropna=False)
print(chk1)

sub1 = data_clean[(data_clean['alcohol'] == 1)]
sub2 = data_clean[(data_clean['alcohol'] == 2)]

print('association between income per person and life expectancy for LOW alcohol consuming countries')
print(scipy.stats.pearsonr(sub1['incomeperperson'], sub1['lifeexpectancy']))
print(' ')
print('association between income per person and life expectancy for HIGH alcohol consuming countries')
print(scipy.stats.pearsonr(sub2['incomeperperson'], sub2['lifeexpectancy']))
print(' ')

scat1 = seaborn.regplot(x="incomeperperson", y="lifeexpectancy", data=sub1)
plt.xlabel('Income per person')
plt.ylabel('Life Expectancy')
plt.title('Scatterplot for the association between income per person and life expectancy for LOW alcohol consuming countries')
print(scat1)
#%%
scat2 = seaborn.regplot(x="incomeperperson", y="lifeexpectancy", fit_reg=False, data=sub2)
plt.xlabel('Income per person')
plt.ylabel('Life Expectancy')
plt.title('Scatterplot for the association between income per person and life expectancy for HIGH alcohol consuming countries')
print(scat2)
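
The combined graph in the GRAPHS section can also be drawn explicitly on a single set of axes with a legend, rather than relying on both plots landing in the same figure. Below is a minimal sketch using matplotlib only. The function name `plot_groups` and the optional `path` argument are my own additions, and it assumes the two subsets have the `incomeperperson` and `lifeexpectancy` columns used in the script above.

```python
import matplotlib.pyplot as plt


def plot_groups(low, high, path=None):
    """Scatter both alcohol groups on one axes; optionally save the figure to `path`."""
    fig, ax = plt.subplots()
    # Blue for the low group and orange for the high group, as in the text.
    ax.scatter(low["incomeperperson"], low["lifeexpectancy"],
               color="tab:blue", label="LOW alcohol consumption")
    ax.scatter(high["incomeperperson"], high["lifeexpectancy"],
               color="tab:orange", label="HIGH alcohol consumption")
    ax.set_xlabel("Income per person")
    ax.set_ylabel("Life Expectancy")
    ax.set_title("Income per person vs. life expectancy by alcohol consumption group")
    ax.legend()
    if path is not None:
        fig.savefig(path)
    return ax
```

With the script above, this could be called as `plot_groups(sub1, sub2)` followed by `plt.show()`.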