Generating a Correlation Coefficient

I generated two correlation coefficients. I found a weak positive correlation between alcohol consumption and life expectancy, and a strong positive correlation between income per person and life expectancy.

First, I drew two scatter plots.

Then I ran the code below to generate the correlation coefficients:

# -*- coding: utf-8 -*-

import pandas
import numpy
import seaborn
import scipy.stats  # import the stats submodule explicitly; "import scipy" alone may not load it
import matplotlib.pyplot as plt

data = pandas.read_csv('gapminder.csv', low_memory=False)

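# convert the variables we will be working with to numeric;
# errors='coerce' turns blanks and other non-numeric entries into NaN
# (convert_objects has been removed from current pandas versions)
data['lifeexpectancy'] = pandas.to_numeric(data['lifeexpectancy'], errors='coerce')
data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')
data['incomeperperson'] = pandas.to_numeric(data['incomeperperson'], errors='coerce')

# blank strings are already NaN after the coercion above; kept as a safeguard
data['incomeperperson'] = data['incomeperperson'].replace(' ', numpy.nan)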

scat1 = seaborn.regplot(x="alcconsumption", y="lifeexpectancy", fit_reg=True, data=data)
plt.xlabel('Alcohol Consumption')
plt.ylabel('Life Expectancy')
plt.title('Scatterplot for the Association Between Alcohol Consumption and Life Expectancy')
plt.show()  # finish the first figure so the second plot is not drawn on top of it

scat2 = seaborn.regplot(x="incomeperperson", y="lifeexpectancy", fit_reg=True, data=data)
plt.xlabel('Income per Person')
plt.ylabel('Life Expectancy')
plt.title('Scatterplot for the Association Between Income per Person and Life Expectancy')
plt.show()


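# data_clean is not defined anywhere in this listing; assuming it is the data
# with rows missing any of the three variables dropped
data_clean = data.dropna(subset=['alcconsumption', 'incomeperperson', 'lifeexpectancy'])

print('association between alcconsumption and lifeexpectancy')
print(scipy.stats.pearsonr(data_clean['alcconsumption'], data_clean['lifeexpectancy']))

print('association between incomeperperson and lifeexpectancy')
print(scipy.stats.pearsonr(data_clean['incomeperperson'], data_clean['lifeexpectancy']))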
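
A side note on missing values: pearsonr cannot handle NaN, so the rows used for each coefficient have to be complete. The sketch below shows one possible way to do that per pair of variables, so that a country missing one variable is only excluded from the correlations that need it. The helper name pearson_pairwise and the small demo table are made up for illustration; they are not part of the gapminder data or of the code above.

```python
import pandas as pd
from scipy import stats


def pearson_pairwise(df, x, y):
    """Return (r, p_value, n) for columns x and y, using only rows where both are present."""
    # Coerce both columns to numeric (non-numeric entries become NaN), then drop incomplete rows
    pair = df[[x, y]].apply(pd.to_numeric, errors="coerce").dropna()
    r, p_value = stats.pearsonr(pair[x], pair[y])
    return r, p_value, len(pair)


# Synthetic stand-in for the real data set (values are invented for illustration)
demo = pd.DataFrame({
    "incomeperperson": [500, 1500, 4000, 9000, 20000, None, 35000],
    "lifeexpectancy": [52, 60, 68, 73, 78, 70, " "],
})

r, p_value, n = pearson_pairwise(demo, "incomeperperson", "lifeexpectancy")
print("r = %.3f, p = %.4f, n = %d" % (r, p_value, n))
```

Reporting n next to each coefficient makes it clear how many observations the correlation is actually based on.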

And here are my results: