Analytics Vidhya

Analytics Vidhya is a community of Generative AI and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

EDA on World Happiness Index Report

Is your country among the happiest?

Photo by Hybrid on Unsplash

The World Happiness Report is a landmark survey of the state of global happiness that ranks 156 countries by how happy their citizens perceive themselves to be. It also ranks cities around the world by their subjective well-being and examines how the social, urban and natural environments combine to affect our happiness.

The underlying data is collected through the Gallup World Poll, and the report is published yearly by the Sustainable Development Solutions Network. I performed an Exploratory Data Analysis on the World Happiness Report datasets for the years 2015, 2017 and 2019. Each dataset has the following columns, described briefly below.

  1. Happiness Rank- Rank of any country in a particular year.
  2. Country- Name of the country.
  3. Score- Happiness score, obtained as the sum of the factor columns in the dataset (the contributions listed here plus the Dystopia Residual).
  4. Trust- A quantification of the people’s perceived trust in their governments.
  5. Generosity- Numerical value estimated based on the perception of Generosity experienced by poll takers in their country.
  6. Social Support- Metric estimating satisfaction of people with their friends and family.
  7. Freedom- Perception of freedom quantified.
  8. Dystopia- Hypothetically the saddest country in the world.

Loading the datasets:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#loading the datasets y1,y2,y3 corresponding to 2017,2015 and 2019
y1_df=pd.read_csv('/kaggle/input/world-happiness/2017.csv',index_col=1)
y2_df=pd.read_csv('/kaggle/input/world-happiness/2015.csv',index_col=2)
y3_df=pd.read_csv('/kaggle/input/world-happiness/2019.csv',index_col=0)

Have a look at the 2015 dataset: (other datasets show a similar form)

Every column apart from Country and Region has a float datatype

After this, I renamed the columns so that all 3 datasets use the same names. The code can be viewed in my Kaggle Notebook. The last column, ‘Dystopia Residual’, needs a little explanation:

Dystopia is a hypothetical country made up of the least happy people. It was created as a benchmark against which the Happiness Scores of other countries can be compared. The Dystopia Residual is calculated as (Score of Dystopia + Residual for the corresponding country). Here the Residual is a value generated for each country, which indicates whether the 6 variables have under- or over-explained the life evaluations of that country for that particular year.
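To make the role of the Dystopia Residual concrete, here is a minimal sketch. It assumes, as the column descriptions above suggest, that the Score equals the six factor contributions plus the Dystopia Residual. The numbers are made up for illustration and the column names are hypothetical, not taken from the report.

```python
import pandas as pd

# Illustrative, made-up values (NOT from the report); column names are assumptions.
factors = ["gdp_per_capita", "social_support", "life_expectancy",
           "freedom", "trust", "generosity"]
demo = pd.DataFrame(
    {
        "gdp_per_capita": [1.30, 0.40],
        "social_support": [1.20, 0.60],
        "life_expectancy": [0.90, 0.30],
        "freedom": [0.60, 0.35],
        "trust": [0.40, 0.10],
        "generosity": [0.30, 0.20],
        "dystopia_residual": [2.50, 1.85],
    },
    index=["Country A", "Country B"],
)

# Part of the score that the six factors account for.
demo["explained"] = demo[factors].sum(axis=1)
# Assumed decomposition: Score = explained part + Dystopia Residual.
demo["score"] = demo["explained"] + demo["dystopia_residual"]
print(demo[["explained", "dystopia_residual", "score"]])
```

Running the same sum on a real dataset and comparing it with the published Score column is a quick way to check whether this decomposition holds for a given year.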

We now begin with the analysis:

1. Comparing Happiness Scores across different countries in the years 2015 and 2017, 2017 and 2019.

#overlaying the 2015 and 2017 score distributions with the same number of bins
plt.figure(figsize=(10,6))
a=10
plt.hist(y2_df.Score,a,label='2015',alpha=0.3,color='red')
plt.hist(y1_df.Score,a,label='2017',alpha=0.5,color='skyblue')
plt.xlabel('Happiness score',size=13)
plt.ylabel('No. of countries',size=13)
plt.legend(loc='upper right')
plt.title("Distribution of Happiness scores across 2015,2017",size=16)

Have people been happier in 2017 as compared to 2015?

As the overlaid histograms show, the scores at both extremes shifted slightly to the left in 2017, which suggests that the happiest and the least happy countries both scored a little lower than in 2015. Meanwhile, noticeably more countries had a score in the 5–6 range in 2017 than in 2015.

This suggests that the moderately happy countries became happier in 2017, while scores at the extremes fell to some extent.
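When two histograms are compared by eye, it helps to confirm the reading with counts computed over identical bin edges, because a bin count alone (such as 10) lets each year pick its own edges. The sketch below is only an illustration: the scores are synthetic stand-ins generated with NumPy, not the report data, and the bin range of 2.5 to 8.0 is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the two Score columns (NOT real report data).
scores_2015 = rng.normal(5.40, 1.1, 158).clip(2.5, 8.0)
scores_2017 = rng.normal(5.35, 1.1, 155).clip(2.5, 8.0)

# One set of bin edges shared by both years: 11 bins of width 0.5.
edges = np.linspace(2.5, 8.0, 12)
counts_2015 = np.histogram(scores_2015, bins=edges)[0].tolist()
counts_2017 = np.histogram(scores_2017, bins=edges)[0].tolist()

# Print the per-bin counts and the change between years.
for lo, hi, c15, c17 in zip(edges[:-1], edges[1:], counts_2015, counts_2017):
    print(f"{lo:.1f}-{hi:.1f}: 2015={c15:3d}  2017={c17:3d}  change={c17 - c15:+d}")
```

The same `edges` array can be passed as the bins argument of `plt.hist` for both years so that the bars line up exactly.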

When were people relatively happier?

The overlaid histograms show an increase in the happiness score at the extremes, as there appears to be a shift to the right in 2019. A significant increase in the number of countries with a score between 4–5 is also visible. This may signify relatively better living standards and greater satisfaction of people with their lives and their governments in 2019.

Overall, people appear to have been happier in 2019 than in 2017.

Taken together with the earlier comparison, this suggests that happiness scores, and possibly standards of living, improved between 2015 and 2019, although histograms alone cannot establish how large the improvement was.
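A claim that people were happier in one year than another can also be checked country by country. The following is a minimal sketch of such a paired comparison, assuming each year's data has a Country column and a Score column. The values and country labels are invented for illustration.

```python
import pandas as pd

# Made-up scores for illustration only (NOT from the report).
s2017 = pd.DataFrame({"Country": ["A", "B", "C", "D"], "Score": [7.5, 5.0, 4.2, 3.0]})
s2019 = pd.DataFrame({"Country": ["A", "B", "C", "E"], "Score": [7.6, 5.4, 4.1, 6.0]})

# Keep only countries present in both years so the comparison is like for like.
both = s2017.merge(s2019, on="Country", suffixes=("_2017", "_2019"))
both["change"] = both["Score_2019"] - both["Score_2017"]

mean_change = both["change"].mean()
share_improved = (both["change"] > 0).mean()
print(both)
print("mean change:", round(mean_change, 3))
print("share of countries that improved:", round(share_improved, 3))
```

An inner merge is used on purpose: countries that appear in only one year would otherwise shift the averages without reflecting any real change.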

2. Correlating features with the Happiness Scores for the 3 years

Is there any significant linear relationship between Happiness Score and:

  1. Trust
  2. GDP_per_capita
  3. Freedom

Plotting correlation heatmaps between the 3 features mentioned above and the Happiness Score.

#correlation values for 2015 dataset
#creating a copy of the dataset with 4 columns.
y2=y2_df.copy()
y2.drop(['Social support','life_expectancy','Generosity','dystopia_residual'],axis=1,inplace=True)
#creating a correlation matrix between numeric columns
#numeric_only=True skips text columns such as Region (needs pandas 1.5 or newer)
c2=y2.corr(method='pearson',numeric_only=True)
plt.figure(figsize=(10,6))
sns.heatmap(c2,annot=True)

Correlation heatmaps for 2015, 2017 and 2019

1. How is GDP related to Happiness Score?

From the heatmaps, the correlation coefficient between GDP_per_capita and Score is ~0.79 in all 3 years, indicating a strong positive relationship between the GDP per capita contribution and the Happiness Score. GDP per capita is a measure of a country's economic output per person. People in wealthier economies tend to be happier, plausibly because higher incomes bring better standards of living. Correlation alone does not prove causation, but it suggests that economic prosperity deserves to be one of the top priorities for improving satisfaction.

2. Relation of Freedom to Happiness Score?

The correlation coefficient between Freedom and Score is about 0.57 in all 3 years, indicating a moderately positive relationship between the perception of Freedom and the Happiness Score of a country. Generalizing this perception of freedom is hard because, according to research, it differs for people in different parts of the world. Nevertheless, it is still a significant contributor to the Happiness Score of a nation.

3. How does perception of corruption affect the Happiness score?

The correlation coefficient between corr_perception (Trust) and Score is ~0.4 in all three years. This indicates a weak-to-moderate positive relationship: countries whose citizens place more trust in their governments (that is, perceive less corruption) tend to have somewhat higher Happiness Scores.

Similar analysis of Generosity and Family metrics with Happiness Score is shown in my Kaggle Notebook.
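The coefficients read off a heatmap can also be pulled out as a sorted list, which makes comparing features easier. The sketch below uses synthetic data generated with NumPy, so the printed values will not match the ones quoted above. The column names follow the renamed columns and are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 150
# Synthetic features and score (NOT report data); the weights are arbitrary.
gdp = rng.uniform(0.0, 1.7, n)
freedom = rng.uniform(0.0, 0.7, n)
trust = rng.uniform(0.0, 0.5, n)
score = 3.0 + 2.0 * gdp + 1.5 * freedom + 1.0 * trust + rng.normal(0, 0.5, n)
df = pd.DataFrame(
    {"Score": score, "GDP_per_capita": gdp, "Freedom": freedom, "Trust": trust}
)

# Pearson correlation of every feature with Score, strongest first.
corr_with_score = (
    df.corr(method="pearson")["Score"]
    .drop("Score")
    .sort_values(ascending=False)
)
print(corr_with_score.round(2))
```

Taking a single column of the correlation matrix keeps the focus on the question asked here, namely how each feature relates to the Score, rather than how the features relate to each other.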

3. Analyzing Life Expectancy

Life Expectancy is one of the key metrics for analyzing the health of a country's population. It is defined as the average amount of time a person is expected to live.

plt.figure(figsize=(10,6))
a=10
plt.hist(y2_df.life_expectancy,a,label='2015',alpha=0.3,color='red')
plt.hist(y1_df.life_expectancy,a,label='2017',alpha=0.5,color='skyblue')
plt.ylabel('No. of countries',size=13)
plt.legend(loc='upper right')
plt.title("Satisfaction with Life expectancy across 2015,2017",size=16)

Let us look at the country with the lowest Life Expectancy score.

#sorting values in 2017 dataset with ascending values of Life expectancy.
y1_df.sort_values('life_expectancy',axis=0,ascending=True)

Lesotho has the lowest life expectancy score of all countries.

We observe that Lesotho has the lowest life expectancy contribution to its Happiness Score, with a value of 0. This indicates that life expectancy in the country is very low. The average life expectancy in Lesotho is 52 years, one of the lowest figures observed anywhere in the world.
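Sorting the whole table works, but when only the extremes are needed they can be looked up directly. This is a small sketch with made-up values. It assumes the frame is indexed by rank and has a Country column and a life_expectancy column, which may differ from the actual layout.

```python
import pandas as pd

# Made-up values; only the lookup pattern matters here.
demo = pd.DataFrame(
    {
        "Country": ["Country A", "Country B", "Country C", "Country D"],
        "life_expectancy": [0.85, 0.00, 0.95, 0.40],
    },
    index=pd.Index([1, 2, 3, 4], name="Happiness Rank"),
)

# idxmin/idxmax return the index label (here the rank) of the extreme row.
lowest = demo.loc[demo["life_expectancy"].idxmin(), "Country"]
highest = demo.loc[demo["life_expectancy"].idxmax(), "Country"]
# nsmallest returns the n lowest rows without sorting the full frame.
bottom_two = demo.nsmallest(2, "life_expectancy")

print("lowest:", lowest)
print("highest:", highest)
print(bottom_two)
```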

What could be the reasons for this?

  • Uncontrolled disease spread: A lack of proper health facilities and a doctor-to-patient ratio of 1:20000 have led to an increase in the spread of disease. About 20% of people in the country do not have access to a clean water source, which also contributes to the spread of diseases.
  • Unhygienic living conditions: Lesotho has a very high mortality rate. This is attributed to poor living conditions and the extreme poverty in which the majority of the population lives. Lack of access to healthcare facilities leaves these people helpless.
  • Inadequate sources of healthy food: Agricultural land covers an extremely small part of Lesotho, which creates food scarcity. As a result, the intake of nutrients and vitamins is very low, which contributes to the country's unfortunately low life expectancy.

Photo by Melissa Askew on Unsplash

How is Lesotho being helped by the world?

Sentebale, a non-profit started by Prince Harry, has been providing care to children diagnosed with various diseases and also takes in orphaned children. Funding for the country would enable it to open more healthcare facilities, ensure a better and cleaner environment for the people, and open educational institutions.

The country has declared primary education free, and this has led to a significant increase in enrollment. Lowering secondary school fees is under discussion, as a majority of people cannot afford to pay them. A more educated population would bring better knowledge and awareness of the environment and of diseases, and would improve the overall spirit of the country.

Access to clean water is being provided by organizations like the Water Project which builds dams and wells for the people.

It is evident that not just Lesotho but every underdeveloped country in the world needs help to improve its standards of living.

The country with the highest life expectancy score?

y1_df.sort_values('life_expectancy',axis=0,ascending=False)

We observe here that Singapore has the highest life expectancy score. This indicates satisfaction of the people with the life expectancy of their country. Although longevity does not necessarily lead to more happiness, most people wish to live longer. Here we look at the factors that increase the longevity of the people.

  • Proper healthcare facilities: The government has invested heavily in high-quality healthcare facilities and in unrestricted access to sanitation. Moreover, advances in health-related technology have benefited the population.
  • Better living conditions: Good hygiene, reduced pollution and access to nutritious food keep the citizens well and healthy.

Last but not least, the government has treated health as a top priority and has been contributing to research and development in the health sector. These are some of the reasons that have led to increased longevity.

4. Comparing Happiness Scores across regions

#creating a new series consisting of mean of happiness scores taken across different regions as specified.
#Converting this series into a dataframe
region=y2_df.groupby(['Region']).Score.mean()
region_df=pd.DataFrame(data=region)
reg=region_df.sort_values(by='Score',ascending=False,axis=0)
plt.figure(figsize=(10,7))
plt.title('Happiness Scores across different regions')
sns.barplot(x='Score',y=reg.index,data=reg,palette='mako')

“reg” dataframe as created in code snippet

The bar plot shows that countries in North America, Oceania and Western Europe have the highest average happiness scores, whereas regions in Sub-Saharan Africa and Asia make up the 3 lowest.

5. GDP_per_Capita across different regions

#creating a new dataset from y2_df comprising of means of gdp_per_Capita score per region.
gdpc=y2_df.groupby(['Region'])['Economy (GDP per Capita)'].mean()
gdpc_df=pd.DataFrame(data=gdpc)
gdp=gdpc_df.sort_values(by='Economy (GDP per Capita)',ascending=False,axis=0)
plt.figure(figsize=(10,7))
plt.title('Satisfaction with gdp in different regions')
sns.barplot(x='Economy (GDP per Capita)',y=gdp.index,data=gdp,palette='rocket')

“gdp” dataframe

Satisfaction with the economy is highest in North America and lowest in Sub-Saharan Africa. To understand why such varying levels of satisfaction are observed across different regions, check out this article.

6. Visualizing Trust across different regions

trust=y2_df.groupby(['Region'])['Trust (Government Corruption)'].mean()
trust_df=pd.DataFrame(data=trust)
tru=trust_df.sort_values(by='Trust (Government Corruption)',ascending=False,axis=0)
“tru” dataframe

7. Visualizing Freedom across different regions

freedom=y2_df.groupby(['Region'])['Freedom'].mean()
freedom_df=pd.DataFrame(data=freedom)
free=freedom_df.sort_values(by='Freedom',ascending=False,axis=0)
plt.figure(figsize=(10,7))
plt.title('Freedom score across different regions')
sns.barplot(x='Freedom',y=free.index,data=free,palette='rocket_r')

“free” dataframe

Freedom can be thought of as how free citizens feel to exercise the fundamental rights given to them without breaking laws, and whether or not they perceive any oppression by the authorities.

We observe that Oceania has the highest freedom score and Eastern European countries have the lowest mean freedom scores.
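The regional comparisons in the last few sections all follow the same group, average and sort pattern, so they can be folded into one small helper. The function name `regional_means` and the demo data below are hypothetical, introduced only for this sketch. It assumes a Region column like the one in the 2015 dataset.

```python
import pandas as pd

def regional_means(df, columns, region_col="Region"):
    """Mean of each column per region, sorted by the first column (descending)."""
    means = df.groupby(region_col)[columns].mean()
    return means.sort_values(by=columns[0], ascending=False)

# Made-up rows for illustration only.
demo = pd.DataFrame(
    {
        "Region": ["North", "North", "South", "South", "East"],
        "Score": [7.0, 6.0, 4.0, 5.0, 5.5],
        "Freedom": [0.6, 0.5, 0.3, 0.4, 0.2],
    }
)

summary = regional_means(demo, ["Score", "Freedom"])
print(summary)
```

The returned frame is indexed by region, so it can be handed to `sns.barplot` in the same way as the per-feature frames built above.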

Happiness is a crucial quality in everyone’s life. It gives hope to those who need it and helps maintain harmony and love among the masses. It is not just the duty of the government but ours as citizens too to contribute towards the Happiness factor of our country.

If these visualizations and explanations piqued your interest in these Happiness Reports, you can check out further plots along with the code I have written in my Kaggle notebook.

Datasets can be obtained from my github repository.

I hope you enjoyed this article!

Published in Analytics Vidhya

Written by Nalin Rajput

DS | Psychology | Football | Poetry
