Analyzing 2017 World Happiness Data

Hello! This post discusses my analysis of the 2017 World Happiness Report using Jupyter Notebooks.

The World Happiness Report is published by the United Nations Sustainable Development Solutions Network. The report uses data from the Gallup World Poll and ranks countries by national happiness, based on how respondents rate their own lives. It then relates those ratings to various life factors. The Happiness Score is primarily driven by these six factors: Economy (GDP per Capita), Family, Health (Life Expectancy), Freedom, Trust (Government Corruption), and Generosity. Each of these factors contributes to a higher Happiness Score. According to Gallup, the rankings are based on answers to the main life evaluation question asked in the poll, known as the Cantril Ladder. The Cantril Ladder asks respondents to think of a ladder, with the best possible life for them being a 10 and the worst possible life being a 0. They are then asked to rate their own current lives on that 0 to 10 scale (for more information on the Gallup World Poll methodology, see here). The data also includes 95% confidence intervals for each score, that is, ranges expected to contain the population mean with 95% confidence.

I was interested in delving deeper into the Happiness Report to better understand the factors that contribute to the happiness rank and score. I was not previously aware of how these rankings were made. My main goals for this module were to understand which factors most strongly influence the happiness score and rank, and to learn different techniques for using Python in data analysis.

Findings / Process

My first steps were importing the CSV file and creating a data frame to begin analyzing the data. A data frame is useful because it structures the data in a way that lets you use the .shape attribute and methods such as .head() and .describe() to inspect the data in an organized and readable way.
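As a rough sketch of these first steps (not the notebook's actual code), the example below loads a small CSV into a data frame and inspects it. The column names and values are made-up placeholders, and the text is read from an in-memory string so the example runs on its own; with the real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Made-up placeholder rows; column names are assumptions, not the real file's.
csv_text = """Country,Happiness.Rank,Happiness.Score,Economy,Family,Health
A,1,7.5,1.6,1.5,0.8
B,2,6.0,1.2,1.3,0.7
C,3,4.5,0.8,1.0,0.5
D,4,3.0,0.3,0.6,0.2
"""

# Read the CSV text into a data frame (a file path works the same way).
df_2017 = pd.read_csv(io.StringIO(csv_text))

print(df_2017.shape)       # an attribute, not a method: (rows, columns)
print(df_2017.head(3))     # first three rows
print(df_2017.describe())  # summary statistics for the numeric columns
```

Note that .shape is written without parentheses, while .head() and .describe() are method calls.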

I used df_2017.corr() to get the correlation values between the columns of the data (see table below). The Pearson correlation coefficient ranges from -1 to +1. A value of 0 indicates that there is no linear association between the two variables. A value greater than 0 indicates a positive association; that is, as the value of one variable increases, so does the value of the other variable. A value less than 0 indicates a negative association: as one variable increases, the other decreases.
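To make the coefficient less of a black box, here is a minimal pure-Python sketch of the Pearson formula, which is the default method pandas uses in .corr(). The helper name pearson_r and the sample values are my own illustrations, not part of the original analysis or the report's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances up to the same factor).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, not taken from the report.
economy = [0.3, 0.8, 1.2, 1.6]
score = [3.0, 4.5, 6.0, 7.5]

print(pearson_r(economy, score))        # close to +1: both rise together
print(pearson_r(economy, score[::-1]))  # close to -1: one rises, one falls
```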

Correlation Values

I used the Matplotlib and seaborn libraries to visualize the correlation values and further understand the data. I used https://seaborn.pydata.org/generated/seaborn.heatmap.html as a reference for the available parameters and to understand how to make a correlation heatmap.
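A correlation heatmap of this kind could be sketched roughly as follows. This is not the notebook's actual code: the data frame holds made-up placeholder values, the column names are assumptions, and the parameter choices (annot, cmap, vmin, vmax) are just one reasonable configuration of seaborn.heatmap.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed inside a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up placeholder values; column names are assumptions.
df = pd.DataFrame({
    "Happiness.Score": [7.5, 6.0, 4.5, 3.0, 5.2],
    "Economy": [1.6, 1.2, 0.8, 0.3, 1.0],
    "Family": [1.5, 1.3, 1.0, 0.6, 1.2],
    "Freedom": [0.6, 0.4, 0.5, 0.2, 0.3],
})

corr = df.corr()  # pairwise Pearson correlations

fig, ax = plt.subplots(figsize=(6, 5))
# Fix the color scale to [-1, 1] so colors are comparable across plots.
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation heatmap (placeholder data)")
fig.tight_layout()
```

In a notebook the figure is displayed automatically at the end of the cell; in a script, plt.show() or fig.savefig(...) would be needed.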

Correlation Map (darkest and lightest are most correlated)

The correlation map above visualizes the correlation values between the happiness score and the factors that contribute to it. It shows a strong positive correlation between a country's Happiness Score and its economy, family, and health/life expectancy values.

Scatter plots are a useful tool in statistics because they show the extent of correlation, if any, between the values of observed quantities. Below are three scatter plots that illustrate the correlation between happiness score and economy, family, and health/life expectancy, along with the code used to create these visualizations.

Happiness Correlation Scatter Plot Code
Happiness Correlation Values Scatter Plot
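Since the original code is only available as an image, here is a hedged sketch of how three side-by-side scatter plots like these could be drawn with Matplotlib. The values are made-up placeholders and the column names are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed inside a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder values; column names are assumptions.
df = pd.DataFrame({
    "Happiness.Score": [7.5, 6.0, 4.5, 3.0, 5.2],
    "Economy": [1.6, 1.2, 0.8, 0.3, 1.0],
    "Family": [1.5, 1.3, 1.0, 0.6, 1.2],
    "Health": [0.8, 0.7, 0.5, 0.2, 0.6],
})

factors = ["Economy", "Family", "Health"]

# One subplot per factor, sharing the y-axis so the panels are comparable.
fig, axes = plt.subplots(1, len(factors), figsize=(12, 4), sharey=True)
for ax, factor in zip(axes, factors):
    ax.scatter(df[factor], df["Happiness.Score"])
    ax.set_xlabel(factor)
    ax.set_title(f"Happiness Score vs {factor}")
axes[0].set_ylabel("Happiness Score")
fig.tight_layout()
```

Changing the factors list (for example to freedom and trust columns) reuses the same loop for other comparisons.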

The Happiness Score (Economy-Health-Family) graph below is a series of subplots that shows the same strong positive correlation seen in the scatter plots. The code used to create these visualizations is also included in the image below.

Happiness Score (Economy-Health-Family) Code
Happiness Correlation Values

I also wanted to look into the relationship between countries' happiness scores and trust in government (corruption), as well as freedom. To see how strong the correlations were between happiness and these political factors, I decided to visualize those correlation values as well.

Happiness Score (Gov Related Factors Code)
Happiness Correlation Values (Political Factors)

These visualizations helped me better understand the factors that relate to happiness scores and led me to conclude that economy, health/life expectancy, and family are the factors most strongly associated with happiness scores.

The next part of this analysis involved sorting the data to see the countries ranked by happiness. Below, the countries are listed by their happiness rank and score. Norway ranked #1, followed by Denmark, Iceland, and Switzerland in a tightly packed bunch. The Central African Republic was last in the report, ranking #155.

Countries Ranked by Happiness
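A ranking table like this can be sketched with sort_values, as below. The countries and scores are made-up placeholders and the column names are assumptions. One detail worth noting is that a data frame's default index starts at 0, so the index of a row is one less than its rank.

```python
import pandas as pd

# Made-up placeholder rows, deliberately out of order.
df = pd.DataFrame({
    "Country": ["C", "A", "D", "B"],
    "Happiness.Score": [4.5, 7.5, 3.0, 6.0],
})

# Sort from happiest to least happy and renumber the rows.
ranked = df.sort_values("Happiness.Score", ascending=False).reset_index(drop=True)

# The index is 0-based, so add 1 to get a 1-based rank.
ranked["Rank"] = ranked.index + 1

print(ranked.head(2))  # top of the ranking
print(ranked.tail(1))  # bottom of the ranking
```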

To interpret the data in this table and find trends, I used a map visualization with a color scale to show global happiness. This visualization lets the user hover over a country to see its happiness score (https://plot.ly/python/choropleth-maps/ was a great resource in making the map shown below).

Global Happiness Heatmap Code
Global Happiness Ranking 2017

Conclusions

In conclusion, it was very interesting to analyze the 2017 World Happiness Report and use Python to look into the factors that contributed to these rankings. I found that the countries at the top of the happiness ranking all score highly on the main factors found to support happiness: family, honesty, health, income, caring, freedom, and good governance.

In the future, I would like to continue this project by looking into the differences in social, political, economic, and personal factors contributing to the happiness ranks of different countries. I would also like to compare the World Happiness Reports over time and analyze how they differ from year to year. For instance, it would be interesting to look further into a specific country such as the United States and do a deeper analysis of the factors that contribute to the change in its happiness rank over time. In 2007, the USA ranked 3rd among the OECD countries. In the 2017 report, it ranked 14th overall. According to the Gallup Group, the reasons are declining social support and increased corruption (click here for more), and it is these same factors that explain why the Nordic countries do so much better.

I found that this module allowed me to really spend time focusing on using different Python libraries to analyze a very interesting data set. I have not spent much time using Matplotlib, so I found it extremely helpful to get the chance to explore this library further and see how powerful it is. Going forward, I am excited to continue using Python for data analysis and visualization and am looking forward to learning more.
