The latest dataset given to us was on happiness, from the World Happiness Report 2018. For a start, I did a simple descriptive analysis of the happiest 20% and unhappiest 20% of the countries that participated in the report. My goal in this entry is to try a visualization different from the commonly used ones, so I compared the GDP per capita and healthy life expectancy of the happiest and unhappiest countries.

Data Overview

Each row details information on one of the 156 countries that participated in the World Happiness Report Survey.

  • Rank: number indicating the order of countries by happiness, from 1 (the happiest country) to 156 (the unhappiest country)
  • Country: name of country that participated in this survey
  • Score: happiness score calculated based on the national average response to the question “Please imagine a ladder, with steps numbered from 0 at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?”
  • GDP per capita: GDP calculated with purchasing power parity (PPP) represented as a ratio
  • Healthy life expectancy: time series of healthy life expectancy at birth represented as a ratio
  • Other features: social support, freedom to make life choices, generosity, perceptions of corruption and dystopia residual were also available but not used in this analysis

Data Cleaning

  • Country: same as original data
  • Happiness category: labelled countries from ‘Happiest 20%’ to ‘Unhappiest 20%’ based on their happiness rank.

Purpose: To identify countries that are the happiest and unhappiest. They make good data points for comparison.
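For anyone who wants to try the labelling step, here is a minimal sketch of how it could look with dplyr. It is not my exact script: the `happiness` data frame is a 10-country toy stand-in, the `cutoff` rounding rule is an assumption, and only the two extreme groups are labelled since those are the ones compared in this entry.

```r
library(dplyr)

# Toy stand-in for the report data: 10 hypothetical countries ranked 1..10.
happiness <- data.frame(
  Country = paste("Country", LETTERS[1:10]),
  Rank    = 1:10
)

n_countries <- nrow(happiness)
cutoff <- ceiling(n_countries / 5)  # size of each 20% group (assumed rounding rule)

labelled <- happiness %>%
  mutate(Happiness_category = case_when(
    Rank <= cutoff               ~ "Happiest 20%",
    Rank >  n_countries - cutoff ~ "Unhappiest 20%",
    TRUE                         ~ NA_character_
  )) %>%
  filter(!is.na(Happiness_category))  # keep only the two groups being compared

print(labelled)
```

With the real data, the same rule would be applied to the Rank column of the 156 countries instead of the toy ranks.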

  • Variable: labelled as either GDP_per_capita or Healthy_life_expectancy

Purpose: Converted my data from a wide to a long format for easier data visualization.

  • Ratio: Indicates the respective ratios for GDP_per_capita and Healthy_life_expectancy
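The wide-to-long conversion can be sketched roughly as follows. This is an illustration under assumptions: it uses `tidyr::pivot_longer`, which is one common way to do it and is not one of the packages listed in this entry, and the two countries and their ratios are made-up numbers rather than values from the report.

```r
library(tidyr)

# Wide format: one row per country, one column per measure.
# Country names and ratios are hypothetical placeholders.
wide <- data.frame(
  Country                 = c("Country A", "Country B"),
  Happiness_category      = c("Happiest 20%", "Unhappiest 20%"),
  GDP_per_capita          = c(1.30, 0.35),
  Healthy_life_expectancy = c(0.87, 0.25)
)

# Long format: one row per country/measure pair, with the measure name
# in `Variable` and its value in `Ratio`.
long <- pivot_longer(
  wide,
  cols      = c(GDP_per_capita, Healthy_life_expectancy),
  names_to  = "Variable",
  values_to = "Ratio"
)

print(long)
```

Each country now takes up two rows, one per variable, which is the shape plotting functions tend to prefer when a single column should drive the colour of the points.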

I used the R packages dplyr, ggplot2 and lattice for data cleaning and visualization.

Visualizations

Bar charts are a popular choice when it comes to displaying continuous numerical variables. But are they the best choice when you have a large number of x or y data points? In my experience, the more observations there are to visualize, the more cluttered a bar chart looks.

To reduce the clutter, I decided to strip my visualization (viz) down to its bare minimum. Following what Marie Kondo said, “[to] only keep the things that spark joy”, what sparks joy for me is just showing the raw GDP and healthy life expectancy ratios. So the next best thing to just listing the numbers out (I mean, that would simply be the dataset, right?) is to represent the numbers on a scale and have each country take up the least possible space. So what’s smaller than a numerical figure?

THAT’S RIGHT, A DOT.

The viz above is a dot plot detailing the GDP and healthy life expectancy ratios for 33 countries. In one plot, I am able to show 4 crucial pieces of information (as seen in the Legend) for 33 countries without the clutter of a bar chart. Wowza!

Some may argue that having many dots could be hard to read, so I increased the plot’s readability by giving related dots a shared colour undertone: warm colours for the happiest countries and cool colours for the unhappiest countries.
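A dot plot along these lines can be sketched in ggplot2 as below. Treat it as a rough illustration rather than the code behind my chart: the four countries and their ratios are made up, the `Group` column and the hex colours are my own placeholder choices, and I am assuming the four legend entries are the four category/variable combinations.

```r
library(ggplot2)

# Hypothetical long-format data: 2 "happiest" and 2 "unhappiest" countries,
# each with a GDP ratio and a healthy life expectancy ratio.
long <- data.frame(
  Country            = rep(c("Country A", "Country B", "Country C", "Country D"), each = 2),
  Happiness_category = rep(c("Happiest 20%", "Unhappiest 20%"), each = 4),
  Variable           = rep(c("GDP_per_capita", "Healthy_life_expectancy"), times = 4),
  Ratio              = c(1.30, 0.87, 1.25, 0.90, 0.35, 0.25, 0.30, 0.40)
)

# One legend entry per category/variable pair, giving four colours in total.
long$Group <- paste(long$Happiness_category, long$Variable, sep = ": ")

# Warm tones for the happiest group, cool tones for the unhappiest group.
palette <- c(
  "Happiest 20%: GDP_per_capita"            = "#D7301F",
  "Happiest 20%: Healthy_life_expectancy"   = "#FDAE61",
  "Unhappiest 20%: GDP_per_capita"          = "#2166AC",
  "Unhappiest 20%: Healthy_life_expectancy" = "#67A9CF"
)

# Countries on the y-axis (ordered by their mean ratio), ratios on the x-axis.
dot_plot <- ggplot(long, aes(x = Ratio, y = reorder(Country, Ratio), colour = Group)) +
  geom_point(size = 3) +
  scale_colour_manual(values = palette) +
  labs(x = "Ratio", y = "Country", colour = "Legend") +
  theme_minimal()

print(dot_plot)
```

Putting countries on the y-axis keeps their names readable, and mapping colour to the combined group is what lets two variables and two happiness categories share one panel.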

What we can conclude: The dots relating to the happiest countries are distinctly separated from the dots relating to the unhappiest countries. This shows that the happiest countries generally have a significantly higher GDP per capita and a higher healthy life expectancy as one would expect.

While uncovering novel insights is important, data viz is equally important in helping others understand the magic you see. After all, knowledge is only worth as much as what others can understand. For this viz, I decided to be ambitious and squeezed a lot of information into one graph, which has both strengths and drawbacks. One of its biggest pluses is that you get an overview of a large number of observations with less clutter than other charts. However, one of its biggest weaknesses is that some information is lost or conveyed less accurately.

As some of you would have caught on, GDP and healthy life expectancy can’t possibly share the same measurement scale since they are calculated differently. I understand that my viz above could imply that GDP and healthy life expectancy are directly comparable. Please note that GDP ratios are relative to each other but not to healthy life expectancy ratios, and vice versa.

Knowing that readers might misunderstand, I still went ahead with this, as I felt that showing ratios, albeit for different features, retains most of the accuracy of the information conveyed compared to absolute numbers.

If y’all would like to suggest how I can improve this graph, or discuss other ways to display multiple pieces of information in one plot, please comment away!

Majoring in Business Analytics. Perhaps I can finally understand my dog’s quirky behaviour after I’ve collected enough data.