Life Expectancy and GDP

Raghvendra Pratap Singh
Published in The Startup
Sep 22, 2020

This report visualizes life expectancy data for countries across the world. It also tries to establish a relationship between life expectancy and GDP per capita.

A comparative study was conducted on the ten countries with the highest and the ten countries with the lowest life expectancy in 2017, comparing each against its own life expectancy in 1987. Because the two years are 30 years apart, the comparison shows how much the expected years of life rose or fell in each of these countries.

Life expectancy is obviously affected by many factors, such as happiness, pollution, terrorism, and disease, but I chose to compare it with the GDP per capita of each country. In the data used here, countries with a higher GDP per capita generally had a higher life expectancy than countries with a lower GDP per capita.

Data Sets:

The three data sets and two additional links used in this report come from assorted sources. I downloaded the files in CSV format from each source and merged them in a Jupyter notebook from Anaconda. Variety, one of the important aspects of big data, is present here because the data comes from diverse sources and has to be integrated before it can be analyzed.

1. Our World in Data: life expectancy

Our World in Data is an online publication that helps us understand changes in living conditions across the world. Its life expectancy data set covers 1950 to 2017 and contains 19206 rows and 4 columns (country name, country code, year, and life expectancy).

Source: https://ourworldindata.org/life-expectancy

2. World Bank Data: GDP (per capita) by Country and Country-wise Population

The World Bank is headquartered in the United States and shares its data for analysis by assorted agencies and individuals. I collected GDP per capita from https://data.worldbank.org/indicator/NY.GDP.PCAP.PP.KD and population from https://data.worldbank.org/indicator/SP.POP.TOTL . The GDP per capita and population data sets each have 265 rows and 64 columns.

3. Additional GDPs

The GDP values for Andorra and Monaco were missing. Instead of filling them with the mean, median, or mode of GDP, or removing the two countries, I collected their actual figures from the following sources: https://www.macrotrends.net/countries/MCO/monaco/gdp-per-capita and https://www.macrotrends.net/countries/AND/andorra/gdp-per-capita
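The article fills these two values by hand in the CSV. As an alternative, the same idea can be sketched in pandas. This is only an illustration: the column names, the manual_values lookup, and every number below are made-up placeholders, not the real figures from either source.

```python
import numpy as np
import pandas as pd

# Placeholder data: the numbers are invented for illustration only.
gdp = pd.DataFrame({
    "Entity": ["Andorra", "Monaco", "Japan"],
    "GDP per capita": [np.nan, np.nan, 100.0],
})

# Hypothetical lookup of values collected by hand from a second source.
manual_values = {"Andorra": 40.0, "Monaco": 160.0}

# Fill only the missing cells, matching on country name.
# Existing values (Japan here) are left untouched.
gdp["GDP per capita"] = gdp["GDP per capita"].fillna(
    gdp["Entity"].map(manual_values)
)

print(gdp)
```

Compared with a blanket mean or median fill, this keeps each country's own value and touches nothing else in the column.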

Data processing, cleaning and integration:

First, the data sets were downloaded and saved in CSV format, and the GDPs of Andorra and Monaco were updated manually in the CSV. Because the data sets come from different sources, some country names differed slightly. For example, South Korea appeared as Korea, Rep. in a few places. These discrepancies were removed. Next, the CSVs were loaded into a Jupyter notebook using the read_csv function of the pandas library in Python. Three data frames, for life expectancy, population and GDP, were created for the year 2017, and unnecessary columns were then removed from the population and GDP data frames. The data frames were merged with an inner join using the pandas merge function, with the column containing the country name (entity) as the join key. Column names were updated using the rename function, and the row containing ‘World’ data was dropped using the drop function. The merged data frame was sorted using the sort_values function, and the top ten and bottom ten countries were selected by calling the head function and switching the ‘ascending’ parameter of sort_values. A similar procedure was followed to get a data frame for the year 1987. Country name (or entity), life expectancy, population and GDP were the columns of my focus.
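The steps above can be sketched on a few rows of stand-in data. This is a minimal sketch, not the notebook's actual code: the column names, the name_fixes mapping and the tidy helper are assumptions, all numbers are made up, and the ‘World’ row is removed with a boolean filter rather than the drop function.

```python
import pandas as pd

# Tiny stand-ins for the three sources; all numbers are made up.
life = pd.DataFrame({
    "Entity": ["Japan", "Chad", "South Korea", "World", "Japan"],
    "Year": [2017, 2017, 2017, 2017, 1987],
    "Life expectancy": [84.0, 54.0, 82.0, 72.0, 78.0],
})
gdp = pd.DataFrame({
    "Country Name": ["Japan", "Chad", "Korea, Rep.", "World"],
    "2017": [200.0, 15.0, 180.0, 80.0],
})
pop = pd.DataFrame({
    "Country Name": ["Japan", "Chad", "Korea, Rep.", "World"],
    "2017": [127e6, 15e6, 51e6, 7.5e9],
})

# Hypothetical mapping used to harmonise country names across sources.
name_fixes = {"Korea, Rep.": "South Korea"}


def tidy(wide, year, value_name):
    """Keep the country column and one year column, then rename both."""
    out = wide[["Country Name", str(year)]].rename(
        columns={"Country Name": "Entity", str(year): value_name}
    )
    out["Entity"] = out["Entity"].replace(name_fixes)
    return out


# Life expectancy is in long format, so filter rows for the year instead.
life_2017 = life.loc[life["Year"] == 2017, ["Entity", "Life expectancy"]]

# Inner joins on the country name keep only countries present in all three.
merged = (
    life_2017
    .merge(tidy(gdp, 2017, "GDP per capita"), on="Entity", how="inner")
    .merge(tidy(pop, 2017, "Population"), on="Entity", how="inner")
)

# Remove the aggregate row so only countries remain.
merged = merged[merged["Entity"] != "World"]

# Flip `ascending` to switch between the top and the bottom of the ranking.
top = merged.sort_values("Life expectancy", ascending=False).head(2)
bottom = merged.sort_values("Life expectancy", ascending=True).head(2)
```

With the real data, head(10) in place of head(2) would give the top ten and bottom ten countries, and passing 1987 instead of 2017 would build the earlier data frame.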

Visualization:

First and Second Plots: As mentioned above, the first grouped bar chart compares the life expectancy of the top ten countries of 2017 with their life expectancy in 1987. The second grouped bar chart does the same for the bottom ten countries of 2017. The bar function from matplotlib’s pyplot module was used to plot these charts, and the grid function was used to make the values easier to read. The colors used are ‘c’, ‘lightblue’, ‘darkorange’ and ‘moccasin’, taken from matplotlib’s list of named colors. To choose them, I followed a blog post from Goodly that recommends a bright color for the main bars and a lighter one for the comparison bars. As is the convention for bar charts, the y-axis starts from 0. The title has a font size of 15, while the x and y labels have a font size of 13.5.

The charts show that in 1987 the top ten countries of 2017 had life expectancies in the 70s, with San Marino performing best, followed by Japan. By 2017 all of them had moved into the 80s, with Monaco performing best, followed by San Marino. Among the bottom ten countries of 2017, most had life expectancies in the 40s in 1987, except Nigeria, which was in the 50s, and Chad, which was touching 60. Unsurprisingly, most of them had improved by 2017. The exception is Chad, which is surprising because it was the best performer of the group in 1987.
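A grouped bar chart of this kind could be drawn roughly as follows. This is a sketch under assumptions: the country labels and values are placeholders, the bar width and figure size are my own choices, and I have assumed the brighter color (‘c’) marks 2017 and the lighter one (‘lightblue’) marks 1987.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Placeholder labels and values, not the real figures.
countries = ["A", "B", "C"]
life_1987 = [75.0, 76.0, 74.0]
life_2017 = [84.0, 85.0, 83.0]

x = np.arange(len(countries))  # one slot per country
width = 0.4                    # each slot holds two bars side by side

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(x - width / 2, life_1987, width, color="lightblue", label="1987")
ax.bar(x + width / 2, life_2017, width, color="c", label="2017")

ax.set_xticks(x)
ax.set_xticklabels(countries)
ax.set_ylim(bottom=0)  # bar charts conventionally start at 0
ax.set_title("Life expectancy: 1987 vs 2017", fontsize=15)
ax.set_xlabel("Country", fontsize=13.5)
ax.set_ylabel("Life expectancy (years)", fontsize=13.5)
ax.grid(axis="y")
ax.legend()
```

Shifting each series by half the bar width around the same x position is what places the two years next to each other for every country. In a notebook, plt.show() would display the figure.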

Third Plot: The third plot is a bubble chart drawn with the scatter function of the pyplot module. It shows the relationship between the life expectancy of each country and its GDP per capita. The variables used are GDP per capita (on the x-axis), life expectancy (on the y-axis), bubbles (representing the countries) and population (as the size of the bubbles). The title has a font size of 15, while the x and y labels have a font size of 13.5. Because of the large number of countries, colors were assigned to each country randomly using the rand function from numpy. The population data was converted into millions, and the GDP (per capita) was available in US dollars. The marker ‘o’ was used in the scatter function to get the bubble shape, and the grid function was used to make the values easier to read. We can’t deny the role of other factors alongside GDP, but in the available data, countries with a higher GDP per capita performed better in terms of life expectancy.
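The bubble chart described above might look like this in code. It is a minimal sketch with placeholder numbers. The alpha value and axis label wording are my own assumptions, and I have assumed that the random colors are plain numbers from rand that matplotlib maps through its default colormap.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Placeholder values for five imaginary countries.
gdp_per_capita = np.array([1500.0, 6000.0, 12000.0, 30000.0, 45000.0])
life_expectancy = np.array([54.0, 65.0, 72.0, 80.0, 84.0])
population = np.array([15e6, 200e6, 50e6, 127e6, 8e6])

pop_millions = population / 1e6               # bubble size in millions of people
colors = np.random.rand(len(gdp_per_capita))  # one random color value per country

fig, ax = plt.subplots(figsize=(8, 5))
points = ax.scatter(
    gdp_per_capita,
    life_expectancy,
    s=pop_millions,  # marker area scales with population
    c=colors,        # numbers are mapped through the default colormap
    marker="o",
    alpha=0.6,
)
ax.set_title("Life expectancy vs GDP per capita", fontsize=15)
ax.set_xlabel("GDP per capita", fontsize=13.5)
ax.set_ylabel("Life expectancy (years)", fontsize=13.5)
ax.grid(True)
```

The s argument sets marker area in points squared, so small countries produce very small bubbles. Multiplying pop_millions by a constant is a simple way to make them easier to see without changing their relative sizes.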

Conclusion:

To the best of my understanding, the first and second grouped bar charts give a clear picture of the increase in life expectancy over the span of 30 years in the countries studied, with Chad as the only exception. The third chart depicts a positive relationship between GDP per capita and life expectancy across countries, with a few outliers.

To get my IPython notebook, please follow this link to Kaggle!

I have shared my own experience in this article. Please share your thoughts if you find anything incorrect here.

Twitter: @MrTomarOfficial

LinkedIn: https://ie.linkedin.com/in/raghvendra-pratap-singh-tomar
