An Analysis of National Life Expectancy and GDP

Sean Searle
4 min read · Aug 2, 2023

To practice data science fundamentals and data visualization in Python, I analyzed life expectancy (LE) and GDP data from 6 nations over 16 years. The LE data was taken from the World Health Organization and the GDP data from the World Bank. The nations in this analysis are Mexico, Chile, Zimbabwe, China, Germany, and the United States. I acquired LE and GDP data points for each nation for each year from 2000 through 2015, inclusive, so the resulting DataFrame held 96 rows of data (6 nations × 16 years).
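As a quick sanity check on that row count, here is a minimal sketch of the long-format layout, with one row per country per year. The column names (`Country`, `Year`, `LE`, `GDP`) are my assumption for illustration, and the value columns are left empty rather than filled with real figures.

```python
import itertools

import pandas as pd

countries = ["Mexico", "Chile", "Zimbabwe", "China", "Germany", "United States"]
years = range(2000, 2016)  # 2000 through 2015, inclusive

# One row per (country, year) pair; column names are assumed, not taken from the source data
df = pd.DataFrame(list(itertools.product(countries, years)), columns=["Country", "Year"])

# Empty placeholders where the WHO and World Bank values would go
df["LE"] = float("nan")
df["GDP"] = float("nan")

print(df.shape)  # (96, 4)
```

Six countries times sixteen years gives the 96 rows, which is a handy number to check after loading the real data.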

The goals of my project were as follows:

  • Visualize change in GDP and Life Expectancy (LE) over time per country and for the whole data set
  • Draw conclusions about year vs GDP and year vs LE for the sample of countries in our data
  • Visualize and interpret the relationship between GDP and LE per country
  • Visualize a clear comparison of the countries’ GDP over time, LE over time, and the relationship of GDP vs LE vs time
  • Draw conclusions about the 6 countries’ growth or stagnation of GDP vs LE over time, and about how proportional GDP and LE are over time
  • Answer the following questions: Is there a correlation between GDP and life expectancy of a country? What is the average life expectancy in these nations? What is the distribution of that life expectancy?

Before I share the results of my analysis and some data visualizations, I should provide more background regarding the data. The next two paragraphs will define the data.

The LE data is defined in depth on this WHO data page under the Metadata tab. From the page, the definition of this data is “The average number of years that a newborn could expect to live, if he or she were to pass through life exposed to the sex- and age-specific death rates prevailing at the time of his or her birth, for a specific year, in a given country, territory, or geographic area.” Life expectancy at birth reflects the overall mortality level of a population and is estimated mid-year, per country. There is one data point per country per year.

Similarly, the GDP data I acquired contains one data point per country per year. The source of the data can be found here along with detailed definitions under the “Details” tab. From the World Bank’s website: “GDP at purchaser’s prices is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in current U.S. dollars. Dollar figures for GDP are converted from domestic currencies using single year official exchange rates.”

I hope you’ve stuck with me! Below are my data visualizations. I’ve plotted LE over time, GDP over time, and GDP vs LE. For each of those pairings, I took two approaches. First, I created figures with 6 subplots, one per country, each with an independent y axis to better show each country’s trend. Second, I plotted the same data on one shared axis to better compare absolute values.
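For anyone who wants to try the two layouts, here is a minimal matplotlib sketch of the idea. It is not the exact code from my notebook: the column names are assumed, and the LE values are random placeholders rather than the WHO data.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line when working in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

countries = ["Mexico", "Chile", "Zimbabwe", "China", "Germany", "United States"]
years = np.arange(2000, 2016)

# Placeholder values only: random numbers standing in for the real LE column
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Country": np.repeat(countries, len(years)),
    "Year": np.tile(years, len(countries)),
    "LE": rng.uniform(45, 82, size=len(countries) * len(years)),
})

# Approach 1: one subplot per country, each with its own y axis
fig_grid, axes = plt.subplots(2, 3, figsize=(12, 6), sharex=True, sharey=False)
for ax, (country, group) in zip(axes.flat, df.groupby("Country", sort=False)):
    ax.plot(group["Year"], group["LE"])
    ax.set_title(country)
    ax.set_ylabel("LE (years)")
fig_grid.tight_layout()

# Approach 2: every country on a single shared axis
fig_shared, ax_shared = plt.subplots(figsize=(8, 5))
for country, group in df.groupby("Country", sort=False):
    ax_shared.plot(group["Year"], group["LE"], label=country)
ax_shared.set_xlabel("Year")
ax_shared.set_ylabel("LE (years)")
ax_shared.legend()
```

Independent axes make the shape of each trend easy to read, while the shared axis keeps the magnitudes comparable. That tradeoff is why I used both.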

With independent y axes, we can see the shape of each country’s LE trend over time. All six countries saw overall growth in LE. The U.S., Germany, and China saw very steady increases, while Mexico and Chile saw more year-to-year variation. Zimbabwe saw a dip from 2000 to 2004, followed by a steady increase.

With a shared y axis, we can see the same trends as noted above. Mexico and Chile still have less steady growth curves, but that pattern is harder to see here than in the plots with individual y axes.

Comparing the plots with individual y axes, we can see that each country’s GDP grew from 2000 to 2015. Zimbabwe saw a decrease until 2008 before a larger increase through 2015. Chile saw a dip from 2013 to 2015, but that dip is dwarfed by the growth in earlier years. Mexico and Germany grew over the whole time frame, but with more year-to-year variation than the other countries. The U.S. and China saw the steadiest growth.

Over this date range, China and the U.S. saw large growth in GDP, whereas the other countries’ annual GDP figures stayed much more level by comparison. Because the y axis is shared, we can see that GDP growth was far larger in absolute terms for China and the U.S.

Each country shows a roughly linear relationship between GDP and LE, which indicates a positive correlation. China’s curve looks slightly exponential, whereas Chile’s might be slightly logarithmic.

We can see that China and the U.S. have very similar slopes of GDP over LE. With a shared y axis, Zimbabwe’s curve looks flat: its LE goes up steadily, but its increase in GDP cannot be seen on the plot because it is much smaller than the other nations’. Chile’s GDP growth can just be made out, and Mexico’s and Germany’s growth can be easily seen, but at a much smaller scale than that of China and the U.S.
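The correlation and slope readings above come from eyeballing the plots. One way to put numbers on them is a per-country Pearson correlation and a straight-line fit, sketched below. The country names, column names, and values are all invented placeholders, so the output says nothing about the real data.

```python
import numpy as np
import pandas as pd

years = np.arange(2000, 2016)
t = years - 2000

# (starting LE, starting GDP) pairs invented purely for illustration
placeholder_starts = {"Country A": (70.0, 1.0e12), "Country B": (50.0, 5.0e9)}

frames = []
for name, (le0, gdp0) in placeholder_starts.items():
    frames.append(pd.DataFrame({
        "Country": name,
        "Year": years,
        "LE": le0 + 0.3 * t,      # steady placeholder rise in life expectancy
        "GDP": gdp0 * 1.05 ** t,  # compounding placeholder growth in GDP
    }))
df = pd.concat(frames, ignore_index=True)

summary = {}
for name, group in df.groupby("Country"):
    r = group["GDP"].corr(group["LE"])                   # Pearson correlation
    slope = np.polyfit(group["LE"], group["GDP"], 1)[0]  # GDP change per extra year of LE
    summary[name] = {"pearson_r": r, "slope": slope}

summary = pd.DataFrame(summary).T
print(summary)
```

A correlation close to 1 with a curved scatter, like the compounding GDP here, is a reminder that a high Pearson value does not by itself mean the relationship is a straight line.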

Upon reviewing the plots, I made the following observations:

  • LE has increased for all nations, especially Zimbabwe.
  • GDP has increased for all 6 nations, with China having the most dramatic increase.
  • GDP and LE were positively correlated for all countries. China and the U.S. had roughly the same slope when plotting their GDP over LE.
  • Average life expectancy was between 75 and 80 years for every country except Zimbabwe, where it was 50. For this reason the distribution of LE was left-skewed, with most values landing on the right side of the spread.
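The averages and the skew in the last point can be checked with a couple of pandas calls. This is a sketch under assumptions: the column names are mine, and the LE values are placeholders that only mimic the rough shape described above (five countries between 75 and 80, one near 50). They are not the WHO figures.

```python
import numpy as np
import pandas as pd

years = np.arange(2000, 2016)

# Placeholder (start, end) LE ranges that only mirror the rough picture in the text
placeholder_ranges = {
    "Mexico": (75.0, 80.0),
    "Chile": (75.0, 80.0),
    "China": (75.0, 80.0),
    "Germany": (75.0, 80.0),
    "United States": (75.0, 80.0),
    "Zimbabwe": (45.0, 55.0),
}

frames = [
    pd.DataFrame({"Country": name, "Year": years, "LE": np.linspace(lo, hi, len(years))})
    for name, (lo, hi) in placeholder_ranges.items()
]
df = pd.concat(frames, ignore_index=True)

mean_le = df.groupby("Country")["LE"].mean()  # average LE per country
skewness = df["LE"].skew()                    # negative means a longer left tail

print(mean_le.round(1))
print(f"skewness: {skewness:.2f}")
```

A negative skewness matches the picture in the histogram: one low-LE country stretches the left tail while most values cluster on the right.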

Some further questions that came up are below:

  • What drove Zimbabwe’s LE to be so low and what changed to allow it to recover so quickly? How are Zimbabwe’s dip in LE and its dip in GDP in the first half of the data related?
  • Is there a reason that Chile and Mexico had similar dips in LE from 2008 to 2010 and then increases after 2010?

Feel free to check out the GitHub Repo here. Thanks for reading!
