An Analysis of National Life Expectancy and GDP
To practice data science fundamentals and data visualization in Python, I analyzed life expectancy (LE) and GDP data from 6 nations over 16 years. The LE data came from the World Health Organization and the GDP data from the World Bank. The nations in this analysis are Mexico, Chile, Zimbabwe, China, Germany, and the United States. I acquired one LE and one GDP data point per nation for each year from 2000 through 2015 inclusive, so the resulting DataFrame held 96 rows (6 nations × 16 years).
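To make the 96-row shape concrete, here is a minimal sketch of the tidy layout described above, with one row per nation per year. The column names (`country`, `year`, `life_expectancy`, `gdp`) and the helper `empty_frame` are my own assumptions for illustration, not necessarily the names used in the project, and the value columns are left empty rather than filled with real figures.

```python
from itertools import product

import pandas as pd

COUNTRIES = ["Chile", "China", "Germany", "Mexico", "United States", "Zimbabwe"]
YEARS = range(2000, 2016)  # 2000 through 2015 inclusive


def empty_frame(countries=COUNTRIES, years=YEARS):
    """Build the skeleton: one row per (country, year) pair.

    Column names are assumed for illustration; values are left as NaN.
    """
    rows = list(product(countries, years))
    df = pd.DataFrame(rows, columns=["country", "year"])
    df["life_expectancy"] = float("nan")
    df["gdp"] = float("nan")
    return df


frame = empty_frame()
print(frame.shape)  # (96, 4): 6 nations x 16 years, 4 columns
```

A long layout like this, rather than one column per country, tends to be the easiest shape to group by country when plotting.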
The goals of my project were as follows:
- Visualize change in GDP and Life Expectancy (LE) over time per country and for the whole data set
- Draw conclusions about year vs GDP and year vs LE for the sample of countries in our data
- Visualize and interpret relationship between GDP and LE per country
- Visualize a clear comparison of the countries’ GDP over time, LE over time, and the relationship between GDP and LE over time
- Draw conclusions about each country’s growth or stagnation in GDP and LE over time, and about how closely GDP and LE move together
- Answer the following questions:
  - Is there a correlation between GDP and life expectancy of a country?
  - What is the average life expectancy in these nations?
  - What is the distribution of that life expectancy?
Before I share the results of my analysis and some data visualizations, I should provide more background regarding the data. The next two paragraphs will define the data.
The LE data is defined in depth on this WHO data page under the Metadata tab. From the page, the definition of this data is “The average number of years that a newborn could expect to live, if he or she were to pass through life exposed to the sex- and age-specific death rates prevailing at the time of his or her birth, for a specific year, in a given country, territory, or geographic area.” Life expectancy at birth reflects the overall mortality level of a population and is estimated mid-year, per country. There is one data point per country per year.
Similarly, the GDP data I acquired contains one data point per country per year. The source of the data can be found here along with detailed definitions under the “Details” tab. From the World Bank’s website: “GDP at purchaser’s prices is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in current U.S. dollars. Dollar figures for GDP are converted from domestic currencies using single year official exchange rates.”
I hope you’ve stuck with me! Below are my data visualizations. I’ve plotted LE over time, GDP over time, and GDP vs LE. In each of those pairings, I took two approaches. Firstly, I created figures with 6 subplots, one per country, each with independent Y axes to better view data trends. Secondly, I plotted the same data all on one shared axis to better compare total values.
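The two plotting approaches described above can be sketched roughly as follows. This is an illustrative sketch rather than the exact code behind my figures: the function names, the column names, and the placeholder data (countries "A" through "F" with made-up linear trends) are all assumptions, so swap in the real DataFrame to reproduce actual plots.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def make_placeholder_data():
    """Synthetic stand-in data (NOT real WHO/World Bank values)."""
    years = np.arange(2000, 2016)
    frames = []
    for i, name in enumerate("ABCDEF"):
        frames.append(pd.DataFrame({
            "country": name,
            "year": years,
            "life_expectancy": 60.0 + 3 * i + 0.2 * (years - 2000),
        }))
    return pd.concat(frames, ignore_index=True)


def plot_per_country(df, column):
    """Approach 1: one subplot per country, each with its own Y axis."""
    countries = sorted(df["country"].unique())
    fig, axes = plt.subplots(2, 3, figsize=(12, 6), sharex=True, sharey=False)
    for ax, country in zip(axes.flat, countries):
        subset = df[df["country"] == country]
        ax.plot(subset["year"], subset[column])
        ax.set_title(country)
        ax.set_ylabel(column)
    fig.tight_layout()
    return fig


def plot_shared(df, column):
    """Approach 2: all countries on a single shared axis."""
    fig, ax = plt.subplots(figsize=(8, 5))
    for country, subset in df.groupby("country"):
        ax.plot(subset["year"], subset[column], label=country)
    ax.set_xlabel("year")
    ax.set_ylabel(column)
    ax.legend()
    return fig


data = make_placeholder_data()
fig_grid = plot_per_country(data, "life_expectancy")
fig_shared = plot_shared(data, "life_expectancy")
```

Setting `sharey=False` lets each country's trend fill its own panel, while the single-axis version keeps absolute levels comparable across countries.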
Upon reviewing the plots, I made the following observations:
- LE has increased for all nations, especially Zimbabwe.
- GDP has increased for all 6 nations, with China having the most dramatic increase.
- GDP and LE were positively correlated for all countries. China and the US had roughly the same slope when GDP was plotted against LE.
- Average life expectancy was between 75 and 80 for all countries except Zimbabwe, where it was 50. For this reason the distribution of LE was left-skewed: most values landed on the right side of the spread, with Zimbabwe’s lower values forming a tail on the left.
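For readers who want to check observations like these numerically, here is a hedged sketch of how the per-country correlation, the per-country mean LE, and the skewness of the LE distribution could be computed with pandas. The column names, the `summarize` helper, and the placeholder data (five made-up countries with higher LE and one with much lower LE) are assumptions for illustration, not my actual data or code.

```python
import numpy as np
import pandas as pd


def make_placeholder_data():
    """Synthetic stand-in data (NOT real values): five countries with
    higher LE and one with much lower LE, all trending upward."""
    t = np.arange(16)  # 16 yearly steps, 2000 through 2015
    le_start = {"A": 74.0, "B": 75.0, "C": 76.0, "D": 77.0, "E": 78.0, "F": 45.0}
    frames = []
    for i, (name, start) in enumerate(le_start.items()):
        frames.append(pd.DataFrame({
            "country": name,
            "year": 2000 + t,
            "life_expectancy": start + 0.3 * t,
            "gdp": (i + 1) * 1e11 * (1 + 0.05 * t),
        }))
    return pd.concat(frames, ignore_index=True)


def summarize(df):
    """Return (per-country Pearson r, per-country mean LE, overall LE skewness)."""
    corr = pd.Series({
        country: group["gdp"].corr(group["life_expectancy"])
        for country, group in df.groupby("country")
    })
    mean_le = df.groupby("country")["life_expectancy"].mean()
    skewness = df["life_expectancy"].skew()  # negative means a left skew
    return corr, mean_le, skewness


corr, mean_le, skewness = summarize(make_placeholder_data())
print(corr.round(2))
print(mean_le.round(1))
print("skewness is negative:", skewness < 0)
```

A negative skewness means the tail sits on the left, which is the pattern a single low-LE country produces in an otherwise tightly clustered sample.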
Some further questions that came up are below:
- What drove Zimbabwe’s LE to be so low and what changed to allow it to recover so quickly? How are Zimbabwe’s dip in LE and its dip in GDP in the first half of the data related?
- Is there a reason that Chile and Mexico had similar dips in LE from 2008 to 2010 and then increases after 2010?
Feel free to check out the GitHub Repo here. Thanks for reading!