Tracking the progression on income and life expectancy for the 16 SADC countries over the last 5 decades

Cav Bepura
10 min read · Aug 1, 2022

SADC member states

Introduction

This project, completed in partial fulfillment of the ALX-T Data Analyst Nanodegree program through Udacity, tracks the progress made by the 16 Southern African Development Community (SADC) countries across various metrics for the five decades between 1971 and 2020. The data used in this project is from Gapminder, and the following indicators are tracked:

  • Income per person (GDP/capita, PPP$ inflation-adjusted): Gross domestic product per person adjusted for differences in purchasing power (in international dollars, fixed 2011 prices, PPP based on 2011 ICP).
  • Life expectancy (years): The average number of years a newborn would live if current mortality patterns were to stay the same.
  • Life expectancy, male: Life expectancy at birth for males.
  • Life expectancy, female: Life expectancy at birth for females.

This report presents a high-level overview of the project, the steps I took in preparing the data, my observations, and my conclusions. The entire project documentation, which includes the Jupyter notebook with the analysis as well as the raw data, can be found on my GitHub repository.

Motivation for this project

This is a project that is close to my heart, as someone who is born and bred in Southern Africa. I have lived through our challenges as well as celebrated our successes over the course of my life. It is fascinating to be able to put some numbers to the joys and tribulations that we have lived through, as well as paint a few pictures that are ‘worth more than a thousand words’ through the visualizations that I produced for this project.

The guiding questions

In order to focus my analysis, I came up with a few questions that I will attempt to answer. Below is the list of questions:

Is there a relationship between the average GDP per capita income and the life expectancies (combined, male and female) for the SADC region?

Is there a relationship between the GDP per capita income and the life expectancies (combined, male and female) for each of the 16 countries in the SADC region?

Is the trend where female life expectancy is higher than the male life expectancy consistent for ALL 16 SADC countries for each of the years under review (1971 to 2020)?

How do the figures for the GDP per capita income and the life expectancies for each of the countries compare to the average figures for the SADC region over the review period?

The process that I followed

Firstly, I loaded the data from Gapminder into dataframes and ran some preliminary checks to confirm that it had loaded correctly and to inspect its shape and size.
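As a rough illustration of this loading step, here is a minimal sketch. It assumes pandas and a wide CSV layout with one row per country and one column per year. The sample values and the `load_indicator` helper are made up for illustration and are not taken from the project notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for a Gapminder export: one row per country,
# one column per year. The numbers are invented for illustration.
SAMPLE_CSV = """country,1971,1972,1973
Botswana,1200,1350,1500
Zimbabwe,2100,2150,2200
"""


def load_indicator(source):
    """Read a wide, Gapminder-style CSV and print basic sanity checks."""
    df = pd.read_csv(source)
    print("rows, columns:", df.shape)
    print("missing values:", int(df.isna().sum().sum()))
    return df


income = load_indicator(io.StringIO(SAMPLE_CSV))
```

With a real file, `source` would simply be the path to the downloaded CSV.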

The data was fairly clean, with no missing values at all. Therefore, the only cleanup that I did as my second step was replacing the ‘k’, which was used to indicate ‘thousands’ in the income data set, with numeric thousands (000's). This was necessary so that I could manipulate the income data as numeric data for my analysis.
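A minimal sketch of this kind of conversion in plain Python is shown here. Note that it multiplies by 1,000 rather than doing a literal text replacement, so that decimal values such as '1.2k' are handled. The `parse_income` name and the sample strings are hypothetical.

```python
def parse_income(value):
    """Convert strings such as '10k' or '1.2k' to a float.

    Values without a 'k' suffix are converted as-is.
    """
    text = str(value).strip().lower()
    if text.endswith("k"):
        return float(text[:-1]) * 1000
    return float(text)


# Invented sample values in the same style as the income data set.
raw = ["850", "1.2k", "10k"]
cleaned = [parse_income(v) for v in raw]
print(cleaned)
```

In a dataframe, the same function could be applied to each year column, for example with the `Series.map` method.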

The last two steps, which were iterative as I gained more insights about the data, involved exploratory data analysis and the actual data analysis to answer the questions through visualizations and interpretation of the outcome to come up with conclusions.

As part of the conclusions, I also acknowledge some limitations that I faced due to the limited scope of intended outcomes of this project as well as areas of possible future study.

The next section looks at a few of the highlights from the exploratory data analysis phase.

Highlights from the exploratory data analysis (EDA) observations

One of the interesting observations is the trend of the average GDP per capita income, which shows a generally consistent rise between 1971 and 2020. However, there are two noticeable dips, which can be seen on the line graph below, during the 2008/2009 and 2019/2020 periods.

SADC’s average GDP per capita income over the years 1971–2020

The life expectancy data also shows a similar rising trend. However, there is a more sustained dip in this data in the decade between 1991 and 2000. The line graph on the average life expectancy below shows this trend.

SADC’s combined average life expectancy over the years 1971–2020

One other trend that I found interesting when comparing the life expectancy data for females with that for males is that females seem to have consistently higher life expectancy. This trend can be seen by comparing the histograms of the life expectancies for females and males for each year. On the images below, the pink bars indicate the female years while the grey bars indicate the male years.

Male versus female life expectancy histogram comparison: Part I
Male versus female life expectancy histogram comparison: Part II

What the data revealed

The results of the analysis revealed interesting answers to the questions that I presented earlier. This section presents some of the findings through, mainly, the use of visualizations.

Is there a relationship between the average GDP per capita income and the life expectancies (combined, male and female) for the SADC region?

For this question, all the line graphs from the EDA section are collated in one visualization so that comparisons can be made. The visualization makes use of two different y-axes, one with the life expectancy in years and the other with the per capita income in dollars.

The lines each depict the average (mean) values for each of the years for all the SADC countries combined.
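The yearly averages described here can be sketched as follows. This assumes pandas, a table with countries as rows and years as columns, and an unweighted mean across countries. The three countries and their values are invented for illustration.

```python
import pandas as pd

# Made-up life expectancy values for three countries. The real data
# covers 16 countries and the years 1971 to 2020.
life = pd.DataFrame(
    {"1971": [55.0, 57.0, 62.0], "1972": [55.5, 57.4, 62.3]},
    index=["Botswana", "Zimbabwe", "Mauritius"],
)

# Unweighted mean across countries: one value per year column.
sadc_mean = life.mean(axis=0)
print(sadc_mean)
```

For a chart with two y-axes like the one described, matplotlib's `Axes.twinx()` is one common way to add the second axis.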

SADC life expectancy and GDP per capita income over the years 1971–2020

The observations from the visual above seem to generally support the notion that as the average GDP per capita income for SADC rose during the years 1971 to 2020 it was accompanied by a rise in the life expectancies as well. This seems to be true for all the life expectancy indicators that we are tracking, i.e., combined, male and female expectancies.

One exception to this is the period from roughly 1991 to 2000 (and into the early 2000s), where per capita income is rising but the life expectancy indicators are falling.

The other exception is around the years 2019/2020, where average per capita income dips sharply but the life expectancy indicators do not dip as sharply. The combined life expectancy figure does show signs of plateauing during 2019/2020, but only slightly.

On average, rising per capita income does seem to be related to the life expectancy indicators for the SADC countries. The few exceptions to this may be explained with other events that were happening in the macro-environment. This will be touched on in the conclusion section.

Is there a relationship between the GDP per capita income and the life expectancies (combined, male and female) for each of the 16 countries in the SADC region?

This section follows the same approach as question 1, but this time I am looking at the indicators country by country (as opposed to taking the average for the entire SADC region).

Looking at the visualisations country by country brings interesting insights into how the individual countries contributed to the average trend for SADC in question 1. Although the country-by-country analysis roughly shows the same trend as the average in most cases, some countries exhibit the trend more severely than others, whereas other countries seem to be less impacted. There are also some countries that trend against the average in some indicators or for certain periods during the period under review. A few of these observations are noted below.

The dip in life expectancy indicators observed between 1991 and the early 2000s is more pronounced in countries like Botswana, Zimbabwe, Eswatini, South Africa, and Lesotho, while countries like Seychelles, Comoros, Mauritius, and Mozambique barely show this dip.

The per capita income for countries like the Democratic Republic of the Congo and Madagascar shows a falling trend over the entire period. Zimbabwe, on the other hand, had a sharp dip between 2000 and 2008 and then showed signs of recovery before plateauing in the early years of the decade starting 2011. South Africa had its dip from the decade starting 1981 into the early 1990s before showing sustained growth until a sharp decline in 2019/2020.

Below is a selection of some of the countries with notable trends presented in no particular order.

Botswana
Congo, Democratic Republic
Madagascar
South Africa
Zimbabwe

Is the trend where female life expectancy is higher than the male life expectancy consistent for all 16 SADC countries for each of the years under review (1971 to 2020)?

From the visualizations in the first two questions, it seems the female life expectancy figures are generally higher than the male life expectancy figures. This question investigates whether this trend is indeed sustained for every country and every year under review.

The visualizations below confirm that, for every country and every year in the period under review, female life expectancy is higher than male life expectancy. Due to space, only the results for the first and last years of the study are shown in this report. The full table can be viewed on my GitHub repository.

Female versus male life expectancy, part I
Female versus male life expectancy, part II
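One way to run this kind of check for every country and year is sketched here. It assumes pandas and two tables with identical country rows and year columns. The numbers are invented, and this is not necessarily how the notebook performs the comparison.

```python
import pandas as pd

years = ["1971", "2020"]
countries = ["Botswana", "Zimbabwe"]

# Made-up values for illustration only.
female = pd.DataFrame([[57.0, 68.0], [58.5, 63.0]], index=countries, columns=years)
male = pd.DataFrame([[52.0, 62.5], [54.0, 59.5]], index=countries, columns=years)

# Element-wise gap for every country and year.
gap = female - male

# True only if the gap is positive in every single cell.
always_higher = bool((gap > 0).all().all())

# List any (country, year) pairs that break the pattern.
exceptions = [
    (country, year)
    for country in gap.index
    for year in gap.columns
    if gap.loc[country, year] <= 0
]

print(always_higher, exceptions)
```

An empty `exceptions` list means female life expectancy exceeds male life expectancy in every cell of the table.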

How do the figures for the GDP per capita income and the life expectancies for each of the countries compare to the average figures for the SADC region over the review period?

For this last question, I take a deep dive into each of the indicators. I do this by plotting, on a single visualization, the values for each of the SADC countries across the years 1971 to 2020 for the specific indicator. I also include a line for the average SADC figure so that it is instantly clear how each country is performing against the SADC average for each indicator and each year.

The first visualization plots the per capita income for each SADC country and the average across the years 1971 to 2020.

SADC per capita income for each country over the years 1971–2020

An interesting alternative view of the income graphic showing only countries that are below the average for SADC emphasizes the trends for the low-income countries.

SADC per capita income for each low-income country (below the SADC average only) over the years 1971–2020
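Selecting the below-average countries for a view like this could look like the sketch below. It assumes pandas and one possible definition of "below average": a country's mean income over the period is lower than the regional mean over the same period. The report does not state the exact rule used, and the country names and values are placeholders.

```python
import pandas as pd

# Placeholder countries and invented income values.
income = pd.DataFrame(
    {"1971": [1000.0, 3000.0, 8000.0], "2020": [1500.0, 2500.0, 20000.0]},
    index=["Country A", "Country B", "Country C"],
)

sadc_avg = income.mean(axis=0)     # regional mean for each year
period_mean = income.mean(axis=1)  # each country's mean over the period

# Keep only countries whose period mean is below the regional period mean.
below = income[period_mean < sadc_avg.mean()]
print(below)
```

A stricter rule, such as requiring a country to be below the regional average in every year, would need `(income.lt(sadc_avg, axis=1)).all(axis=1)` as the filter instead.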

The next visualization plots the combined life expectancy for each country over the years 1971–2020.

SADC combined life expectancy for each country over the years 1971–2020

Conclusions

The data indicates that the SADC countries have generally been on a rising trend in both per capita income and life expectancy. Another conclusion that the data seems to support is that life expectancy and income tend to rise in tandem.

However, there are a few exceptions to these generalizations. Firstly, Madagascar and the Congo, Democratic Republic seem to have falling per capita incomes for the period under review. Zimbabwe is another country that has had sustained periods of decline on this indicator as well, although there is a period of resurgence post-2008 before stalling again around 2012. It will be interesting to dig deeper into these case studies and see what could have caused this.

The dips in the SADC average on the per capita income indicator coincide with the global financial crisis of 2008 and the global COVID-19 pandemic of 2019/2020. Again, additional data and analysis into this observation may lead to interesting results.

On the life expectancy trend, I was drawn to the sustained decline in life expectancy during the period 1991 to 2000. UNAIDS statistics indicate that this is the period in which HIV & AIDS deaths were most prevalent in Sub-Saharan Africa. I would be interested in pursuing future studies into how this may explain the trend in life expectancy.

In conclusion, I believe this project opens up the options of what can be achieved when various indicators are tracked and compared to each other to see if there are any interesting observations. These observations may then lead to more detailed and scientific studies into how certain phenomena can be explained with data and trends. This information can play a major role in informing governance, policy setting and the focus of funding.

Limitations

One thing that may limit how far the observations and conclusions from this project can be generalized is that the data did not take into account the relative sizes of the populations of each of the SADC countries. An indicator may have different significance when applied to a population of a few million people as opposed to tens of millions. Having this context on the population size may add value to the analysis.
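To make the population point concrete, the sketch below compares a simple average with a population-weighted one. The two countries, their populations, and their life expectancy values are entirely made up.

```python
# Invented (population, life expectancy) pairs for two placeholder countries.
countries = {
    "Country A": (1_000_000, 70.0),
    "Country B": (50_000_000, 60.0),
}

# Simple average: every country counts equally, regardless of size.
simple = sum(value for _, value in countries.values()) / len(countries)

# Population-weighted average: larger countries count for more.
total_population = sum(pop for pop, _ in countries.values())
weighted = sum(pop * value for pop, value in countries.values()) / total_population

print(simple, round(weighted, 2))
```

Here the simple average sits midway between the two countries, while the weighted average is pulled close to the value for the much larger country.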

Another limitation is that the data gives averages across the entire population, which may imply that income, for example, is evenly distributed. In a country where income is unevenly distributed, with only a few people earning disproportionately high incomes, this can artificially raise the average income.
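A small worked example of this effect, using invented incomes for five people, shows how one very high earner pulls the mean well above what a typical person earns, while the median stays put.

```python
from statistics import mean, median

# Invented incomes: four modest earners and one very high earner.
incomes = [1000, 1200, 1500, 1800, 50000]

average_income = mean(incomes)
typical_income = median(incomes)

print(average_income, typical_income)
```

The mean here is 11,100 while the median is 1,500, so the average alone would overstate the income of most people in this sample.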

Acknowledgments:
