Exploring Undernourishment: Part 3 — Data Exploration

A Visual Data Exploration Research Project to Better Understand the Nuances of Our Global Nutrition

Chris Mahoney
The Startup
Oct 13, 2020


Image Source: Food and Agriculture Organisation of the United Nations

Contents

This is Part 3 of an 8-Part research project aiming to better understand the nuances of our global nutrition. It explores this topic using data visualisation and data science techniques. It is complemented by a Web App, ExploringUndernourishment, which is freely available to the public.

Part 1 — Introduction and Overview
Part 2 — Literature Review
Part 3 — Data Exploration (this article)
Part 4 — Research Area 1: General Trend
Part 5 — Research Area 2: Most Successful Countries
Part 6 — Research Area 3: Surprising Trends
Part 7 — Research Area 4: Most Influential Indicator
Part 8 — Recommendations and Conclusions

Data Exploration

To help monitor progress toward this goal, the FAO records the relevant data on a yearly basis and makes it open to the public. Its open-data platform, FAOStat (FAO 2020c), publishes this data free of charge within the first couple of months of each year.

A series of visualisations has been created as part of the Exploratory Data Analysis (EDA) phase.

The Prevalence of Undernourishment Data

Figure 1 shows that the overall distribution of the Prevalence of Undernourishment is centred around 0.05 (5%), with a long right tail out to 0.7 (70%).

Figure 1: Histogram Distribution of the Prevalence of Undernourishment

Figure 2 shows that the Prevalence of Undernourishment is improving incrementally each year, moving closer toward zero. A smoothed distribution plot like this has some deficiencies: a value below zero is not possible, yet the plot suggests that it may be. These plots therefore need to be analysed carefully, and the audience should be cautious when interpreting them.

Figure 2: Ridge Plot of the Prevalence of Undernourishment per year
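
The below-zero artefact mentioned for Figure 2 is typical of kernel density smoothing, which ridge plots commonly use. The following is a minimal sketch, assuming a Gaussian kernel; the sample values and the bandwidth are made up for illustration and are not taken from the FAO data.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical PoU proportions; every value is >= 0.
pou = [0.03, 0.04, 0.05, 0.05, 0.06, 0.08, 0.12, 0.25, 0.40]
density = gaussian_kde(pou, bandwidth=0.05)

# The smoothed curve is still positive at a negative x,
# even though no observation is negative.
leak_below_zero = density(-0.02)
print(f"estimated density at -0.02: {leak_below_zero:.3f}")
```

Because each observation contributes a bump that extends in both directions, values close to zero push some of the estimated density below zero. This is a property of the smoothing, not of the data.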

Figure 3 shows the correlation of each feature with the Prevalence of Undernourishment feature. While some show strong correlation (e.g. Avg Dietary Adequacy), others show almost no correlation (e.g. Prevalence of Breastfeeding Women). Moreover, some show very strong country-specific correlations, but no overall correlation (e.g. Prevalence of Low Birthrate).

Figure 3: Feature Correlations for the Prevalence of Undernourishment
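
To illustrate how a feature can correlate strongly within each country while showing no overall trend, here is a small sketch using Pearson correlation. The two countries and all of the values are hypothetical, chosen only so that the effect is visible.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (feature values, PoU values) for two made-up countries.
data = {
    "Country A": ([1.0, 2.0, 3.0, 4.0], [0.10, 0.12, 0.14, 0.16]),
    "Country B": ([6.0, 7.0, 8.0, 9.0], [0.08, 0.10, 0.12, 0.14]),
}

# Correlation computed separately for each country.
per_country = {name: pearson(x, y) for name, (x, y) in data.items()}

# Correlation computed over all rows pooled together.
all_x = [v for x, _ in data.values() for v in x]
all_y = [v for _, y in data.values() for v in y]
overall = pearson(all_x, all_y)

for name, r in per_country.items():
    print(f"{name}: r = {r:+.2f}")
print(f"Overall:   r = {overall:+.2f}")
```

In this toy data each country follows its own upward line, but the difference in level between the two countries cancels the pooled trend. Grouping by country before correlating is one way to surface this kind of pattern.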

The Amount of Missing Data

Figure 4 shows how many NA values there are in each variable in the analysis. These NA values may be due to insufficient measuring standards in the listed countries, to intentional omission of the data, or to indicators that have only begun to be recorded in recent years. In any case, these missing values need to be handled accordingly. However, some of these indicators are post-hoc indicators, meaning that they are influenced by, and result from, the Prevalence of Undernourishment. Therefore, they are not good predictors, but they may be good measures after the fact.

Figure 4: Lollipop Chart of the Percentage of NA Values per Variable
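
A count like the one behind Figure 4 can be sketched as follows. This is an illustrative example only: the records are made up, and every column name other than prevalence_of_undernourishment is a hypothetical placeholder.

```python
# Hypothetical records: one dict per country-year; None marks a missing value.
records = [
    {"prevalence_of_undernourishment": 0.05, "avg_dietary_adequacy": 120, "gdp_per_capita": None},
    {"prevalence_of_undernourishment": None, "avg_dietary_adequacy": 110, "gdp_per_capita": None},
    {"prevalence_of_undernourishment": 0.20, "avg_dietary_adequacy": 95, "gdp_per_capita": 1500.0},
    {"prevalence_of_undernourishment": 0.08, "avg_dietary_adequacy": None, "gdp_per_capita": None},
]


def percent_missing(rows):
    """Return {column: percentage of rows where the column is None}."""
    total = len(rows)
    return {
        col: 100.0 * sum(row[col] is None for row in rows) / total
        for col in rows[0]
    }


na_share = percent_missing(records)

# Print the variables with the most missing data first.
for col, pct in sorted(na_share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col:35s} {pct:5.1f}%")
```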

The Feature Correlations

In Figure 5, the correlation of each feature is measured against each of the other variables. The prevalence_of_undernourishment feature is at the bottom of the plot, making it easy to see which features it is highly correlated with (positively and negatively), and which ones it has low correlation with. The question marks in this plot indicate where there are so many NA values in the columns that an accurate pairwise calculation of correlation is not possible.

Figure 5: Correlation Plot for Each Variable
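
The question marks can be thought of as pairs of columns with too few rows where both values are present. The sketch below assumes a minimum of three complete pairs, which is an arbitrary choice for illustration; feature_a, feature_b, and all of the values are hypothetical.

```python
import math


def pairwise_corr(xs, ys, min_pairs=3):
    """Pearson correlation over rows where both values are present.

    Returns None when fewer than `min_pairs` complete pairs exist,
    or when either column is constant over the complete pairs.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < min_pairs:
        return None
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


columns = {
    "prevalence_of_undernourishment": [0.05, 0.10, 0.20, 0.30, 0.40],
    "feature_a": [130, 120, 100, 90, 80],        # hypothetical, fully observed
    "feature_b": [None, None, 3.1, None, None],  # hypothetical, mostly missing
}

names = list(columns)
matrix = {
    (a, b): pairwise_corr(columns[a], columns[b]) for a in names for b in names
}

# Print the matrix, showing "?" where no correlation could be computed.
for a in names:
    cells = []
    for b in names:
        r = matrix[(a, b)]
        cells.append("    ? " if r is None else f"{r:+.2f} ")
    print(f"{a:32s}", "".join(cells))
```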

Data Per Country

Figure 6 lists each country on the Y-axis, with the percentage of NA values in the Prevalence of Undernourishment field on the X-axis. Countries with a score for every year of reporting are shown in green, while countries that have never had a score recorded in the 18 years of recording are shown in red. Countries with no recorded scores are not helpful for the analysis and should be excluded.

Figure 6: Lollipop Chart for ‘0’ PoU

Countries that have PoU measurements for at least 20% of their years (i.e. at least 4 of the 18 years of recording) are shown in Figure 7. This plot shows the distribution of PoU for each country, just like Figure 2 but by country instead of by year. It shows that many countries have a very tight distribution close to zero, while others have a much broader distribution, further away from zero. Attention should therefore be directed toward the broader, more distant distributions, as these would be more helpful for the analysis.

Figure 7: Ridge Plot of PoU per Country
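
The 20% coverage filter can be sketched as below. The 18 years and the 20% threshold come from the text above; the country names and counts are hypothetical.

```python
import math

YEARS_RECORDED = 18  # years of recording, as stated in the text
MIN_COVERAGE = 0.20  # minimum share of years with a PoU measurement

# Smallest whole number of years that satisfies the threshold.
min_years = math.ceil(MIN_COVERAGE * YEARS_RECORDED)

# Hypothetical number of years with a recorded PoU value per country.
years_with_pou = {
    "Country A": 18,
    "Country B": 4,
    "Country C": 3,
    "Country D": 0,
}

# Keep only the countries that meet the coverage threshold.
kept = [
    country
    for country, n_years in years_with_pou.items()
    if n_years / YEARS_RECORDED >= MIN_COVERAGE
]

print(f"minimum years required: {min_years}")
print(f"countries kept: {kept}")
```

Since 20% of 18 years is 3.6, a country needs at least 4 recorded years to pass the filter.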

Figure 8 is messy. It plots one line per country, with the PoU on the Y-axis and the year on the X-axis. It indicates that some countries have substantially improved their PoU over the years, while others have seen a substantial increase. However, a large proportion of countries are densely packed around the 0 mark, which is congruent with Figure 6.

Figure 8: Line Plot of PoU per Country per Year
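
Because a plot with one line per country is hard to read, one option is to summarise each line as the net change between its first and last recorded year. This is a sketch of that idea rather than the method used in this project; the countries, years, and values are hypothetical.

```python
# Hypothetical PoU series: {country: {year: proportion}}.
series = {
    "Country A": {2001: 0.30, 2009: 0.22, 2018: 0.12},
    "Country B": {2001: 0.10, 2009: 0.14, 2018: 0.21},
    "Country C": {2001: 0.03, 2009: 0.03, 2018: 0.02},
}


def net_change(by_year):
    """PoU in the last recorded year minus PoU in the first recorded year."""
    first, last = min(by_year), max(by_year)
    return by_year[last] - by_year[first]


changes = {country: net_change(values) for country, values in series.items()}

# Most improved (largest decrease) first, largest increase last.
ranked = sorted(changes.items(), key=lambda kv: kv[1])

for country, delta in ranked:
    print(f"{country}: {delta:+.2f}")
```

A negative change means the PoU fell over the period, so the top of the ranking holds the countries that improved most.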

References

FAO 2020c, FAOStat, viewed 7 May 2020, <http://www.fao.org/faostat/en/#data/FS>.

Read On:

Previous section: Literature Review
Next section: Research Area 1: General Trend
