Analyzing and Cleaning the Global Hunger Index Data Using Python

Madhumita Chaudhary
5 min read · Jul 14, 2023


Introduction

Hello everyone! Today, I’ll be sharing my experience of working with the Global Hunger Index (GHI) dataset and how I cleaned and prepared it for further analysis. The GHI is a tool designed to comprehensively measure and track hunger at global, regional, and national levels. The dataset comes with its own challenges, and I hope my journey gives you some useful insights into conducting exploratory data analysis (EDA) with Python.

Getting Started

The first step was to load the dataset into a pandas DataFrame. The dataset was a CSV file, and I used the pd.read_csv() function to load it. Here's how the data looked initially:
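A minimal sketch of the loading step is shown here. The real file is not reproduced in this post, so the example reads a small in-memory stand-in; the column names and values are placeholders I made up for illustration, not actual GHI figures.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file. Column names and values are
# invented for illustration and are not the real GHI data.
sample_csv = io.StringIO(
    "Country,2000,2022,\n"
    "Country A,30.1,20.5,\n"
    "Country B,<5,<5,\n"
)

# With a real file this would be pd.read_csv("path/to/your_file.csv").
df = pd.read_csv(sample_csv)

# A first look at the data: top rows, shape, and inferred column types.
print(df.head())
print(df.shape)
print(df.dtypes)
```

Note that a trailing comma in the header produces a column that pandas names 'Unnamed: 3', and that year columns containing '<5' are read as text rather than numbers. Both issues are dealt with in the cleaning steps.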

Cleaning the Data

The dataset had several columns named ‘Unnamed: X’, which did not contain any useful information. I decided to drop these columns to clean up the dataset:
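One way to drop these columns is sketched below. The small dataframe is a made-up example, and selecting columns by the 'Unnamed' prefix is my assumption about how to identify them.

```python
import pandas as pd

# Made-up example frame with two empty 'Unnamed: X' columns.
df = pd.DataFrame({
    "Country": ["Country A", "Country B"],
    "2022": [20.5, 4.9],
    "Unnamed: 3": [None, None],
    "Unnamed: 4": [None, None],
})

# Collect every column whose name starts with 'Unnamed' and drop them all.
unnamed_cols = [col for col in df.columns if str(col).startswith("Unnamed")]
df = df.drop(columns=unnamed_cols)

print(df.columns.tolist())
```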

Adding Regional Information

The original dataset primarily contained country-level data. To enable a more detailed and region-specific analysis, I decided to enrich the dataset by adding corresponding regional information for each country. This additional layer of information allowed me to analyze the hunger index not only at a country level but also understand broader regional trends and patterns.

To do this, I created a dictionary in which the keys were country names and the values were the corresponding region names. Using the pandas map() method, I then added a new 'Region' column to the dataframe.
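The mapping step can be sketched like this. The dictionary below holds only two illustrative entries, and the region labels are my assumption; the full mapping used in the project is not shown here. 'Atlantis' is a deliberately unmapped name to show how gaps surface.

```python
import pandas as pd

df = pd.DataFrame({"Country": ["India", "Kenya", "Atlantis"]})

# Illustrative mapping only; the region labels are assumed, and a real
# mapping would need an entry for every country in the dataset.
country_to_region = {
    "India": "South Asia",
    "Kenya": "Africa South of the Sahara",
}

# Series.map looks each country up in the dictionary; names that are not
# in the dictionary become NaN.
df["Region"] = df["Country"].map(country_to_region)

# Listing the unmapped countries helps catch spelling mismatches.
unmapped = df.loc[df["Region"].isna(), "Country"].tolist()
print(unmapped)
```

Checking for unmapped names after the call is worthwhile, because map() does not raise an error when a key is missing.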

Handling Non-Numeric Values

The goal behind adding the regional information was twofold. First, it allowed me to analyze the data at the regional level as well as the country level. Second, it enabled me to estimate the median or mode GHI for each region and use these values to impute the missing entries and the ‘<5’ values (which is how the dataset reports scores below 5) for the countries in that region.

This approach aimed to provide region-specific estimates for the GHI values rather than an arbitrary value below 5. However, it did not yield satisfactory results. After referring to the GHI severity scale, which categorizes any score of 9.9 or below as ‘low’, I decided to replace all ‘<5’ values with 4.9. This keeps those countries in the ‘low’ category while giving me a specific number to work with.
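A minimal sketch of the replacement is shown here, assuming the year columns were read as text because of the '<5' entries. The column names and values are placeholders.

```python
import pandas as pd

# Placeholder data: year columns arrive as strings because of '<5'.
df = pd.DataFrame({
    "Country": ["Country A", "Country B"],
    "2000": ["30.1", "<5"],
    "2022": ["20.5", "<5"],
})

year_cols = ["2000", "2022"]

# Swap the '<5' marker for 4.9, then convert the columns to numbers.
# errors="coerce" turns any other non-numeric leftovers into NaN.
df[year_cols] = (
    df[year_cols]
    .replace("<5", "4.9")
    .apply(pd.to_numeric, errors="coerce")
)

print(df.dtypes)
```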

Categorizing Severity

Finally, I categorized the hunger index into different severity levels (low, moderate, serious, alarming, extremely alarming) based on the score ranges. This allowed me to analyze the severity of hunger over time and across different regions and countries.
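One way to do the categorization is with pd.cut, sketched below. The thresholds follow the published GHI severity scale (low up to 9.9, moderate 10.0 to 19.9, serious 20.0 to 34.9, alarming 35.0 to 49.9, extremely alarming 50.0 and above); it is worth checking them against the edition of the report you are using. The scores are made-up test values.

```python
import numpy as np
import pandas as pd

# Made-up scores chosen to sit on and around the category boundaries.
scores = pd.Series([4.9, 9.9, 10.0, 27.3, 35.0, 50.0])

# Left-closed bins: [-inf, 10), [10, 20), [20, 35), [35, 50), [50, inf).
# Since scores are reported to one decimal place, [-inf, 10) matches "<= 9.9".
bins = [-np.inf, 10, 20, 35, 50, np.inf]
labels = ["low", "moderate", "serious", "alarming", "extremely alarming"]

severity = pd.cut(scores, bins=bins, labels=labels, right=False)
print(severity.tolist())
```

Setting right=False matters here: with the default right-closed bins, a score of exactly 10.0 would land in 'low' instead of 'moderate'.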

Saving Cleaned & Processed Data

In the process of cleaning and pre-processing the Global Hunger Index (GHI) data, I saved the relevant columns to a new dataframe, new_df, for further analysis.

After saving the cleaned data, I checked for missing values. Interestingly, I found a few rows with hunger indices but no associated country data. Since these indices are meaningless without a corresponding country, I decided to drop these rows.
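These two steps can be sketched as follows. The column names, including the hypothetical 'Notes' column that gets left out, are assumptions for illustration.

```python
import pandas as pd

# Made-up frame: the middle row has a score but no country.
df = pd.DataFrame({
    "Country": ["Country A", None, "Country B"],
    "Region": ["Region X", None, "Region Y"],
    "2022": [20.5, 12.0, 4.9],
    "Notes": ["", "", ""],  # hypothetical column that is not needed
})

# Keep only the relevant columns; .copy() avoids modifying df by accident.
keep = ["Country", "Region", "2022"]
new_df = df[keep].copy()

# Count missing values per column.
print(new_df.isna().sum())

# Drop rows that have no country, then tidy up the index.
new_df = new_df.dropna(subset=["Country"]).reset_index(drop=True)
print(new_df)
```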

With the cleaned dataset in hand, I was ready to dive into the analysis. Here’s a snapshot of what I did:

Trend Analysis: I plotted the average hunger index change over the years. This gave me a clear picture of how the global hunger situation has evolved over time.

Regional Analysis: I calculated the average hunger index by region for the year 2022. This helped me understand which regions are currently facing the most severe hunger problems.

Comparison: I compared the average hunger index by region for all years. This allowed me to see how different regions have progressed in their fight against hunger.

Severity Analysis: I analyzed the severity of hunger over time. This revealed how the severity of hunger has changed globally over the years.

Improvement Analysis: I analyzed the improvements over time. This helped me identify which countries have made the most progress in reducing their hunger index.
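The analyses above can be sketched with a handful of pandas operations. This is an illustration under assumptions, not the exact code from the project: it assumes a wide table with one column per year, and the countries, regions, and scores are invented.

```python
import numpy as np
import pandas as pd

# Invented data in wide format: one row per country, one column per year.
new_df = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "Region": ["Region X", "Region X", "Region Y"],
    "2000": [30.0, 20.0, 40.0],
    "2022": [20.0, 10.0, 35.0],
})

# Reshape to long format: one row per country and year.
long_df = new_df.melt(id_vars=["Country", "Region"],
                      var_name="Year", value_name="GHI")
long_df["Year"] = long_df["Year"].astype(int)

# Trend analysis: average score per year (trend.plot() would chart it).
trend = long_df.groupby("Year")["GHI"].mean()

# Regional analysis: average score per region for 2022, worst first.
regional_2022 = (long_df[long_df["Year"] == 2022]
                 .groupby("Region")["GHI"].mean()
                 .sort_values(ascending=False))

# Comparison: average score per region for every year.
region_by_year = long_df.pivot_table(index="Region", columns="Year",
                                     values="GHI", aggfunc="mean")

# Severity analysis: number of countries per severity level and year.
bins = [-np.inf, 10, 20, 35, 50, np.inf]
labels = ["low", "moderate", "serious", "alarming", "extremely alarming"]
long_df["Severity"] = pd.cut(long_df["GHI"], bins=bins, labels=labels,
                             right=False).astype(str)
severity_counts = long_df.groupby(["Year", "Severity"]).size()

# Improvement: drop in score from the first to the last year per country.
first_year, last_year = long_df["Year"].min(), long_df["Year"].max()
by_country = new_df.set_index("Country")
improvement = (by_country[str(first_year)]
               - by_country[str(last_year)]).sort_values(ascending=False)

print(trend, regional_2022, region_by_year, severity_counts, improvement,
      sep="\n\n")
```

Reshaping to long format first keeps each analysis to a single groupby or pivot, which is why the sketch starts with melt().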

Conclusion

In conclusion, this analysis of the Global Hunger Index data revealed some encouraging trends but also highlighted areas where much work still needs to be done. As a data analyst, it’s been a rewarding experience to uncover these insights and I believe there’s still more to explore, perhaps through unsupervised learning techniques in future projects.

To delve deeper into my analysis and explore the insights I’ve uncovered, click here.
