Gender gap difference among countries — a look at Brazil

Sep 23, 2022

According to Heloisa Cristaldo, a reporter from Agencia Brasil, “the country has reached the 78th position in the ranking that measures gender equality in 144 countries, according to 2022 Equal Measures 2030 (EM2030) SDG Gender Index, a global report that assesses the evolution of countries in Sustainable Development Goals (SDGs) for the United Nations (UN) 2030 agenda.

“Brazil’s score has reached 66.4 points, behind countries such as Uruguay (31st), Argentina (44th), Chile (49th) and Paraguay (74th). In the previous edition of the ranking in 2019, the country’s position was 77th.”

Here, I worked with a Kaggle dataset covering 2006 to 2013. It may look dated, but I complemented it with some extra research. I was looking for the causes of gender gap differences among countries, especially in Brazil (77th in the 2022 Global Gender Gap Report).

Importing

/kaggle/input/global-gender-gap-index/table-3b-detailed-rankings-2013-csv-2.csv
/kaggle/input/global-gender-gap-index/global-gender-gap-index-2013 - Table 3a - Index 2013 - 2006.csv
/kaggle/input/global-gender-gap-index/table-3a-the-global-gender-gap-index-2013-rankings-2013-2006-csv-1.csv
/kaggle/input/global-gender-gap-index/table-3c-the-global-gender-gap-index-2013-rankings-changes-in-scores-csv-3.csv

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt  # plt.figure() and friends live in pyplot, not in the top-level package

# Load the 2006-2013 index table (df) and the detailed 2013 table (df1)
df = pd.read_csv(r'C:\Users\Lucas\Desktop\global-gender-gap-index-2013.csv', encoding='unicode_escape')
df1 = pd.read_csv(r'C:\Users\Lucas\Desktop\detailed-rankings-2013.csv', encoding='unicode_escape')

# Drop the country code and the rank columns, keeping only the scores
df.drop(["ISO3", "2012 Countries", "2013 Rank", "2012 Rank", "2011 Rank", "2010 Rank", "2009 Rank", "2008 Rank", "2007 Rank", "2006 Rank"], axis=1, inplace=True)
df1.drop(["ISO3", "Overall Rank", "Economic Participation and Opportunity Rank", "Educational Attainment Rank", "Health and Survival Rank", "Political Empowerment Rank"], axis=1, inplace=True)
df1.head()
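The hand-typed column lists above could also be built from the column names. This is only a minimal sketch, assuming every rank column's name ends in "Rank" (as it does in the lists above). The small frame is made-up data standing in for the Kaggle CSV, and `sample`, `to_drop` and `scores_only` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for the detailed-rankings CSV (values are illustrative only).
sample = pd.DataFrame({
    "Country": ["A", "B"],
    "ISO3": ["AAA", "BBB"],
    "Overall Rank": [1, 2],
    "Overall Score": [0.80, 0.70],
    "Political Empowerment Rank": [1, 2],
    "Political Empowerment Score": [0.60, 0.20],
})

# Collect ISO3 plus every column whose name ends with "Rank".
to_drop = ["ISO3"] + [c for c in sample.columns if c.endswith("Rank")]
scores_only = sample.drop(columns=to_drop)

print(list(scores_only.columns))
```

Building the list this way avoids typos in long column names, but it relies on the naming pattern holding for every file.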

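df includes ‘Country’, ‘ISO3’, and the ranks and scores of each country between 2006 and 2013. df1 holds the detailed 2013 ranks and scores for each category.

I dropped the rank columns for a cleaner visualization.

After the drop, df1 includes ‘Country’ and the following scores: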

‘Overall Score’
‘Economic Participation and Opportunity Score’
‘Educational Attainment Score’
‘Health and Survival Score’
‘Political Empowerment Score’

Pairplot

sns.pairplot(df1);

Regplot

import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings('ignore')

# Sub-scores to plot against the overall score
col = ['Economic Participation and Opportunity Score', 'Educational Attainment Score', 'Health and Survival Score', 'Political Empowerment Score']

fig = plt.figure(figsize=(12, 12))
for i in range(len(col)):
    plt.subplot(2, 2, i + 1)
    plt.title(col[i])
    # Both axes come from df1, so df1 is passed as the data source
    sns.regplot(data=df1, x=col[i], y='Overall Score')
plt.tight_layout()
plt.show()
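Each regplot panel draws a least-squares line through the points. The same line can be computed numerically, which makes the slopes easier to compare than reading them off a chart. This is a sketch on made-up numbers (not the Kaggle data), using only NumPy.

```python
import numpy as np

# Made-up sub-score (x) and overall score (y) for six hypothetical countries.
x = np.array([0.05, 0.10, 0.15, 0.20, 0.30, 0.50])
y = np.array([0.62, 0.64, 0.66, 0.68, 0.72, 0.80])

# A degree-1 polyfit returns (slope, intercept) of the least-squares line.
slope, intercept = np.polyfit(x, y, 1)

# Pearson correlation between the two columns.
r = np.corrcoef(x, y)[0, 1]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
```

With the real data, `x` and `y` would be two columns of df1, for example `df1['Political Empowerment Score']` and `df1['Overall Score']`.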

Correlation heatmap

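plt.figure(figsize=(8, 8))

# numeric_only=True leaves out the text column 'Country' (needed on pandas 2.0 and later)
sns.heatmap(df1.corr(numeric_only=True), annot=True, cbar=False, cmap='YlOrBr', fmt='.1f');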

The correlation between ‘Overall Score’ and

‘Economic Participation and Opportunity Score’ : 0.7
‘Educational Attainment Score’ : 0.5
‘Health and Survival Score’ : 0.2
‘Political Empowerment Score’ : 0.8
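The values in this list can be pulled out of the correlation matrix directly instead of being read off the heatmap. This is a minimal sketch on made-up scores, where `toy` and `corr_with_overall` are illustrative names and the numbers are not the Kaggle data.

```python
import pandas as pd

# Made-up scores for five hypothetical countries (not the Kaggle data).
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Overall Score": [0.60, 0.65, 0.70, 0.75, 0.80],
    "Political Empowerment Score": [0.05, 0.10, 0.20, 0.30, 0.50],
    "Health and Survival Score": [0.97, 0.98, 0.96, 0.98, 0.97],
})

# numeric_only=True skips the text column 'Country'.
corr_with_overall = (
    toy.corr(numeric_only=True)["Overall Score"]
    .drop("Overall Score")            # a column always correlates 1.0 with itself
    .sort_values(ascending=False)     # strongest relationship first
)

print(corr_with_overall.round(1))
```

Sorting the column puts the sub-scores in order of how closely they track the overall score.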

Boxplot

Principal Component Analysis (PCA)

# Keep only the four sub-scores as PCA inputs
df2 = df1.drop(['Country', 'Overall Score'], axis=1)

from sklearn.decomposition import PCA

pca = PCA()
pca.fit(df2)
feature = pca.transform(df2)

# One column name per component: PC1, PC2, ...
pc_names = ['PC{}'.format(x + 1) for x in range(len(df2.columns))]
pca1 = pd.DataFrame(feature, columns=pc_names)
pca1.head()

# Share of the total variance explained by each component
pd.DataFrame(pca.explained_variance_ratio_, index=pc_names).plot.bar()

According to the analysis, PC1 explains more than half of the variance, and PC1 and PC2 together explain more than 80%.
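The combined figure for PC1 and PC2 is a running sum of the per-component ratios. The sketch below reproduces that calculation with NumPy alone on synthetic data, so the percentages it prints are not the ones from this dataset. It uses the fact that the explained variance ratios are the squared singular values of the mean-centered data divided by their total.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 50 rows, 4 columns with deliberately different spreads.
X = rng.normal(size=(50, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

# PCA works on mean-centered data.
Xc = X - X.mean(axis=0)

# Singular values of the centered matrix measure the spread along each component.
singular_values = np.linalg.svd(Xc, compute_uv=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)

# Running total: share of variance kept by the first k components.
cumulative = np.cumsum(explained_ratio)
for k, share in enumerate(cumulative, start=1):
    print(f"PC1..PC{k}: {share:.1%}")
```

With the fitted scikit-learn object from the article, the equivalent one-liner would be `np.cumsum(pca.explained_variance_ratio_)`.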

df1['PCA1'] = pca1['PC1']
df1['PCA2'] = pca1['PC2']

I split the countries into four categories based on their ‘Overall Score’, using the 25th percentile, the median and the 75th percentile as cut points.

def category(ex):
    # 1 = highest overall scores, 4 = lowest
    if ex >= 0.717700:
        return 1
    elif 0.691250 <= ex < 0.717700:
        return 2
    elif 0.651675 <= ex < 0.691250:
        return 3
    else:
        return 4

df1.loc[:, 'category'] = df1.loc[:, 'Overall Score'].apply(category)
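The cut points in the function above are typed in by hand. A variant that computes them from the data might look like the sketch below. It runs on made-up scores, `category_from_quartiles` is a hypothetical helper name, and the thresholds are whatever `Series.quantile` returns for this toy series rather than the article's values.

```python
import pandas as pd

# Made-up overall scores for eight hypothetical countries.
scores = pd.Series([0.60, 0.64, 0.66, 0.68, 0.70, 0.71, 0.73, 0.80])

# Quartile cut points computed from the data instead of typed in by hand.
q25, q50, q75 = scores.quantile([0.25, 0.50, 0.75])

def category_from_quartiles(value):
    """Return 1 for the top quarter down to 4 for the bottom quarter."""
    if value >= q75:
        return 1
    elif value >= q50:
        return 2
    elif value >= q25:
        return 3
    return 4

categories = scores.apply(category_from_quartiles)
print(categories.value_counts().sort_index())
```

Computing the thresholds this way keeps the categories in sync if the dataset changes, at the cost of making them depend on which countries are included.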

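fig = plt.figure(figsize=(10, 8))

sns.scatterplot(data=df1, x='PCA1', y='PCA2', hue='category');

# numeric_only=True skips the text column 'Country' (needed on pandas 2.0 and later)
df1.groupby('category').mean(numeric_only=True).T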

PC1 separates most of the categories around 0. It appears to be driven mainly by the ‘Political Empowerment Score’ column.
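One way to check which column drives PC1 is to look at the component's loadings, the weights each input column receives. The sketch below uses NumPy on synthetic data in which one column is given a much larger spread than the others. That spread is an assumption made for illustration, not a measured property of the Kaggle data, and the short column names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
names = ["econ", "educ", "health", "politics"]  # hypothetical short column names

# Synthetic data in which the last column has by far the largest spread.
X = rng.normal(size=(100, 4)) * np.array([0.10, 0.05, 0.01, 0.30])

Xc = X - X.mean(axis=0)
# Rows of Vt are the principal axes (scikit-learn exposes them as pca.components_).
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# Absolute weights of each column in PC1; the sign of a component is arbitrary.
pc1_loadings = dict(zip(names, np.abs(Vt[0]).round(2)))
dominant = max(pc1_loadings, key=pc1_loadings.get)

print(pc1_loadings, dominant)
```

On the article's fitted model, the same check would read `pca.components_[0]` against `df2.columns`. Because the inputs are not standardized, a column with a wider range naturally gets a larger weight.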

After that, I looked up Brazil's scores.

df1[df1['Country'] == 'Brazil']

Here, Brazil ranks 62nd and falls in category 2, with a ‘Political Empowerment Score’ of 0.144 and an ‘Economic Participation and Opportunity Score’ of 0.6561.

In the 2022 EM2030 index, Brazil ranked 78th with 66.4 points (0.664 on a 0 to 1 scale). Under the thresholds above, that score would fall in category 3. The country lost 0.8 points between 2015 and 2020 and stagnated, followed by its Latin American hermano Argentina (-0.6 points).

The study mentioned above suggests that Brazil has not been closing its gender gap over the years. The analysis here also indicates that the “Political Empowerment Score” is a very important variable.

I also noticed that the average “Political Empowerment Score” is low in every category:

Category1 : 0.331535
Category2 : 0.177353
Category3 : 0.124218
Category4 : 0.076426

Using the following code, I found that only three countries, all of them European, scored more than 0.5 in the “Political Empowerment Score”.

df1[df1['Political Empowerment Score'] > 0.5]
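The same filter can return only the relevant columns, sorted, so the leaders are easier to read. This is a small sketch on made-up values, where `toy` and `leaders` are illustrative names. The comparison is strict, so a score of exactly 0.5 is left out.

```python
import pandas as pd

# Made-up scores for four hypothetical countries.
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Political Empowerment Score": [0.75, 0.14, 0.52, 0.50],
})

col = "Political Empowerment Score"
# Keep rows above the threshold, show two columns, highest score first.
leaders = toy.loc[toy[col] > 0.5, ["Country", col]].sort_values(col, ascending=False)

print(leaders)
```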

Ultimately, Brazil is far from the leaders of the Global Gender Gap Index. I would suggest that the country improve its “Political Empowerment Score” in order to get closer to countries such as Iceland, Finland and Norway.