Never Ask a Woman Her Age?
Testing the Veracity of a Classic Adage
A social development and social justice oriented telecommunications company known as Viamo was conducting a data analysis of the demographics of users of their Interactive Voice Response (IVR) service in the country of Mali. Unfortunately, almost half of all survey respondents refused to divulge their age and almost the same amount of that refused to divulge their gender. This investigation was conducted to determine whether there is a strong correlation between gender and reluctancy to reveal age, as is commonly believed.
from google.cloud import bigquery
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = 'viamo-datakind-19b12e3872f5.json'
bq_client = bigquery.Client()
pf = ParquetFile('Viamo_user_data_Mali.parquet')
df = pf.to_pandas()
(df.isnull().sum()/len(df)*100).sort_values(ascending=False)
The result of this code block is the following output:
age 49.83
gender 43.39
second_fav_topic 28.03
second_fav_theme 13.01
subscriber_id 0.00
fav_topic_numeric 0.00
second_fav_theme_numeric 0.00
fav_theme_numeric 0.00
fav_topic 0.00
age_numeric 0.00
fav_theme 0.00
gender_numeric 0.00
median_call_duration 0.00
n_topics 0.00
n_themes 0.00
n_calls 0.00
second_fav_topic_numeric 0.00
dtype: float64
Testing the hypothesis whether women are more reluctant to admit age, despite the survey being anonymous:
# Percentage of men who decline to state their age:
age_shy_xy = [len(df.loc[(df.gender=='male') & (df.age.isna())]),
len(df.loc[df.gender=='male']) - len(df.loc[(df.gender=='male') & (df.age.isna())])]
# Percentage of women who decline to state their age:
age_shy_xx = [len(df.loc[(df.gender=='female') & (df.age.isna())]),
len(df.loc[df.gender=='female']) - len(df.loc[(df.gender=='female') & (df.age.isna())])]
plt.figure(figsize=(12,6))
plt.subplot(1, 2, 1)
plt.pie(age_shy_xy, colors=['#c44e52','#55a868'], labels = ["No", "Yes"], autopct='%.0f%%')
plt.title('Portion of Malinese Men Admitting Age')
plt.subplot(1, 2, 2)
plt.pie(age_shy_xx, colors=['#c44e52','#55a868'], labels = ["No", "Yes"], autopct='%.0f%%')
plt.title('Portion of Malinese Women Admitting Age')
It can be proven true, but only by a hair.
The original source code for producing the visualizations in this article can be found here.