Humble Bumble — Data Analyst Interview Challenge using Python, Pandas, and Matplotlib 🐝🍯

Question 1: Complete the shell function below so that, given a string s, it counts how many times each unique word appears. The count should be case-insensitive and should ignore punctuation.

  • The result should be printed in alphabetical order.
  • Only the Python standard library may be used (i.e., no pandas, scikit-learn, NLTK, etc.).


“I’m smart, I’m educated. It would have been a disservice to every woman to go away or hide.” (Whitney Wolfe, founder of Bumble)

Input: "I'm smart I'm educated. It would have been a disservice to every woman to go away or hide."
Output:
('a', 1),
('away', 1),
('been', 1),
('disservice', 1),
('educated', 1),
('every', 1),
('go', 1),
('have', 1),
('hide', 1),
("i'm", 2),
('it', 1),
('or', 1),
('smart', 1),
('to', 2),
('woman', 1),
('would', 1)


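punctuations = [',', '.', '!', '"', '?']

def word_count(s):
    # Lower-case the text so the count is case-insensitive
    sentence = s.lower()
    # Strip each punctuation mark, reassigning so every removal accumulates
    for punctuation in punctuations:
        sentence = sentence.replace(punctuation, '')
    word_list = sentence.split()
    # Map each unique word to the number of times it appears
    word_dict = {word: word_list.count(word) for word in word_list}
    # Sort the (word, count) pairs alphabetically
    return sorted(word_dict.items())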
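For comparison, a minimal sketch of the same count built on collections.Counter, which is part of the standard library and so still respects the no-third-party-libraries rule. The function name word_count_counter is hypothetical; the punctuation set mirrors the punctuations list in the answer and deliberately leaves the apostrophe alone so that "I'm" stays a single word.

```python
from collections import Counter

# Mirrors the punctuations list in the answer; the apostrophe is kept on purpose
PUNCTUATIONS = ',.!"?'


def word_count_counter(s):
    """Return (word, count) pairs sorted alphabetically, ignoring case and PUNCTUATIONS."""
    # The third argument of str.maketrans lists characters to delete
    cleaned = s.lower().translate(str.maketrans('', '', PUNCTUATIONS))
    # Counter tallies every word in a single pass over the split text
    return sorted(Counter(cleaned.split()).items())


text = "I'm smart I'm educated. It would have been a disservice to every woman to go away or hide."
# Print one (word, count) pair per line, in alphabetical order
for pair in word_count_counter(text):
    print(pair)
```

Run on the example input, this prints the same 16 pairs as the expected output. Counter tallies all words in one pass, whereas calling list.count inside the dict comprehension rescans the whole list for every word, which grows quadratically with the length of the text.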

Question 2: Using the given pandas dataframe, please calculate the ratio of messages sent to messages received (messages_sent / messages_received) split by country and gender, and visualize this in a way that is easy to digest and understand. Please use any libraries you wish.

Step 1: Replacing NaNs with Zeros

import pandas as pd

# Filling NaNs with zeros (a missing count is treated as no messages)
messages_df = messages_df.fillna(0)
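fillna(0) assumes that a missing value means zero messages, so it is worth checking how many values that assumption touches. A minimal sketch on a hypothetical toy frame (the column names messages_sent and messages_received are assumptions, since the challenge data is not reproduced here):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real messages_df columns may be named differently
messages_df = pd.DataFrame({
    'country': ['FR', 'FR', 'UK'],
    'gender': ['M', 'F', 'M'],
    'messages_sent': [10, np.nan, 3],
    'messages_received': [np.nan, 4, 7],
})

# Number of missing values per column before filling
print(messages_df.isna().sum())

# Treat every missing count as "no messages", as Step 1 does
messages_df = messages_df.fillna(0)

# Total number of NaNs left in the frame (0 after fillna)
print(messages_df.isna().sum().sum())
```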

Step 2: Creating a grouped table for Messages Received



# Creating a grouped table for messages received (assumes messages_df has a numeric 'messages_received' column)
total_messages_received_df = messages_df.groupby(['country', 'gender'], as_index=False)['messages_received'].sum()
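A minimal sketch of what this groupby produces, on a hypothetical four-row frame (the one-row-per-user layout and the messages_received column name are assumptions):

```python
import pandas as pd

# Hypothetical per-user rows; the real messages_df schema is not shown in the challenge
messages_df = pd.DataFrame({
    'country': ['BR', 'BR', 'FR', 'FR'],
    'gender': ['F', 'F', 'M', 'M'],
    'messages_received': [5, 7, 2, 1],
})

# One row per (country, gender) pair with the received counts summed;
# as_index=False keeps country and gender as ordinary columns
received = messages_df.groupby(['country', 'gender'], as_index=False)['messages_received'].sum()
print(received)
```

Keeping country and gender as ordinary columns (rather than as a MultiIndex) is what lets the merge in Step 4 join on them by name.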

Step 3: Creating a grouped table for Messages Sent



# Creating a grouped table for messages sent (assumes messages_df has a numeric 'messages_sent' column)
total_messages_sent_df = messages_df.groupby(['country', 'gender'], as_index=False)['messages_sent'].sum()
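If messages_df holds both counts on each row (an assumption about its layout), Steps 2 and 3 can be collapsed into one groupby using pandas named aggregation, which would also make the merge in Step 4 unnecessary. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical per-user rows with both message columns (an assumption about messages_df)
messages_df = pd.DataFrame({
    'country': ['BR', 'BR', 'FR', 'UK'],
    'gender': ['F', 'F', 'M', 'M'],
    'messages_sent': [3, 4, 9, 2],
    'messages_received': [5, 7, 1, 6],
})

# Named aggregation: each keyword becomes an output column built from (source column, function)
totals = messages_df.groupby(['country', 'gender'], as_index=False).agg(
    messages_sent=('messages_sent', 'sum'),
    messages_received=('messages_received', 'sum'),
)
print(totals)
```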

Step 4: Merging the 2 Tables


# Merging the two tables on country and gender (an outer join keeps groups missing from either side)
bumble_df = pd.merge(total_messages_received_df, total_messages_sent_df, how='outer', on=['country', 'gender'])
bumble_df
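With how='outer', a (country, gender) pair that appears in only one table still survives the merge, but with NaN in the other table's column. A sketch, on hypothetical tables, of using merge's indicator=True option to spot such rows:

```python
import pandas as pd

# Hypothetical grouped tables in which UK/M is missing from the received side
received = pd.DataFrame({'country': ['BR', 'FR'], 'gender': ['F', 'M'], 'messages_received': [12, 1]})
sent = pd.DataFrame({'country': ['BR', 'FR', 'UK'], 'gender': ['F', 'M', 'M'], 'messages_sent': [7, 9, 2]})

# indicator=True adds a '_merge' column saying which side(s) each key came from
merged = pd.merge(received, sent, how='outer', on=['country', 'gender'], indicator=True)
print(merged)

# Rows that are not 'both' carry NaN in the other table's column
print(merged[merged['_merge'] != 'both'])
```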

Step 5: Calculating Message Ratio

messages_ratio (%) = messages_sent / messages_received × 100
# Calculating the message ratio as a percentage
bumble_df['messages_ratio'] = (bumble_df['messages_sent'] / bumble_df['messages_received']) * 100
bumble_df
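Since Step 1 turns missing counts into zeros, a group that received no messages would divide by zero here; pandas returns inf in that case rather than raising an error. A sketch, with hypothetical groups AA and BB, of converting those infinities to NaN so they do not distort the chart:

```python
import numpy as np
import pandas as pd

# Hypothetical merged table; group BB received no messages at all
example_df = pd.DataFrame({
    'country': ['AA', 'BB'],
    'gender': ['M', 'F'],
    'messages_sent': [20, 5],
    'messages_received': [8, 0],
})

# Sent / received as a percentage; 5 / 0 yields inf in pandas instead of raising
example_df['messages_ratio'] = example_df['messages_sent'] / example_df['messages_received'] * 100

# Replace infinite ratios with NaN so they are not plotted as huge bars
example_df['messages_ratio'] = example_df['messages_ratio'].replace([np.inf, -np.inf], np.nan)
print(example_df)
```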

Step 6: Analysis of the Data

  • French males have the highest sent-to-received ratio: 92 messages sent but only 11 received, or roughly 836%.
  • French females and UK males appear very popular, with ratios of 26% and 30% respectively; both groups received far more messages than they sent.
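By construction, a ratio above 100% means a group sends more messages than it receives, and below 100% means it receives more. A sketch of ranking the groups programmatically rather than by eye; rank_ratios is a hypothetical helper and the example rows are made up:

```python
import pandas as pd


def rank_ratios(df):
    """Sort (country, gender) groups from highest to lowest messages_ratio.

    Assumes df has the columns built in the steps above: country, gender,
    messages_sent, messages_received and messages_ratio.
    """
    cols = ['country', 'gender', 'messages_sent', 'messages_received', 'messages_ratio']
    # Highest sent/received ratio first; NaN ratios (if any) go to the end
    return df[cols].sort_values('messages_ratio', ascending=False).reset_index(drop=True)


# Hypothetical example rows, not the challenge data
example = pd.DataFrame({
    'country': ['AA', 'BB', 'CC'],
    'gender': ['F', 'M', 'F'],
    'messages_sent': [10, 30, 4],
    'messages_received': [40, 10, 8],
})
example['messages_ratio'] = example['messages_sent'] / example['messages_received'] * 100
print(rank_ratios(example))
```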

Step 7: Plotting the Data

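import numpy as np
import matplotlib.pyplot as plt

# Plotting the data on a chart
# Sort rows so both lists follow the BR, FR, UK order used for the tick labels
bumble_df = bumble_df.sort_values(['country', 'gender'])
# Ratio (%) for each gender, one value per country
female_ratio = list(bumble_df[bumble_df['gender'] == 'F'].messages_ratio)
male_ratio = list(bumble_df[bumble_df['gender'] == 'M'].messages_ratio)
c = 3  # number of countries
country = np.arange(c)  # x position of each country group
width = 0.4  # bar width; male bars are shifted right by one width
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot()
bar_1 = ax.bar(country, female_ratio, width, color='gold', label='Female')
bar_2 = ax.bar(country + width, male_ratio, width, color='#F9B007', label='Male')
ax.set_ylabel('Percent')
ax.set_xticks(country + width / 2)
ax.set_xticklabels(['BR', 'FR', 'UK'])
ax.legend((bar_1, bar_2), ('Female', 'Male'), loc='upper right')
ax.set_title('Bumble ratio of messages sent to messages received by country and gender')
plt.show()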
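The chart above relies on the female and male lists lining up with the hard-coded ['BR', 'FR', 'UK'] labels. A sketch of an alternative that pivots the ratio table so pandas aligns the bars by label; plot_ratio is a hypothetical helper, the colors copy the original chart, and the example values are made up:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen for scripts; this line can be dropped in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_ratio(ratio_df):
    """Grouped bar chart of messages_ratio, one group of bars per country."""
    # Rows become countries and columns become genders (F, M), so bars are
    # matched to countries by label rather than by list position
    table = ratio_df.pivot(index='country', columns='gender', values='messages_ratio')
    ax = table.plot.bar(figsize=(8, 8), color=['gold', '#F9B007'], rot=0)
    ax.set_ylabel('Messages sent / received (%)')
    ax.set_title('Bumble ratio of messages sent to messages received by country and gender')
    return ax


# Hypothetical values, not the challenge data
example_df = pd.DataFrame({
    'country': ['AA', 'AA', 'BB', 'BB'],
    'gender': ['F', 'M', 'F', 'M'],
    'messages_ratio': [40.0, 150.0, 25.0, 300.0],
})
ax = plot_ratio(example_df)
plt.close(ax.figure)
```

Adding a third country to the data then needs no change to the plotting code.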

