Humble Bumble — Data Analyst Interview Challenge for Data Queens🐝🍯
Humble Bumble — Data Analyst Interview Challenge using Python, Pandas, and Matplotlib 🐝🍯
Dear 👋💻🌎 Data Queen,
MAKING LEARNING HOW TO CODE
✧・゚:* CUTE(◕‿◕✿) and INFORMATIVEᕙ(⇀‸↼‶)ᕗ!!!
Question 1: Please complete the shell function below so that, given a string s, it counts how many times each unique word appears. The count should be case insensitive and should ignore punctuation.
- The result should be printed in alphabetical order.
- Only the Python standard library may be used (i.e., no pandas, no sklearn, no nltk, etc.).
Example:
“I’m smart, I’m educated. It would have been a disservice to every woman to go away or hide.” (Whitney Wolfe, founder of Bumble)
Input: "I'm smart I'm educated. It would have been a disservice to every woman to go away or hide."
Output:
[
('a', 1),
('away', 1),
('been', 1),
('disservice', 1),
('educated', 1),
('every', 1),
('go', 1),
('have', 1),
('hide', 1),
("i'm", 2),
('it', 1),
('or', 1),
('smart', 1),
('to', 2),
('woman', 1),
('would', 1)
]
Code:
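punctuations = [',', '.', '!', '"', '?']

def word_count(s):
    sentence = s.lower()  # lowercase so the count is case insensitive
    # Reassign on every pass so all punctuation marks are removed, not only the last one
    for punctuation in punctuations:
        sentence = sentence.replace(punctuation, '')
    word_list = sentence.split()
    # Map each unique word to the number of times it appears
    word_dict = {word: word_list.count(word) for word in word_list}
    return sorted(word_dict.items())  # alphabetical order by word

print(word_count("I'm smart I'm educated. It would have been a disservice to every woman to go away or hide."))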
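The dictionary comprehension calls `list.count` once per word, which rescans the whole list each time. A minimal sketch of the same logic using `collections.Counter` and `str.translate` (both in the standard library, so still within the rules); the name `word_count_counter` is only for illustration, and the punctuation set is the same one used above, so the apostrophe in "I'm" is kept.

```python
from collections import Counter

# Same punctuation set as above; the apostrophe is kept so "I'm" stays one word
PUNCTUATIONS = ',.!"?'

def word_count_counter(s):
    """Return (word, count) pairs in alphabetical order, case-insensitive, punctuation ignored."""
    # The third argument to str.maketrans lists characters to delete
    cleaned = s.lower().translate(str.maketrans('', '', PUNCTUATIONS))
    # Counter tallies every word in a single pass over the list
    return sorted(Counter(cleaned.split()).items())

print(word_count_counter("I'm smart, I'm educated. It would have been a disservice to every woman to go away or hide."))
```

Because the comma after "smart" is stripped, this input gives the same 16 pairs as the expected output above.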
Question 2: Using the given pandas dataframe, please calculate the ratio of messages sent to messages received (messages_sent / messages_received) split by country and gender, and visualize this in a way that is easy to digest and understand. Please use any libraries you wish.
Step 1: Replacing NaNs with Zeros
import pandas as pd
# Filling NaNs with zeros
messages_df = messages_df.fillna(0)
messages_df
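Calling `fillna(0)` on the whole DataFrame also turns a missing `country` or `gender` into 0, which would later show up as a spurious group. If that can happen in your data, one option is to fill only the two count columns. A minimal sketch on made-up toy data (only the column names come from this post):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the challenge's real messages_df is not reproduced here
messages_df = pd.DataFrame({
    'country': ['FR', 'FR', 'UK', None],
    'gender': ['M', 'F', 'M', 'F'],
    'messages_sent': [10, np.nan, 4, 7],
    'messages_received': [np.nan, 12, 20, 3],
})

# Fill NaNs only in the count columns; a missing country stays missing
# instead of becoming a fake country labelled 0
count_cols = ['messages_sent', 'messages_received']
messages_df[count_cols] = messages_df[count_cols].fillna(0)
print(messages_df)
```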
Step 2: Creating a Grouped Table for Messages Received
Code:
# Creating a grouped table for messages received
# (the summed Series is already named messages_received, so no rename is needed)
total_messages_received_df = (
    messages_df.groupby(['country', 'gender'])['messages_received']
    .sum()
    .reset_index()
)
total_messages_received_df
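If you prefer to name the output column explicitly in the same step, pandas' named aggregation (added in pandas 0.25) is one option. A minimal sketch on made-up toy data that only reuses the column names from this post:

```python
import pandas as pd

# Hypothetical toy data; only the column names match messages_df
messages_df = pd.DataFrame({
    'country': ['FR', 'FR', 'UK', 'UK'],
    'gender': ['M', 'M', 'F', 'F'],
    'messages_sent': [5.0, 3.0, 1.0, 0.0],
    'messages_received': [1.0, 0.0, 6.0, 4.0],
})

# Named aggregation: keyword = output column, value = (input column, aggregation)
total_messages_received_df = (
    messages_df
    .groupby(['country', 'gender'])
    .agg(messages_received=('messages_received', 'sum'))
    .reset_index()
)
print(total_messages_received_df)
```

The same pattern covers Step 3 with `messages_sent=('messages_sent', 'sum')`, and passing both keywords to a single `.agg` call produces both columns at once.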
Step 3: Creating a Grouped Table for Messages Sent
Code:
# Creating a grouped table for messages sent
# (straight quotes are required; curly quotes are a syntax error in Python)
total_messages_sent_df = (
    messages_df.groupby(['country', 'gender'])['messages_sent']
    .sum()
    .reset_index()
)
total_messages_sent_df
Step 4: Merging the Two Tables
Code:
# Merging the two tables on their shared keys
bumble_df = pd.merge(total_messages_received_df, total_messages_sent_df,
                     how='outer', on=['country', 'gender'])
bumble_df
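Because both totals come from the same grouping of the same DataFrame, Steps 2 to 4 can also be collapsed into a single `groupby` that sums both columns, which avoids the merge entirely. A sketch on made-up toy data, with the two-tables-plus-merge route alongside for comparison:

```python
import pandas as pd

# Hypothetical toy data; only the column names match messages_df
messages_df = pd.DataFrame({
    'country': ['BR', 'FR', 'FR', 'UK', 'FR'],
    'gender': ['F', 'M', 'F', 'M', 'M'],
    'messages_sent': [4.0, 9.0, 2.0, 3.0, 1.0],
    'messages_received': [5.0, 1.0, 8.0, 10.0, 2.0],
})

# One groupby sums both columns at once, so no merge is needed
bumble_df = (
    messages_df
    .groupby(['country', 'gender'])[['messages_received', 'messages_sent']]
    .sum()
    .reset_index()
)

# The two-tables-plus-merge route from Steps 2 to 4, for comparison
received = messages_df.groupby(['country', 'gender'])['messages_received'].sum().reset_index()
sent = messages_df.groupby(['country', 'gender'])['messages_sent'].sum().reset_index()
merged = pd.merge(received, sent, how='outer', on=['country', 'gender'])
print(bumble_df)
print(merged)
```

Since both grouped tables always contain exactly the same (country, gender) groups, `how='outer'` and `how='inner'` give the same result here.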
Step 5: Calculating the Message Ratio
messages ratio = (messages sent / messages received) × 100, i.e. expressed as a percentage
# Calculating the messages ratio as a percentage
bumble_df['messages_ratio'] = (bumble_df['messages_sent'] / bumble_df['messages_received']) * 100
bumble_df
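The ratio becomes infinite for any group that received no messages (pandas returns `inf` for a positive number divided by 0, and `NaN` for 0/0). If that can happen in your data, one option is to mark such ratios as missing rather than plotting an enormous bar. A minimal sketch: the FR/M counts are the 92 sent and 11 received quoted in Step 6, while the BR rows are made up.

```python
import numpy as np
import pandas as pd

# Toy rows: FR/M uses the counts quoted in Step 6, the BR rows are made up
bumble_df = pd.DataFrame({
    'country': ['FR', 'BR', 'BR'],
    'gender': ['M', 'M', 'F'],
    'messages_sent': [92.0, 5.0, 5.0],
    'messages_received': [11.0, 20.0, 0.0],
})

# Ratio as a percentage; replace +/-inf (received == 0) with NaN ("undefined")
ratio = bumble_df['messages_sent'] / bumble_df['messages_received'] * 100
bumble_df['messages_ratio'] = ratio.replace([np.inf, -np.inf], np.nan)
print(bumble_df)
```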
Step 6: Analysis of the Data
- French males have the highest sent/received ratio: they sent 92 messages and received only 11 (about 836%).
- French females and UK males appear to be in high demand, with ratios of 26% and 30% respectively.
- Both groups received far more messages than they sent.
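A ratio below 100% means a group sent fewer messages than it received. To check the bullet points against the numbers, a short sketch: the French male ratio follows directly from the quoted counts, and `idxmax` picks out the group with the highest ratio. The helper `highest_ratio_group` is hypothetical, and the toy table holds only the three ratios quoted above rather than the full results.

```python
import pandas as pd

def highest_ratio_group(df):
    """Hypothetical helper: return the (country, gender) pair with the largest messages_ratio."""
    row = df.loc[df['messages_ratio'].idxmax()]
    return row['country'], row['gender']

# French males: 92 sent / 11 received, as a percentage
print(round(92 / 11 * 100, 1))  # 836.4

# Toy table with only the three ratios quoted above (not the full results)
quoted = pd.DataFrame({
    'country': ['FR', 'FR', 'UK'],
    'gender': ['M', 'F', 'M'],
    'messages_ratio': [92 / 11 * 100, 26.0, 30.0],
})
print(highest_ratio_group(quoted))  # ('FR', 'M')
```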
Step 7: Plotting the Data
import numpy as np
import matplotlib.pyplot as plt
# Plotting the data on a grouped bar chart
c = 3  # number of countries (BR, FR, UK)
# Ratios are already percentages (Step 5); rows are sorted by country, so bars line up with the labels
female_ratio = list(bumble_df[bumble_df["gender"] == "F"].messages_ratio)
male_ratio = list(bumble_df[bumble_df["gender"] == "M"].messages_ratio)
country = np.arange(c)  # one x position per country
width = 0.4
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot()
bar_1 = ax.bar(country, female_ratio, width, color='gold', label='Female')
bar_2 = ax.bar(country + width, male_ratio, width, color='#F9B007', label='Male')
ax.set_ylabel('Percent')
ax.set_xticks(country + width / 2)  # center each label under its pair of bars
ax.set_xticklabels(['BR', 'FR', 'UK'])
ax.legend((bar_1, bar_2), ('Female', 'Male'), loc='upper right')
ax.set_title('Bumble ratio of messages sent to received by country and gender')
plt.show()
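The chart above hardcodes three countries and the labels `['BR', 'FR', 'UK']`, so it would mislabel bars if the set of countries changed. A sketch of an alternative that pivots the table and lets pandas draw the grouped bars, taking the labels from the data. The function name `plot_ratio` and the dashed 100% reference line are illustrative additions, and the toy values are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_ratio(bumble_df):
    """Grouped bar chart of messages_ratio: one group per country, one bar per gender."""
    # Rows = countries, columns = genders; labels come from the data, not a hardcoded list
    table = bumble_df.pivot(index='country', columns='gender', values='messages_ratio')
    fig, ax = plt.subplots(figsize=(8, 8))
    table.plot.bar(ax=ax, color=['gold', '#F9B007'], rot=0)
    # Dashed line at 100%: bars above it sent more messages than they received
    ax.axhline(100, color='grey', linestyle='--', linewidth=1)
    ax.set_ylabel('Percent')
    ax.set_title('Bumble ratio of messages sent to received by country and gender')
    return ax

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    'country': ['BR', 'BR', 'FR', 'FR', 'UK', 'UK'],
    'gender': ['F', 'M', 'F', 'M', 'F', 'M'],
    'messages_ratio': [50.0, 120.0, 40.0, 200.0, 90.0, 60.0],
})
ax = plot_ratio(toy)
```

In a script, call `plt.show()` afterwards; in a notebook the figure displays on its own. `pivot` requires each (country, gender) pair to appear once, which holds after the grouping in Steps 2 to 4.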
Thank you for reading my Data Journey ❤,
Kody the Coding Corgi & Bits the Adorable A.I.
P.S.
If you enjoyed this comic strip or found it helpful in any way, please sign up for our newsletter or buy me a boba (it means a lot!), and send us your thoughts and feelings about this work.
Are you interested in collaborating? Follow us on LinkedIn.
DM us on Instagram, tweet at us on Twitter, or connect with us on LinkedIn.
Please share this with your data friends, corgi friends, and coding corgi friends so we can make more comics in the future with your support. Thank you!