Whatsapp Group Chat Analysis (Python)

Brenda Leyva
MCD-UNISON
7 min read · Oct 5, 2020

This analysis is based on the story Whatsapp Group Chat Analysis using Python and Plotly by Saiteja Kura

The following steps were taken from the analysis by Saiteja Kura and modified as needed. The result is the same analysis with some added features; I hope it is useful. The activity as a whole is a lot of fun and can give you a hint about the overall mood of your group chat.

1. Exporting your group chat: WhatsApp provides a feature for exporting any chat (with or without media) as a .txt file.


2. Parsing the file: The plain text file needs to be parsed and tokenized; further details on this process are presented in the original story. The following code worked wonders for me. It takes care of parsing and tokenizing the text as well as putting it back together. I have added the option of giving the file as input, so you can analyse as many group chats as you want more easily.

This code will first request the file name and then continue on to give the data structure.
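As a rough illustration of the parsing step, here is a minimal sketch rather than the original code. It assumes every message starts with `DD/MM/YY, HH:MM - Author: Message`, a layout that varies by phone and locale; `parse_chat` and `parse_file` are hypothetical names.

```python
import re

# Assumed line layout: "DD/MM/YY, HH:MM - Author: Message" (varies by phone and locale).
LINE_START = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - (.*)$")


def parse_chat(lines):
    """Return a list of [date, time, author, message] rows."""
    rows = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_START.match(line)
        if match:
            date, time, rest = match.groups()
            # Lines without "Author: " are treated as system notices (author is None).
            author, sep, message = rest.partition(": ")
            if not sep:
                author, message = None, rest
            rows.append([date, time, author, message])
        elif rows:
            # No timestamp: continuation of the previous multi-line message.
            rows[-1][3] += "\n" + line
    return rows


def parse_file(path):
    """Parse an exported chat file; the path is supplied by the caller."""
    with open(path, encoding="utf-8") as handle:
        return parse_chat(handle)


sample = [
    "18/06/20, 22:47 - Ana: Hola a todos",
    "segunda linea",
    "18/06/20, 22:48 - Ana added Luis",
    "18/06/20, 22:49 - Luis: <Media omitted>",
]
rows = parse_chat(sample)
print(rows)
```

To ask for the file name at run time, something like `rows = parse_file(input("File name: "))` would work.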

3. Create the data frame: The following code uses the first code chunk to place all the data into a data frame and changes the names of the users for privacy.

After running this you should get some details on the data frame and the first few lines of the final product with the names changed.

The code has correctly identified the Date, Time, Author and Message.
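A minimal sketch of the data frame step, assuming the parsed rows are lists of `[date, time, author, message]` with dates in `DD/MM/YY` form. The sample rows and the `User N` labels are made up for illustration.

```python
import pandas as pd

# Hypothetical parsed rows; a real run would use the output of the parsing step.
rows = [
    ["18/06/20", "22:47", "Ana", "Hola"],
    ["18/06/20", "22:48", None, "Ana added Luis"],
    ["19/06/20", "09:05", "Luis", "Buenos dias"],
]

df = pd.DataFrame(rows, columns=["Date", "Time", "Author", "Message"])
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%y")

# Drop system notices, which have no author.
df = df.dropna(subset=["Author"]).reset_index(drop=True)

# Replace real names with neutral labels, numbered by first appearance.
aliases = {name: f"User {i}" for i, name in enumerate(df["Author"].unique(), start=1)}
df["Author"] = df["Author"].map(aliases)

df.info()
print(df.head())
```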

4. Some group stats: We can now analyze the information we have. Some general stats we can look into are the total number of messages shared with the group, and how many media items, emojis and links were sent in total. The following code will take care of that:

The group has shared a total of 14,894 messages in this case. The interesting part is that this was one of the less busy groups I could find, chosen so that exporting wouldn’t take too long. So imagine the number of messages being kept in one of those chats you use on a daily basis!
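The group totals can be sketched as follows. This assumes an English export, where media shows up as `<Media omitted>`, and uses a rough regular expression over common emoji code-point blocks as a stand-in for a dedicated emoji library, so counts on a real chat may differ slightly.

```python
import re

import pandas as pd

MEDIA_MARKER = "<Media omitted>"  # assumption: English export; other languages differ
URL_PATTERN = re.compile(r"https?://\S+")
# Rough emoji matcher over common Unicode blocks; an approximation, not a full emoji spec.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Made-up sample messages.
df = pd.DataFrame({
    "Author": ["User 1", "User 2", "User 1", "User 2"],
    "Message": ["jaja 😂😂", "<Media omitted>", "mira https://example.com", "ok 👍"],
})

df["Emojis"] = df["Message"].apply(EMOJI_PATTERN.findall)
df["Links"] = df["Message"].apply(lambda text: len(URL_PATTERN.findall(text)))
df["Media"] = df["Message"] == MEDIA_MARKER

stats = {
    "messages": len(df),
    "media": int(df["Media"].sum()),
    "emojis": int(df["Emojis"].apply(len).sum()),
    "links": int(df["Links"].sum()),
}
print(stats)
```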

5. More stats: The same kinds of stats can be obtained for each user.

These stats are a lot of fun. You can start a bet among friends on who they think sends more messages or who is the emoji addict. Why not make a game out of it?
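A minimal sketch of the per-user breakdown, assuming a data frame with `Author` and `Message` columns. The sample messages and the `Words` column are illustrative.

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({
    "Author": ["User 1", "User 2", "User 1"],
    "Message": ["hola a todos", "ok", "nos vemos el viernes"],
})

# Count whitespace-separated words in each message.
df["Words"] = df["Message"].str.split().str.len()

per_user = df.groupby("Author").agg(
    messages=("Message", "count"),
    words=("Words", "sum"),
)
per_user["words_per_message"] = per_user["words"] / per_user["messages"]
print(per_user)
```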

6. Emojis used: The stats tell us how many emojis have been sent in total, but how many different emojis were actually used in the conversation?

This hints at the fact that even though we send a lot of emojis, there are only a few in rotation.

7. Popular emojis: We can generate the list of emojis used and see which ones are more popular and which ones were rarely used.

For this group the laughing emoji seems to be way more popular than any other.
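Both emoji steps (how many distinct emojis, and which are most popular) can be sketched with a `Counter`. The emoji pattern here is a rough approximation over common Unicode blocks, not a complete emoji definition, and the messages are made up.

```python
import re
from collections import Counter

# Rough emoji matcher; an assumption standing in for a dedicated emoji library.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

messages = ["jaja 😂😂", "ok 👍", "😂 no puede ser", "feliz cumple 🎉"]

emoji_counts = Counter(
    symbol for text in messages for symbol in EMOJI_PATTERN.findall(text)
)

unique_emojis = len(emoji_counts)      # how many different emojis were used
ranking = emoji_counts.most_common()   # (emoji, count) pairs, most used first

print(unique_emojis)
print(ranking)
```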

8. Pie chart of emojis: A nice way of presenting the findings regarding emojis is by creating a pie chart that shows how much they are used.

After working with a few group chats, it was interesting to find that the biggest slice of the pie was always taken by the laughing emoji. I guess we can say my group chats are quite happy.
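One possible way to build the pie chart, assuming the emoji counts are already in a `Counter`. The helper names are hypothetical, and the Plotly import is kept inside the plotting function so the percentage calculation runs even without Plotly installed.

```python
from collections import Counter


def emoji_percentages(emoji_counts):
    """Return each emoji's share of all emojis sent, as a percentage."""
    total = sum(emoji_counts.values())
    return {symbol: round(100 * count / total, 1) for symbol, count in emoji_counts.items()}


def emoji_pie(emoji_counts, title="Emoji distribution"):
    """Build a Plotly pie chart from the counts (requires the plotly package)."""
    import plotly.express as px

    fig = px.pie(
        names=list(emoji_counts.keys()),
        values=list(emoji_counts.values()),
        title=title,
    )
    fig.update_traces(textposition="inside", textinfo="percent+label")
    return fig


# Made-up counts for illustration.
counts = Counter({"😂": 6, "👍": 3, "🎉": 1})
shares = emoji_percentages(counts)
print(shares)
```

Calling `emoji_pie(counts).show()` would open the interactive chart.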

9. Individual pies: Even better, we can take a look at the individual pie charts of each user.

These can be used for a new bet among friends and family. You now have proof that a certain person is constantly using the same emoji, all the time.
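The per-user version only needs the emoji counts split by author. A minimal sketch, assuming `(author, message)` pairs and the same rough emoji pattern; each resulting `Counter` can then feed its own pie chart.

```python
import re
from collections import Counter, defaultdict

# Rough emoji matcher; an approximation over common Unicode blocks.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Made-up (author, message) pairs.
chat = [
    ("User 1", "jaja 😂😂"),
    ("User 2", "ok 👍"),
    ("User 1", "😂 ya voy"),
    ("User 2", "👍👍 listo 🎉"),
]

emojis_by_author = defaultdict(Counter)
for author, text in chat:
    emojis_by_author[author].update(EMOJI_PATTERN.findall(text))

# Most used emoji per author, skipping authors who sent none.
favorites = {
    author: counts.most_common(1)[0]
    for author, counts in emojis_by_author.items()
    if counts
}
print(favorites)
```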

10. Word cloud: By creating a word cloud one can get a feel for the general vibe of a group chat; this is a very interesting way of looking at the information that is being exchanged. The following code creates the word cloud, and I have added an optional chunk that saves the word cloud as a jpg image; this can be changed to any other format. Note that the group chat we are working with is mostly in Spanish with some words in English, and this can be indicated in the stopwords part.

The word cloud is such a pretty thing to look at, and also the best way to portray the general mood of the group and how positive or negative it tends to be. There’s some bias in that kind of analysis, but it’s worth exploring.
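A rough sketch of the word cloud step, assuming the third-party `wordcloud` package for the drawing. The stopword set is a tiny illustrative mix of Spanish and English and would need fuller lists in practice; the function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; a real run would use fuller Spanish and English lists.
STOPWORDS = {"que", "de", "la", "el", "y", "a", "the", "and", "to", "media", "omitted"}


def word_frequencies(messages, stopwords=STOPWORDS):
    """Count words, ignoring case, stopwords and words of one or two letters."""
    words = re.findall(r"[a-záéíóúüñ]+", " ".join(messages).lower())
    return Counter(w for w in words if w not in stopwords and len(w) > 2)


def save_word_cloud(frequencies, path="wordcloud.jpg"):
    """Draw the cloud and save it as an image (requires the wordcloud package)."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)  # the file extension decides the image format
    return cloud


# Made-up sample messages.
messages = [
    "Feliz cumpleaños a la mejor amiga",
    "feliz viernes and happy weekend",
    "<Media omitted>",
]
freqs = word_frequencies(messages)
print(freqs.most_common(3))
```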

11. Messaging evolution: A very useful part of the analysis is looking at how the messaging has changed over time. The following code shows us how much or how little the group chat has been used since January 2020.

It raises interesting questions as to what may have been going on during that notorious spike.
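A minimal sketch of the messages-per-day series, assuming a `Date` column of datetimes with one row per message. The dates are made up, and the Plotly import is kept inside the plotting helper so the counting part runs on its own.

```python
import pandas as pd

# Made-up message dates, one row per message.
df = pd.DataFrame({"Date": pd.to_datetime([
    "2019-12-31", "2020-01-02", "2020-01-02", "2020-01-03",
    "2020-02-10", "2020-02-10", "2020-02-10",
])})

# Keep messages from January 2020 onwards and count them per day.
recent = df[df["Date"] >= pd.Timestamp("2020-01-01")]
per_day = recent.groupby("Date").size()
print(per_day)


def plot_evolution(per_day):
    """Line chart of messages per day (requires the plotly package)."""
    import plotly.express as px

    fig = px.line(x=per_day.index, y=per_day.values,
                  labels={"x": "Date", "y": "Messages"})
    return fig
```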

12. Dates with the highest number of messages: We can count how many times each date occurs in the data frame, sort the counts in descending order and keep the first few. The following code will do just that:

These will reflect the spike that was seen on the previous chart and confirm it.
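Counting and ranking the dates is a short job for `value_counts`, which already sorts in descending order. A minimal sketch with made-up dates:

```python
import pandas as pd

# Made-up message dates, one row per message.
df = pd.DataFrame({"Date": pd.to_datetime([
    "2019-12-31", "2020-01-02", "2020-01-02", "2020-01-03",
    "2020-02-10", "2020-02-10", "2020-02-10",
])})

# Number of messages per date, busiest first; keep the top two.
top_dates = df["Date"].value_counts().head(2)
print(top_dates)
```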

13. Most active days: Which day of the week tends to have more activity in the group chat? The following code answers this with an interactive polar plot that shows which days are most active and roughly how many messages are sent on average on those days.

The group chat that was chosen has a friendship context, so it makes sense that the days with more activity are weekends. However, there’s also a tendency to message more on Mondays and Wednesdays than on Tuesdays and Thursdays. That would certainly be a topic for conversation.
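A sketch of the weekday averages behind the polar plot. It assumes the average is taken over the dates that have at least one message, which may differ from the original calculation; the dates are made up and `polar_plot` is a hypothetical helper.

```python
import pandas as pd

# Made-up message dates, one row per message.
df = pd.DataFrame({"Date": pd.to_datetime([
    "2020-01-04", "2020-01-04", "2020-01-04",
    "2020-01-05", "2020-01-06", "2020-01-11",
])})

# Messages per calendar date, then the average per weekday.
daily = df.groupby("Date").size().reset_index(name="messages")
daily["day"] = daily["Date"].dt.day_name()

order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
avg_by_day = daily.groupby("day")["messages"].mean().reindex(order, fill_value=0)
print(avg_by_day)


def polar_plot(avg_by_day):
    """Interactive polar chart of the averages (requires the plotly package)."""
    import plotly.express as px

    fig = px.line_polar(r=list(avg_by_day.values), theta=list(avg_by_day.index),
                        line_close=True)
    fig.update_traces(fill="toself")
    return fig
```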

14. Most active times: We can also look at the times of day when the group has been most active. A simple plot can show the times with the highest number of messages.

The analysis ends here, with the busiest times of the day on average for the group. They seem to be around noon and the afternoon; maybe there’s a break around lunch time where the group communicates the most.
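The busiest hours can be sketched by grouping on the hour of each message. This assumes a `Time` column in 24-hour `HH:MM` form, which depends on the phone settings used for the export; the times are made up.

```python
import pandas as pd

# Made-up message times, one row per message.
df = pd.DataFrame({"Time": ["09:05", "12:10", "12:45", "13:30", "12:59", "22:47"]})

# Extract the hour and count messages per hour of the day.
df["Hour"] = pd.to_datetime(df["Time"], format="%H:%M").dt.hour
by_hour = df["Hour"].value_counts().sort_index()
busiest_hour = by_hour.idxmax()

print(by_hour)
print(busiest_hour)
```

The `by_hour` series can then go into a bar chart, for example `px.bar(x=by_hour.index, y=by_hour.values)` with Plotly Express.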

Overall this exercise is a fun way to practice and gain some Python skills. On a personal level it was very interesting to think about some of the findings and discuss them with friends. From a data science student's perspective, the whole activity was an amazing way to visualize the kinds of things we can do with the different tools we have and a great way to practice our analysis skills. I highly recommend trying this out for yourself.

Thanks to Saiteja Kura for making the original post available. Please go check it out for further details on the code chunks presented here.

Brenda Leyva
MCD-UNISON

Former business administration professional turned physicist, turned data scientist with a unique approach to problem solving and data analysis.