WhatsApp Data Analysis
WhatsApp Messenger, owned by Facebook, is one of the most widely used messaging apps in the world.
I was not aware that WhatsApp lets its users export all of their chat data. This blog is a step-by-step guide to analysing your WhatsApp chats.
You can check all of my code here.
This notebook was developed for WhatsApp data, but it is not limited to it. You can do the same with Telegram, Facebook Messenger, or any other messenger you use.
This project is divided into three main parts.
- Data Collection
- Data Preparation and Cleaning
- Analysing the data by asking some interesting questions
Data Collection
You can easily export the chat data of a conversation with anyone.
For iOS:
- Open the conversation and click on the name of your chat
- Scroll down to the bottom and select the “Export Chat” option
- You will see two options; select “without media”
- The export will take some time, depending on the number of messages you have, and will create a .txt file
- Send the file to your Mac/PC and place it in the same folder as your notebook
For Android:
- Click on the Conversation Name
- Click on the three dots at the upper right corner of your chat
- Click “More”
- Click on “Export Chat”
- Once the .txt file is ready, send it to your Mac/PC and place it in the same folder as your notebook
Data Preprocessing
pandas has a function, read_fwf(), that reads a text file of fixed-width formatted lines and returns a pandas DataFrame.
There are clear differences between the formats exported from Android and iOS devices, so we need two separate scripts to convert this text into a well-structured pandas DataFrame.
- Android to DF
- iOS to DF
I wrote two scripts: one converts Android chat data into a pandas DataFrame, and the other does the same for iOS chat data. I will be using data exported from an iPhone for this project.
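As a rough illustration of what such a conversion script might look like (this is not the author's script), the sketch below parses lines in one commonly seen iOS export layout, `[date, time] name: message`. The exact timestamp layout depends on the phone's locale and the app version, so the regular expression, the default `stamp_format`, the function name `parse_ios_chat`, and the column names `user`, `message`, and `datetime` are all assumptions to adapt to your own file.

```python
import re
import pandas as pd

# Assumed iOS line layout: "[12/31/19, 10:15:30 PM] Alice: Happy new year"
# Adjust the pattern and the format string if your export looks different.
LINE_RE = re.compile(r"^\[(?P<stamp>[^\]]+)\]\s(?P<user>[^:]+):\s(?P<message>.*)$")


def parse_ios_chat(lines, stamp_format="%m/%d/%y, %I:%M:%S %p"):
    """Turn an iterable of exported chat lines into a DataFrame."""
    rows = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            rows.append(match.groupdict())
        elif rows:
            # No timestamp: treat the line as a continuation of the previous message.
            rows[-1]["message"] += "\n" + line
    df = pd.DataFrame(rows, columns=["stamp", "user", "message"])
    # errors="coerce" turns timestamps that do not fit the format into NaT.
    df["datetime"] = pd.to_datetime(df["stamp"], format=stamp_format, errors="coerce")
    return df.drop(columns="stamp")
```

With a real export, the lines would come from the file itself, for example `with open("chat.txt", encoding="utf-8") as f: whatsapp_df = parse_ios_chat(f)`, where `chat.txt` is a placeholder file name.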
To remove media and image placeholders from the dataset:
# Drop rows that only contain the Android media placeholder
media = whatsapp_df[whatsapp_df['message'] == "<Media omitted>"]
whatsapp_df.drop(media.index, inplace=True)
# Drop rows that only contain the image placeholder
img = whatsapp_df[whatsapp_df['message'] == "<image omitted>"]
whatsapp_df.drop(img.index, inplace=True)
# Reset the indexes
whatsapp_df.reset_index(inplace=True, drop=True)
Data Analysis
Every data analysis project should start by asking good questions. I compiled my questions up front, before executing a single line of analysis code.
Questions
- Who are the different people in the group chat?
- Who are the most active users in the group?
- What is the timeline of data we have?
- At what time of day is the group most active?
- Which months were the busiest in the conversation?
- Which emoji was used most in messages?
- What are the top words used in the conversation?
1. Who are the different people in the group chat?
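One way to answer this is to list the distinct sender names. This is a minimal sketch that assumes the DataFrame has a `user` column holding the sender of each message; that column name and the helper `list_participants` are illustrative, not taken from the original notebook.

```python
import pandas as pd


def list_participants(whatsapp_df):
    """Return the distinct sender names, sorted alphabetically."""
    # 'user' is an assumed column name for the sender of each message.
    return sorted(whatsapp_df["user"].dropna().unique())
```

Calling `list_participants(whatsapp_df)` returns a plain Python list, so `len()` of the result gives the number of people in the chat.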
2. Who are the most active users in the group?
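A simple measure of activity is the number of messages per sender. The sketch below assumes a `user` column (a hypothetical name) and uses `value_counts()`, which already sorts from most to least frequent; the helper name `most_active_users` is also just for illustration.

```python
import pandas as pd


def most_active_users(whatsapp_df, top_n=10):
    """Count messages per sender, largest first."""
    # value_counts() sorts in descending order by default.
    return whatsapp_df["user"].value_counts().head(top_n)
```

If matplotlib is installed, `most_active_users(whatsapp_df).plot(kind="bar")` turns the counts into a bar chart.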
3. What is the timeline of data we have?
Average messages sent per day?
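Both the timeline and the daily average can be read off the timestamps. This sketch assumes a `datetime` column of pandas timestamps (an assumed name) and averages only over days on which at least one message was sent; `timeline_summary` is a hypothetical helper.

```python
import pandas as pd


def timeline_summary(whatsapp_df):
    """Return first/last timestamps and the mean number of messages per active day."""
    stamps = whatsapp_df["datetime"]
    # Group by calendar date; days with no messages are not counted here.
    per_day = stamps.dt.date.value_counts()
    return {
        "first_message": stamps.min(),
        "last_message": stamps.max(),
        "avg_messages_per_active_day": per_day.mean(),
    }
```

Dividing the total number of messages by the number of calendar days between the first and last message would instead give a lower average that also accounts for silent days.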
WOAAH! That is amazing, right?
4. At what time of day is the group most active?
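Counting messages by hour of the day is one way to approach this. The sketch assumes a `datetime` column of pandas timestamps (an assumed name); `messages_per_hour` is a hypothetical helper.

```python
import pandas as pd


def messages_per_hour(whatsapp_df):
    """Count messages for each hour of the day (0-23)."""
    counts = whatsapp_df["datetime"].dt.hour.value_counts()
    # Reindex so hours with no messages show up as zero, in clock order.
    return counts.reindex(range(24), fill_value=0)
```

`messages_per_hour(whatsapp_df).idxmax()` then gives the single busiest hour.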
5. Which months were the busiest in the conversation?
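A monthly count can be built the same way. This sketch again assumes a `datetime` column (an assumed name) and groups by year and month together, so the same month in different years is kept apart; `messages_per_month` is a hypothetical helper.

```python
import pandas as pd


def messages_per_month(whatsapp_df):
    """Count messages per calendar month, in chronological order."""
    # to_period("M") keeps the year, so January 2019 and January 2020 stay separate.
    months = whatsapp_df["datetime"].dt.to_period("M")
    return months.value_counts().sort_index()
```

Sorting the result with `sort_values(ascending=False)` ranks the months from busiest to quietest.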
6. Which emoji was used most in messages?
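Emoji can be counted with a character-class regular expression. The ranges below are only an approximation chosen for this sketch: they cover the common pictograph blocks plus Miscellaneous Symbols and Dingbats, count skin-tone modifiers as separate characters, and miss flags and joined sequences. A dedicated emoji library would be more accurate. `count_emojis` is a hypothetical helper that accepts any iterable of strings, such as the `message` column.

```python
import re
from collections import Counter

# Rough approximation: common pictograph blocks plus Miscellaneous Symbols and Dingbats.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def count_emojis(messages):
    """Count emoji code points across an iterable of message strings."""
    counter = Counter()
    for text in messages:
        counter.update(EMOJI_RE.findall(str(text)))
    return counter
```

`count_emojis(whatsapp_df["message"]).most_common(10)` would list the ten most used emoji with their counts.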
7. What are the top words used in the conversation?
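Word frequencies can be computed with a `Counter` after lowercasing and splitting the messages. This is a minimal sketch: the stop-word list is a tiny illustrative set, the pattern only matches unaccented Latin letters and apostrophes, and `top_words` is a hypothetical helper that accepts any iterable of strings, such as the `message` column.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list; extend it for real use.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it", "i", "you"}


def top_words(messages, n=10):
    """Return the n most common words, ignoring case and stop words."""
    counter = Counter()
    for text in messages:
        words = re.findall(r"[a-z']+", str(text).lower())
        counter.update(w for w in words if w not in STOP_WORDS)
    return counter.most_common(n)
```

The resulting list of `(word, count)` pairs can be fed to a bar chart or a word-cloud tool.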
Thank you for reading my blog. I hope you learned something new. There are many ways to extend this, such as enhancing the graphs or adding columns like word count or letter count to see who types the longest messages.
You can connect with me on LinkedIn here or take a look at my portfolio website.