WhatsApp Group Chat Analysis with Power BI and Excel.
I am really excited to have discovered Data Analytics as a career. “Data is Everywhere”, according to Google Data Analytics Foundation Course.
WhatsApp has been a unique tool for end-to-end encryption and communication for both individuals and businesses. It will be difficult to believe that you can actually extract, Transform, Analyze, and Visualize all chart communication in your group.
Are mine right?
I saw that question in an interview and was shocked to the bone.
The question was “How can you Visualize your WhatsApp group chat?”
I must appreciate Egemolu Goodnews on Medium and LinkedIn. It was his information that inspired my further insights when I saw that question. Thanks, man, and more expansion.
As a Data Analyst, if you hear that question what will you answer and how will you handle it especially if you have not experienced it before?
Dirty Dataset!
Have you seen a dirty Dataset before? If No, then, follow me, I will show you one soon.
In this process, I am going to try as much as I can to be a little more detailed with screenshots so we can understand.
Stage 1: Extracting Data — The first thing is to collect the data from your WhatsApp group chat. From your phone please, NOT computer, open the group chat, click on the three dots on the top right then click “More >”.
Next, click on “Export chat” then select “Without media” then select Send Email.
Send it to your own email address. At this point, you can switch to your computer.
Go back to your email address and download it as a .txt file.
Next, copy it to your Notepad, it will look very, very dirty this… see Screenshot>
This is a practical example of a dirty Dataset.
At this point this Dataset is ‘useless’, so we must clean it up before anything else.
Stage 2: Data Pre-cleaning — This stage is one of the most difficult stages most Analysts don’t like. But you don’t have to like it, you must pass through the process.
Take a look at the screenshot, what did you see? A lot of inconsistent punctuation- comma, dash, colon, semicolon, full stop, and so on. On the notepad, I will use the “Replace” tab from the “Edit” tab, to do some cleaning. Replace comma (,) with a semicolon, replace dash (-) with a semicolon, etc., one after the other.
This process needs a lot of patience and attention to detail.
Stage 3: Excel — Import to Excel Sheet
This time we are going to import the .txt file to our Excel for further cleaning and transformation.
Open a new Excel sheet and click on the “Data” tab, click on “From Text” then select the .txt file from your computer where you saved it.
The window pops shows “Text Import Wizard — Step 1 of 3”
Check the “Delimited” box and click “Next” towards the down right hand.
In the next step 2 of 3, under the “Delimiters”, uncheck the “Tab” box and check the “Semicolon” then click “Next”
Select a Date format from the “Date” drop-down arrow Select “MDY” — “Month, Day, Year” format. Then click on “Finish”.
In the next box that appears click on “Ok”
As you can see, the Data is still very much dirty. No Column header title. It has about 16 Columns and 1047 rows.
Now, Name the Title of the Column as follows: Date, Time, Time 1, User, Message, Message 1, Message 2, etc. But as you can see there is no empty row for that. To do that Highlight the First row, right-click on it, and then select “Insert”. A new row appears, then name the columns.
Step 4: The Power BI Magic — Real Data Cleaning. I will continue the next step using Power BI magical tools. As you may know, Power BI is the elder brother of Microsoft Excel Sheet. Now, save your table and Open it in the Power query editor.
Take a look at the screenshot, it has 16 Columns and 1M plus rows. Look at the Column quality where you have “Valid, Error, and Empty”, non of the Columns is up to 100%. There are Errors and Empty spaces. A whole lot of variables and abnormalities.
Power Query Editor Process
Merge Columns — Merge columns with “null” cell Values, that is from “message” to column 16. I highlighted all the Columns, right-click on the header, and chose “Merge Columns”
Now, we are left with just 5 Columns — Date, Time, Time_1, User and Message. I will use the same process to merge the “Time and Time_1” Columns again the rename them “Time”.
Remove Errors — From the screenshot above you can see the “Valid” in “Users” and “Messages” Columns are not 100% yet. So, highlight both of the Columns, right-click on the header, and select “Replace Errors” and “Replace Empty” one after the other.
Our Dataset has gradually taken shape. As you can see, the Column quality is all 100%. From something that looks like a fatal accident to a final operation. Finally, we have 4 Columns and 650 rows. Ready for insight and Visualization.
Now, either from the “File” menu or the top left side of the screen, click on “Close & Apply”.
5. Report and Dashboard
Power BI Desktop table View screenshot 19>
We are now on Power BI Desktop to complete the Analysis. We have a few questions to answer:
📍What is the Total No. of Users in the group?
📍 What is the Total massage?
📍What is the Peak Chat Time?
📍Who is the Top Contributor in the Group?
📍 What is the Total number of messages sent each day?
📍 What is the No. of Users by Time?
Now, if you don’t know how to create a Report/Dashboard you may have to go to YouTube videos and learn because it’s going to be complicated explaining my process. I will only have to show screenshots with little insight.
📍 First thing here I titled my report as WhatsApp Group Chat Analysis using the “Text Box” tab.
📍I created my measures under the DAX table by first creating my DAX table from the “Enter Data” tab.
I now created a New Measure to calculate the total number of users. Thus the calculation: Total No. of Users= DISTINCTCOUNT(WhatsApp_Scrap[Users]) = 7
In the same way, I created another Measure for the Total No. of Messages as 532
Individual Reports
📍Who is the Top Contributor in the Group?
📍What is the Peak Chat Time?
📍 What is the No. of Users by Time
📍 What is the Total number of messages sent each day?
Dashboard
Finally Recommendation
I will encourage every Data enthusiast to practice Data Scraping of any group that you belong to, but make sure to ask for permanent. On a second note, practice that with Power BI because it has some hidden magic to discover only by practice.
The Top Contributor from the chat is the person I named Mr. Bassy, who did 188 messages in the group followed by Emmanuel with 153 messages, and so on.
Peak Chat Time is between 12:26 pm and 5:30 pm. This simply means that if you are in the group and you want your message to be read you must send your contribution within this time range.
The total number of Messages sent each day is 125 messages were sent on 16/09/23, followed by 59 on 4/10/23, and gradually it is reducing by the day.
Thank you!