WhatsApp Chat 📱 - Analyze 🔍, Visualize 📊

Harsh Singhal | Published in Analytics Vidhya | Aug 9, 2020

WhatsApp is the world's most popular messaging app, with more than 2 billion users worldwide who send more than 65 billion messages every day.

Hey there! I am using WhatsApp.

In this article, I present some interesting analyses and visualizations that can be done on any WhatsApp chat.

The full script for this exercise can be obtained here:

Analyzing our own data is so much fun! Believe me!

Getting the WhatsApp Chat

WhatsApp has a feature that lets you export the conversation log of any individual or group chat.

iPhone: Open chat | Tap on name | Scroll down | Export chat ➞ Text file

Android: Open chat | Tap more options ⋮ | More | Export chat ➞ Text file

Data Preparation

Chat data is semi-structured, so we first convert it into a structured format that is easier to analyze and visualize. Each line in the exported text file follows a specific format:
[date, time] Author: message
Using regular expressions (RegEx), we parse the text file and convert it into a pandas DataFrame.

Code: Text file ➞ pandas DataFrame (using RegEx)
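The original script is not reproduced here. As a minimal sketch of the idea, assuming an iPhone-style export whose lines look like `[09/08/2020, 21:15:03] Author: message` (day/month/year, 24-hour clock), the parser below matches each line with a regular expression and appends lines without a timestamp to the previous message, since those continue a multi-line message. `parse_chat`, `LINE_RE`, `TIMESTAMP_RE` and `DATE_FMT` are hypothetical names, and the exact date/time layout depends on your phone's OS and locale.

```python
import re
from datetime import datetime

# Assumed layout: "[dd/mm/yyyy, HH:MM:SS] Author: message".
# Adjust TIMESTAMP and DATE_FMT to match your own export.
TIMESTAMP = r"\[(\d{1,2}/\d{1,2}/\d{4}), (\d{1,2}:\d{2}:\d{2})\]"
LINE_RE = re.compile(rf"^{TIMESTAMP} ([^:]+): (.*)$")
TIMESTAMP_RE = re.compile(rf"^{TIMESTAMP}")
DATE_FMT = "%d/%m/%Y %H:%M:%S"


def parse_chat(lines):
    """Turn exported chat lines into (timestamp, author, message) tuples."""
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = LINE_RE.match(line)
        if match:
            date, time, author, message = match.groups()
            ts = datetime.strptime(f"{date} {time}", DATE_FMT)
            records.append((ts, author, message))
        elif TIMESTAMP_RE.match(line):
            # Timestamped line without "Author: " (e.g. a system notice): skip it.
            continue
        elif records and line:
            # No timestamp: this line continues the previous multi-line message.
            ts, author, message = records[-1]
            records[-1] = (ts, author, message + "\n" + line)
    return records


sample = [
    "[09/08/2020, 21:15:03] Alice: Hey there!",
    "[09/08/2020, 21:16:10] Bob: Hi Alice",
    "how are you?",
]
chat = parse_chat(sample)
print(len(chat))  # 2
```

From here, `pandas.DataFrame(chat, columns=["datetime", "author", "message"])` gives one row per message.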

Exploratory Data Analysis (EDA)

First, some basic statistics

How many messages have been exchanged?
How many authors are there?
What is the average number of messages exchanged every day?
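A minimal sketch of these three statistics, assuming the chat is already a list of `(timestamp, author, message)` tuples; the sample data and the `basic_stats` name are hypothetical. The daily average is computed two ways, over every calendar day between the first and last message and over only the days that had messages, because the choice changes the answer noticeably for quiet groups.

```python
from datetime import datetime


def basic_stats(chat):
    """Summary counts for a list of (timestamp, author, message) tuples."""
    days = {ts.date() for ts, _, _ in chat}
    span = (max(days) - min(days)).days + 1  # calendar days, inclusive
    return {
        "messages": len(chat),
        "authors": len({author for _, author, _ in chat}),
        "avg_per_calendar_day": len(chat) / span,
        "avg_per_active_day": len(chat) / len(days),
    }


# Hypothetical sample data.
chat = [
    (datetime(2020, 8, 7, 9, 0), "Alice", "Good morning"),
    (datetime(2020, 8, 7, 22, 30), "Bob", "Night!"),
    (datetime(2020, 8, 9, 21, 15), "Alice", "Hey there!"),
]
print(basic_stats(chat))
# {'messages': 3, 'authors': 2, 'avg_per_calendar_day': 1.0, 'avg_per_active_day': 1.5}
```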

Analysis 1: Datetime

When was the group most active?
On which day of the week, part of the day, and hour of the day are the most messages exchanged?
Right: Weekends 🕺
Left: Owls 🦉👻 | Right: Friday night 🙈
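Each of these questions reduces to counting messages per bucket of the timestamp. A minimal sketch with `collections.Counter`, assuming `(timestamp, author, message)` tuples; the `part_of_day` boundaries and all names are hypothetical choices, not taken from the original script. Weekday names come from a fixed tuple rather than `strftime`, so the output does not depend on the system locale.

```python
from collections import Counter
from datetime import datetime

# Index matches datetime.weekday(): Monday == 0 ... Sunday == 6.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def part_of_day(hour):
    # Assumed bucket boundaries; the article does not say which ones it used.
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    return "Night"


def activity_profile(chat):
    """Message counts by weekday, part of day and hour of day."""
    by_weekday = Counter(WEEKDAYS[ts.weekday()] for ts, _, _ in chat)
    by_part = Counter(part_of_day(ts.hour) for ts, _, _ in chat)
    by_hour = Counter(ts.hour for ts, _, _ in chat)
    return by_weekday, by_part, by_hour


# Hypothetical sample: 7 Aug 2020 was a Friday, 9 Aug 2020 a Sunday.
chat = [
    (datetime(2020, 8, 7, 22, 30), "Bob", "Movie?"),
    (datetime(2020, 8, 7, 23, 10), "Alice", "Sure"),
    (datetime(2020, 8, 9, 9, 0), "Alice", "Good morning"),
]
by_weekday, by_part, by_hour = activity_profile(chat)
print(by_weekday.most_common(1))  # [('Friday', 2)]
```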

Analysis 2: Author

Who is the most talkative?
Whose messages are decreasing with time?
Who sends long messages?
Left: Overall | Center: Author 1 | Right: Author 13
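A sketch of the three author questions, again assuming `(timestamp, author, message)` tuples and hypothetical names: message counts answer who is most talkative, mean words per message (one possible definition of "long") answers who sends long messages, and per-month counts show whose activity is falling over time.

```python
from collections import Counter, defaultdict
from datetime import datetime


def author_stats(chat):
    """Per-author message counts, mean words per message, and monthly counts."""
    counts = Counter(author for _, author, _ in chat)
    words = Counter()
    monthly = defaultdict(Counter)  # author -> {(year, month): messages}
    for ts, author, message in chat:
        words[author] += len(message.split())
        monthly[author][(ts.year, ts.month)] += 1
    mean_words = {author: words[author] / counts[author] for author in counts}
    return counts, mean_words, monthly


# Hypothetical sample data.
chat = [
    (datetime(2020, 6, 1), "Alice", "hi"),
    (datetime(2020, 6, 2), "Alice", "how is everyone doing today"),
    (datetime(2020, 6, 3), "Bob", "fine thanks"),
    (datetime(2020, 7, 1), "Alice", "ok"),
]
counts, mean_words, monthly = author_stats(chat)
print(counts.most_common(1))  # [('Alice', 3)]
print(monthly["Alice"])       # a falling month-by-month series suggests a declining author
```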

Analysis 3: Messages

What are the most commonly used words in the messages (overall, author-wise)?
Left: Overall | Center: Author 2 | Right: Author 5
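A minimal way to find the most common words is to lowercase each message, split it into words, drop stop words, and count with `collections.Counter`, once for the whole chat and once per author. The tiny stop-word list and the ASCII-only word pattern below are simplifying assumptions (non-English letters are ignored), and the names are hypothetical; the resulting counts are the kind of input a word-cloud plot takes.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Tiny illustrative stop-word list; a real analysis needs a much fuller one.
STOPWORDS = {"a", "an", "and", "i", "in", "is", "it", "of", "the", "to", "you"}
WORD_RE = re.compile(r"[a-z']+")  # ASCII letters and apostrophes only


def word_counts(chat):
    """Word frequencies overall and per author, ignoring case and stop words."""
    overall = Counter()
    per_author = defaultdict(Counter)
    for _, author, message in chat:
        words = [w for w in WORD_RE.findall(message.lower()) if w not in STOPWORDS]
        overall.update(words)
        per_author[author].update(words)
    return overall, per_author


# Hypothetical sample data.
chat = [
    (datetime(2020, 8, 7, 19, 0), "Alice", "Pizza tonight?"),
    (datetime(2020, 8, 7, 19, 5), "Bob", "Yes, pizza!"),
    (datetime(2020, 8, 7, 21, 0), "Alice", "The pizza is great"),
]
overall, per_author = word_counts(chat)
print(overall.most_common(1))  # [('pizza', 3)]
```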

Analysis 4: Emojis

What are the most commonly used emojis (overall, author-wise)?
What is each author's emoji-to-message ratio?
Left: Overall | Center: Author 1 | Right: Author 6
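The sketch below counts emojis with a plain regular expression over a few common emoji code-point blocks, so it needs nothing beyond the standard library. This is an approximation chosen for illustration, not the original script's method: skin-tone and zero-width-joiner sequences are split into their parts, and some emoji (such as flags) are missed. The emoji-to-message ratio is taken to mean emojis per message for each author; all names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Rough matcher over common emoji blocks (approximation): multi-code-point
# emoji are split into parts, and some emoji, such as flags, are not matched.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def emoji_stats(chat):
    """Emoji counts overall and per author, plus emojis per message per author."""
    overall = Counter()
    per_author = defaultdict(Counter)
    messages = Counter()
    for _, author, message in chat:
        found = EMOJI_RE.findall(message)
        overall.update(found)
        per_author[author].update(found)
        messages[author] += 1
    ratio = {a: sum(per_author[a].values()) / messages[a] for a in messages}
    return overall, per_author, ratio


# Hypothetical sample data.
chat = [
    (datetime(2020, 8, 7, 22, 0), "Alice", "Party 🕺🕺"),
    (datetime(2020, 8, 7, 22, 1), "Alice", "ok"),
    (datetime(2020, 8, 7, 22, 2), "Bob", "lol 😂"),
    (datetime(2020, 8, 7, 22, 3), "Bob", "same"),
]
overall, per_author, ratio = emoji_stats(chat)
print(ratio)  # {'Alice': 1.0, 'Bob': 0.5}
```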

Analysis 5: Subject

What is the group's most common subject (WhatsApp's term for the group name)?
The 13 Primes!!
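Renaming a group leaves a system notice in the chat, so subject history can be recovered from those notices. One way to make "most common subject" concrete is the subject that was in effect for the longest total time. The sketch below assumes English-language notices of the form `X changed the subject from "A" to "B"` or `X changed the subject to “B”`, already paired with their timestamps; both the notice wording and the duration-based definition are assumptions (counting how often each name was set is an equally valid reading), and the names and sample data are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed wording of the English subject-change notice; the exact text and the
# quote characters differ between Android, iPhone and app languages.
SUBJECT_RE = re.compile(r'changed the subject (?:from .* )?to ["“](.*)["”]\s*$')


def subject_durations(notices, end):
    """notices: time-ordered (timestamp, text) pairs for system notices.
    Returns the number of seconds each subject was in effect, up to `end`."""
    changes = []
    for ts, text in notices:
        match = SUBJECT_RE.search(text)
        if match:
            changes.append((ts, match.group(1)))
    totals = Counter()
    # Each subject lasts from its change until the next change (or `end`).
    for (start, subject), (stop, _) in zip(changes, changes[1:] + [(end, None)]):
        totals[subject] += (stop - start).total_seconds()
    return totals


# Hypothetical notices.
notices = [
    (datetime(2020, 1, 1), 'Alice changed the subject from "Friends" to "Study Group"'),
    (datetime(2020, 1, 5), "Carol added Dave"),
    (datetime(2020, 1, 10), "Bob changed the subject to “Weekend Plans”"),
    (datetime(2020, 1, 12), 'Alice changed the subject to "Study Group"'),
]
totals = subject_durations(notices, end=datetime(2020, 2, 1))
print(totals.most_common(1))  # [('Study Group', 2505600.0)]
```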

Analysis 6: Activeness

On how many days was the group silent?
Who is the most active author?
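Both questions follow from the set of calendar days that contain messages. A minimal sketch under the same `(timestamp, author, message)` assumption: silent days are the days in the chat's date range with no messages at all, and "most active" is taken here to mean posting on the most distinct days. That definition is an assumption (raw message count is another reasonable choice), and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def silent_days(chat):
    """Days between the first and last message on which nobody wrote."""
    active = {ts.date() for ts, _, _ in chat}
    span = (max(active) - min(active)).days + 1
    return span - len(active)


def most_active_author(chat):
    """Author who posted on the most distinct days (one possible definition)."""
    author_days = {(author, ts.date()) for ts, author, _ in chat}
    days_per_author = Counter(author for author, _ in author_days)
    return days_per_author.most_common(1)[0]


# Hypothetical sample data.
chat = [
    (datetime(2020, 8, 1, 10), "Alice", "hi"),
    (datetime(2020, 8, 1, 11), "Alice", "anyone?"),
    (datetime(2020, 8, 2, 9), "Bob", "here"),
    (datetime(2020, 8, 4, 20), "Alice", "back"),
]
print(silent_days(chat))         # 1  (no messages on 3 Aug)
print(most_active_author(chat))  # ('Alice', 2)
```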

Analysis 7: Messages Deleted

Which author has deleted the most messages?
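A deleted message typically does not vanish from the export; it is replaced by a placeholder line that is still attributed to its author. Assuming the English placeholders listed in `DELETED_MARKERS` (hypothetical; check the exact wording in your own export, since it varies by platform and app language), counting them per author takes a single `Counter`.

```python
from collections import Counter
from datetime import datetime

# Assumed placeholder texts in an English-language export; verify against your file.
DELETED_MARKERS = {"This message was deleted", "You deleted this message"}


def deleted_counts(chat):
    """Number of deleted-message placeholders per author."""
    return Counter(
        author
        for _, author, message in chat
        # Tolerate surrounding whitespace and a trailing period.
        if message.strip().rstrip(".") in DELETED_MARKERS
    )


# Hypothetical sample data.
chat = [
    (datetime(2020, 8, 7, 20, 0), "Alice", "This message was deleted"),
    (datetime(2020, 8, 7, 20, 1), "Bob", "This message was deleted."),
    (datetime(2020, 8, 7, 20, 2), "Alice", "You deleted this message"),
    (datetime(2020, 8, 7, 20, 3), "Alice", "sorry, typo"),
]
print(deleted_counts(chat).most_common(1))  # [('Alice', 2)]
```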

Analysis 8: Interactions

Whom has each author replied to the most?
Who are the top responders to that author?
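Assuming the plain-text export does not record which message a reply quotes, any answer here rests on a heuristic. A common one, used in the sketch below, treats each message as a reply to the immediately preceding message whenever the author changes. The same pair counts then answer both questions: whom each author replies to most, and who replies most to a given author. `reply_graph` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def reply_graph(chat):
    """Heuristic: a message counts as a reply to the previous message if a
    different author sent it. Returns who-replied-to-whom in both directions."""
    replied_to = defaultdict(Counter)  # author -> authors they replied to
    responders = defaultdict(Counter)  # author -> authors who replied to them
    for (_, previous, _), (_, current, _) in zip(chat, chat[1:]):
        if current != previous:
            replied_to[current][previous] += 1
            responders[previous][current] += 1
    return replied_to, responders


# Hypothetical sample: one message per minute, in this author order.
order = ["Alice", "Bob", "Bob", "Alice", "Carol", "Alice"]
chat = [(datetime(2020, 8, 9, 20, i), a, "...") for i, a in enumerate(order)]
replied_to, responders = reply_graph(chat)
print(dict(responders["Alice"]))  # {'Bob': 1, 'Carol': 1}
```

A natural refinement is to ignore pairs separated by a long silence, since a message sent hours later is unlikely to be a reply.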

Analysis 9: Sentiment

Who is the most positive author?
When were the group and each author happiest?
Left: Stats | Right: Example
Left: Overall | Center: Author 2 | Right: Author 10
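The sentiment model behind the original analysis is not shown here, so the sketch below keeps the scorer pluggable: `mean_score_by` averages a per-message score over any grouping, so grouping by author answers "most positive author" and grouping by date answers "happiest day". The `TOY_LEXICON` scorer is a deliberately crude stand-in, not a real sentiment model; in practice you would pass in a proper scorer such as VADER's compound score. All names and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Toy lexicon for illustration only; replace `toy_score` with a real
# sentiment scorer in an actual analysis.
TOY_LEXICON = {"great": 1.0, "happy": 1.0, "love": 1.0, "bad": -1.0, "sad": -1.0}


def toy_score(text):
    """Sum of lexicon scores of the words in a message (punctuation stripped)."""
    return sum(TOY_LEXICON.get(w.strip("!?.,"), 0.0) for w in text.lower().split())


def mean_score_by(chat, key, score=toy_score):
    """Average sentiment per group, where key(ts, author) picks the group."""
    totals, counts = defaultdict(float), defaultdict(int)
    for ts, author, message in chat:
        group = key(ts, author)
        totals[group] += score(message)
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical sample data.
chat = [
    (datetime(2020, 8, 7, 20, 0), "Alice", "I love this, great day!"),
    (datetime(2020, 8, 7, 20, 5), "Bob", "bad news"),
    (datetime(2020, 8, 8, 9, 0), "Alice", "ok"),
]
by_author = mean_score_by(chat, lambda ts, author: author)
by_day = mean_score_by(chat, lambda ts, author: ts.date())
print(max(by_author, key=by_author.get))  # Alice
print(max(by_day, key=by_day.get))        # 2020-08-07
```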

Thanks for reading this article! Feel free to leave a comment below if you have any questions.
