WhatsApp Chat Analysis

Abubakar Alaro
Published in Geek Culture
4 min read · Sep 26, 2021

The ability to extract information and insights from any form of data is a valuable skill. Data comes in various forms, such as structured tabular data and text data. WhatsApp is a social media platform that allows its users to send messages, pictures, videos, etc. to each other. Sending these messages generates text data that can be extracted easily and, with knowledge of a programming language like Python, transformed and analyzed.

In this article, I will explain how to use WhatsApp chat text data to extract insights and create different charts.

Image by Author

Objectives:

After reading this article, the reader will be able to:

  1. extract WhatsApp chat data
  2. clean text data using regex
  3. extract insights from text data
  4. use Streamlit to build a dashboard

Without further ado, let’s get to it.

Outline:

  • Extracting WhatsApp chat data
  • Cleaning & Transforming WhatsApp data
  • Building Visuals with Streamlit

Extracting WhatsApp chat data

WhatsApp allows its users to download their chat data, whether from a Direct Message (DM) or a group chat. Follow these steps to download your chat data:

  • Open the chat that you wish to download; it could be a DM with a friend or a group chat
  • Click on the three vertical dots (the menu icon)
  • Click on the More button
  • Click on the Export chat button
  • Click on Without Media; this exports only the text of the chat and excludes all media such as audio, videos, etc.
  • You can then send the exported file to any destination you want, for example by mailing it to yourself

Cleaning & Transforming WhatsApp data

To extract useful information or insights from data, it first needs to be cleaned and transformed in a way that suits the insights to be extracted. WhatsApp chat data is no exception: it is dirty and needs to be cleaned. The main steps are:

  • Remove system-generated messages: WhatsApp generates some messages itself; for example, when someone new joins a group, it adds a message like ‘user1 just joined the group’
  • Remove emojis
  • Extract useful features from each message: information like time, date, message content, message sender, etc.
  • Combine all features into a dataframe to aid visualization

Here is a function that extracts features and converts all features into a dataframe.
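As a rough sketch of what such a function might look like (not necessarily the author's own), the example below parses exported chat text with regex. It assumes each message starts with a line of the form "26/09/2021, 14:05 - Sender Name: message text"; the real layout varies by phone and locale, so the pattern may need adjusting. The names parse_chat, LINE_RE and EMOJI_RE are hypothetical, and the emoji ranges are illustrative rather than exhaustive.

```python
import re
from datetime import datetime

import pandas as pd

# Assumed export layout (it varies by phone and locale):
#   "26/09/2021, 14:05 - Sender Name: message text"
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{4}), (?P<time>\d{1,2}:\d{2}) - (?P<rest>.*)$"
)

# Rough emoji ranges; illustrative, not exhaustive.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF\uFE0F\u200D]+"
)


def parse_chat(text: str) -> pd.DataFrame:
    """Turn exported chat text into a dataframe of timestamp, sender, message."""
    rows = []
    current = None  # the user message that continuation lines belong to
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            # No date prefix: continuation of a multi-line message.
            if current is not None:
                current["message"] += " " + line
            continue
        sender, separator, message = match["rest"].partition(": ")
        if not separator:
            # No "sender: " part, so assume a system-generated message.
            current = None
            continue
        current = {
            "timestamp": datetime.strptime(
                f"{match['date']} {match['time']}", "%d/%m/%Y %H:%M"
            ),
            "sender": sender,
            "message": message,
        }
        rows.append(current)

    for row in rows:
        # Drop emojis, then collapse the whitespace they leave behind.
        row["message"] = " ".join(EMOJI_RE.sub("", row["message"]).split())

    return pd.DataFrame(rows, columns=["timestamp", "sender", "message"])
```

To try it, read the exported file as UTF-8 text and pass the contents to parse_chat. Treating every line without a "sender: " part as system-generated is a simplification; a system message that happens to contain ": " would slip through and need an extra filter.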

Building Visuals with Streamlit

Streamlit is an open-source app framework for Machine Learning and Data Science teams that lets you build data apps in pure Python. Its goal is to make it easy to create interactive apps for your data.

After cleaning and transforming the chat data, I have a dataframe that looks like this:

Image by Author

Next, I will build visuals that communicate the insights present in the data. At this stage, I think about what is important to display and the most effective way to communicate each insight. Here is a function that generates a word cloud image:
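A minimal sketch of such a function, not necessarily the author's own, might look like the following. It assumes the third-party wordcloud package is installed; the helper names and the short ignore list are hypothetical. The words "media" and "omitted" are ignored on the assumption that a chat exported without media contains "<Media omitted>" placeholders.

```python
import re
from collections import Counter

# Hypothetical ignore list: export placeholders plus a few filler words.
IGNORED_WORDS = {"media", "omitted", "the", "and", "you", "that", "this", "for"}


def word_frequencies(messages, ignored=IGNORED_WORDS):
    """Count how often each word appears across all messages."""
    counts = Counter()
    for message in messages:
        for word in re.findall(r"[a-z']+", message.lower()):
            # Skip very short words and anything on the ignore list.
            if len(word) > 2 and word not in ignored:
                counts[word] += 1
    return counts


def build_word_cloud(messages):
    """Return a WordCloud object sized by word frequency."""
    # Imported here so the counting helper works without wordcloud installed.
    from wordcloud import WordCloud

    frequencies = word_frequencies(messages)
    cloud = WordCloud(width=800, height=400, background_color="white")
    # generate_from_frequencies raises ValueError if no words are left.
    return cloud.generate_from_frequencies(frequencies)
```

In a Streamlit app, the result could be shown with st.image(build_word_cloud(df["message"]).to_array()), assuming df has a message column as in the dataframe above.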

Other functions to generate charts can be found here. The entire codebase uses Streamlit for the frontend of the data view and Plotly for interactive charts. I have created multiple charts to convey different insights, like the total number of messages sent by each user, the busiest days in terms of messages sent, the most common words used in a chat, etc. Below is a snapshot of a chart that was generated from the WhatsApp chat data.

Image by Author
Image by Author
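To make the idea concrete, here is a hedged sketch of how two of these charts (messages per user and busiest days) could be built. It assumes a dataframe with timestamp and sender columns, with timestamp stored as a datetime; the function names are hypothetical and this is not the project's actual code.

```python
import pandas as pd


def messages_per_sender(df: pd.DataFrame) -> pd.DataFrame:
    """Count messages per sender, most active sender first."""
    counts = df.groupby("sender").size().reset_index(name="messages")
    return counts.sort_values("messages", ascending=False, ignore_index=True)


def messages_per_weekday(df: pd.DataFrame) -> pd.DataFrame:
    """Count messages per day of the week, busiest day first."""
    with_day = df.assign(weekday=df["timestamp"].dt.day_name())
    counts = with_day.groupby("weekday").size().reset_index(name="messages")
    return counts.sort_values("messages", ascending=False, ignore_index=True)


def render_charts(df: pd.DataFrame) -> None:
    """Draw both charts on a Streamlit page using Plotly bar charts."""
    # Imported here so the counting helpers work without these installed.
    import plotly.express as px
    import streamlit as st

    st.title("WhatsApp Chat Analysis")
    st.plotly_chart(
        px.bar(messages_per_sender(df), x="sender", y="messages",
               title="Messages per user")
    )
    st.plotly_chart(
        px.bar(messages_per_weekday(df), x="weekday", y="messages",
               title="Busiest days")
    )
```

Keeping the counting separate from the plotting means the aggregates can be checked on their own, and the same tables can feed other chart types later.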

The entire codebase for the project can be found here. I will write about deploying the app and adding more functions to it, like:

  • A download button to download cleaned chat data for other Machine Learning use cases
  • A sidebar to load additional data and compare metrics
  • Anything else you would like to see added to the app; let me know in the comment section

Thank you for reading, and cheers
