Advance Your Python Skills by Building a WhatsApp Chat Analyzer

A guided project that helps you dive into creating something cool and learn useful programming concepts by yourself!

Nityesh Agarwal
Nov 18 · 14 min read

Finding ways to apply your knowledge only after the learning process essentially means that the learning happened without much of a sense of destination. All you were trying to do was amass all the knowledge you could in the hope that it’d be of use in some distant, mystical future.

Doesn’t that feel like procrastination?

I believe in an approach to learning that emphasizes doing projects.

When you try to make something, you discover a hundred things you don’t know. You discover things you thought you knew but don’t really know. You trip over things that seemed so simple you didn’t even pay attention to them. You fill the gaps in your learning.

Also, it’s super fun and adventurous.

You can get all that only if you do a project. So I think that it’s worth it to center your entire learning around completing a project.

If you want to dive into building something interesting and learn useful Python/programming skills along the way, this guide is for you.

With this guide, I aim to walk you through building something interesting, allowing you to experience difficult-to-grasp programming intuitions as you build it. Hopefully, you go from a basic Pythonista to an advanced one.

But, most importantly, I want to give you the motivation and the incentive to teach yourself.

What you’ll learn

Here are some textbook skills that you’ll pick up:

  • File handling
  • String operations in Python
  • Functions
  • Modules
  • pip and using third-party packages
  • Regular expressions (RegEx) in Python

But this isn’t a textbook. So along with them, you’ll also develop intuitions about good programming practices like:

  • The importance of readability of your code and coding style
  • When and how to break your code into functions
  • How to go about debugging your code (when you want to bang your head against the wall instead)
  • How to look things up on the internet — use Google, use Stack Overflow, read documentation, etc.
  • Understand the need for different data structures and when to use what

Let’s get to it then.

Q: “All right, what am I building?” 😃

OK, so here’s the idea:

When chatting with a close friend, have you ever wanted to know:

  • The number of messages sent by each of you
  • The average length of your messages
  • Who texts first and the first text in each conversation
  • Your chatting time patterns — hourly, daily, and monthly
  • Most shared website links
  • Most common words that each of you use

Wouldn’t it be cool if you wrote a program that’d just calculate all this stuff for you?

Q: “But how cool is it really?” 😒

Reddit says that it’s “14k points” cool!

Your program is going to find similar results and print them for you without those graphs and visuals.

Q: “Cool! But am I ready?” 😳

“Every great developer you know got there by solving problems they were unqualified to solve until they actually did it.”

— Patrick McKenzie

Thinking along these lines, I believe that:

  • If you know the basics of the following in Python — variables, lists, dictionaries, loops, conditions, functions — you’re ready.
  • Otherwise, if you are new to Python but know the basics in some other language, go through this quick Python tutorial — and I think you’ll be ready.

Just dive into the first “Hello, World”-equivalent exercise below. If you can complete it, you’re ready.

Q: “And how will I build it?” 😕

WhatsApp allows you to export any chat into a text file that looks something like this:

So you can write a program that’ll read this chat file, parse it, analyze it, and give you the results.

But that’s not enough help, right?

“OK, let’s do it then.” 😃

A Roadmap for Building a WhatsApp Chat Analyzer

MS0: Set up your environment

When you’re starting out, you don’t want to spend hours setting up your environment. Half of your motivation gets killed right there, right?

Repl.it is the way out of this setup frustration.

It’s a website that provides an online IDE for almost every language, which you can access for free with just a few clicks. It’s great for small projects like the one we’re building.

MS1: An assurance that things work (the “Hello, World!” equivalent)

Every programming book/tutorial ever starts out with a “Hello, World!” program. Why is it so?

Apart from being welcoming to newcomers, this program does the job of reassuring the learner that her environment is set up and that things work. So if she does it right, her program will work too.

With these goals in mind, here is your “Hello, World!”-equivalent program:

Print "I love you 3000" 3000 times. (Any Marvel fans out there?)

This is a good opportunity to go deep and:

  • See if you’re ready to dive deeper into the project

If not, then it’s time to do the basics of Python. Don’t worry, it isn’t too difficult.

MS2: Read your chat file using your Python program

From here onward, you will build a piece of the project with each chapter.

There are 2 files that you will need for the project:

  1. Your WhatsApp chat file (ending in .txt)
  2. A Python code file (ending in .py)

Once you have them, this first chapter requires you to open the chat file using your Python program and print all of its contents.

This is a good opportunity to go deep and:

  • Understand how to handle files with Python

File handling in Python — from Zero to Hero:

You know that any editor that you use to open a text file on your computer (Notepad, VS Code, Vim, etc.) is a program, right?

You know what? You can make your own Python program do that. Almost as easily!

Go through this excellent tutorial by Real Python to learn the concepts of file handling in Python.
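As a starting point, here is a minimal sketch of opening a text file and printing its contents. The function name read_chat_lines, the file name chat.txt, and the two sample lines are assumptions made up for illustration; a real export may look different. The demo writes a throwaway file first so that the sketch runs anywhere.

```python
import tempfile
from pathlib import Path


def read_chat_lines(path):
    """Return the lines of a text file, without their trailing newlines."""
    # "with" closes the file automatically, even if an error occurs.
    with open(path, encoding="utf-8") as chat_file:
        return [line.rstrip("\n") for line in chat_file]


# Made-up sample data in an assumed "<timestamp> - <name>: <message>" layout.
sample = (
    "12/31/19, 10:15 PM - Paridhi: Hey!\n"
    "12/31/19, 10:16 PM - Arjun: Hi!\n"
)

with tempfile.TemporaryDirectory() as folder:
    chat_path = Path(folder) / "chat.txt"
    chat_path.write_text(sample, encoding="utf-8")
    lines = read_chat_lines(chat_path)

for line in lines:
    print(line)
```

With your own export, you would skip the temporary file and pass the path of your .txt file to read_chat_lines instead.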

MS3: Features #1 and #2 — count the total number of messages and total number of words

Count the number of messages you and your friend have exchanged.

Then, count each of your individual shares, both by the number of messages and by the number of words.

Print the results.

This is a good opportunity to go deep and:

  • Understand strings in Python

Important things to remember about strings in Python:

  • Strings are sequences, much like lists, and the in operator checks whether one string occurs inside another. So you can do a search like this:
    if "- Paridhi:" in chat_line:
        counter += 1
  • Python strings are famous (as compared to the ones in other languages) because Python powers them with a rich library of built-in methods you can use to perform operations on them. I suggest you use this tutorial by W3Schools as your reference material for those methods.
  • Python’s ability to slice and negative index strings can be really handy at times
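To make these points concrete, here is a small sketch that counts messages and words per sender using only string methods. The sample lines, the second name (Arjun), and the "<timestamp> - <name>: <message>" layout are assumptions for illustration; check your own export before relying on them.

```python
# Made-up sample lines in an assumed "<timestamp> - <name>: <message>" layout.
chat_lines = [
    "12/31/19, 10:15 PM - Paridhi: Happy new year!",
    "12/31/19, 10:16 PM - Arjun: Same to you",
    "12/31/19, 10:17 PM - Paridhi: Party tonight?",
]

message_counts = {}
word_counts = {}
for line in chat_lines:
    # partition() splits on the first separator only, so a " - " or ": "
    # that appears later, inside the message text, stays in the message.
    _, _, rest = line.partition(" - ")
    sender, _, message = rest.partition(": ")
    message_counts[sender] = message_counts.get(sender, 0) + 1
    # split() with no arguments breaks the message on any whitespace.
    word_counts[sender] = word_counts.get(sender, 0) + len(message.split())

print(message_counts)
print(word_counts)

# Slicing with a negative index: the last five characters of the first line.
last_five = chat_lines[0][-5:]
print(last_five)
```

This sketch treats every line as one message, which is a simplification: a message that spans several lines in the export would need extra handling.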

Caution: From now on, you will feel your program grow in size and complexity. As it does, you should start being conscious of your coding style and keep the readability of your code in mind.

Coding style and readability of code:

Brian Kernighan says in his book “The Practice of Programming”:

The purpose of style is to make the code easy to read for yourself and others, and good style is crucial to good programming.

Personally, whenever I try to make decisions about the readability of my code, this line from “The Zen of Python” plays in my brain:

“Explicit is better than implicit.”

Here are three simple, actionable rules you can keep in mind to develop a good coding style:

1. Put some thought into choosing your variables’ names

I find Brian Kernighan’s advice really helpful here:

  • Global functions, classes, and structures should have descriptive names that suggest their role in a program
  • By contrast, shorter names suffice for local variables; within a function, n may be sufficient, npoints is fine, and numberOfPoints is overkill
  • Local variables used in conventional ways can have very short names. The use of i and j for loop indices, p and q for pointers, and s and t for strings is so frequent that there’s little profit and perhaps some loss in longer names.

2. Use functions wherever necessary

  • Break long pieces of code into functions
  • Don’t repeat yourself (DRY) — use functions to remove duplicate pieces of code

More on functions in the next chapter.

3. Write helpful comments

  • Comments are meant to help the reader of a program. They don’t help by saying things the code already plainly says or by contradicting the code — or by distracting the reader with elaborate typographical displays.
  • As much as possible, write code that’s easy to understand; the better you do this, the fewer comments you need. Good code needs fewer comments than bad code. Comments are, at best, a necessary evil.
  • Don’t contradict the code. Most comments agree with the code when they’re written, but as bugs are fixed and the program evolves, the comments are often left in their original form, resulting in disagreement with the code.

In the end, remember that the principles of programming style are based on common sense guided by experience, not on arbitrary rules and prescriptions.

MS4: Feature #3 — calculate the average length of messages sent by each party

Now that you’ve calculated your individual shares using two metrics (message count and word count), you can use them to calculate the average message length for each of you. Print the results.

This is a good opportunity to go deep and:

Understand functions as a means to:

  • Reduce repetition
  • Make code more readable

Deep dive into using functions — motivation and style:

Duplication may be the root of all evil in software. Functions were one of the first techniques developed to control this evil.

It’s easy to understand the syntax of writing functions, but it takes practice and some sense of design to learn when to break the code into functions. One goal is to design functions such that they can be reused when extending your program to new cases.

What’s more, making such design choices is what makes programming fun.

Here are three heuristics from Bob Martin’s book “Clean Code” that’ll guide you while making such choices:

  1. Functions should be small. How small? No more than a screenful — or 20 lines.
  2. Functions should have descriptive names. The smaller and more focused a function is, the easier it is to choose a descriptive name. Don’t be afraid to make a name long. A long descriptive name is better than a short enigmatic name. A long descriptive name is better than a long descriptive comment.
  3. Functions should do only one thing and have no side effects. A function’s intent should be clear from its name.

When you first write a function, it’ll probably come out long and complicated and not follow any of the above rules. And that’s OK. You can refine and reformat your code later. I don’t think anyone could start with writing functions that follow all the rules mentioned above.

Remember these are function-building goals that you need to strive toward. Don’t let them paralyze you.
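As one possible illustration of these heuristics applied to this milestone, the sketch below puts the average calculation in a small, descriptively named function that only computes and returns a value, leaving the printing to the caller. The function name and the totals are hypothetical.

```python
def average_words_per_message(total_words, total_messages):
    """Return the mean number of words per message (0.0 if there are none)."""
    if total_messages == 0:
        return 0.0  # avoid dividing by zero for someone who never texted
    return total_words / total_messages


# Hypothetical totals: name -> (word count, message count).
totals = {"Paridhi": (5, 2), "Arjun": (3, 1)}

averages = {}
for name, (words, messages) in totals.items():
    # The same function is reused for every person instead of repeating
    # the division (and the zero check) once per name.
    averages[name] = average_words_per_message(words, messages)

for name, average in averages.items():
    print(name, round(average, 2))
```

Because the function has no side effects, it can be reused unchanged later, for example when the printing is replaced by a table.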

MS5: Feature #4 — count number of first texts, and show them

Do you want to resolve the question of who texts first once and for all?

After this milestone, you will.

You’ll know exactly how many conversations each of you has initiated and have a list of those first texts. Print all that out.

This is a good opportunity to go deep and:

  • Understand modules — you’ll need Python’s time module here
  • Learn how to look things up and read the documentation

Caution: Don’t be intimidated by the docs. They’re your friends.

What are modules?

Every file of Python source code whose name ends in a .py extension is a module.

A Python installation comes with a standard library that contains such modules out-of-the-box. These are useful pieces of code you don’t have to write.
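Here is a rough sketch of how the time module could help with this milestone. It rests on two assumptions that are not from the guide: the timestamp layout shown in TIMESTAMP_FORMAT (exports vary by phone and region), and the rule that a gap of more than six hours starts a new conversation. All names and sample messages are made up.

```python
import time

# Assumed timestamp layout, e.g. "12/31/19, 10:15 PM"; adjust to your export.
TIMESTAMP_FORMAT = "%m/%d/%y, %I:%M %p"
# Assumed rule: more than 6 quiet hours means a new conversation has started.
GAP_SECONDS = 6 * 60 * 60


def to_seconds(timestamp_text):
    """Convert a timestamp string to seconds since the epoch (local time)."""
    parsed = time.strptime(timestamp_text, TIMESTAMP_FORMAT)
    return time.mktime(parsed)


def find_first_texts(messages):
    """Return (sender, text) for every message that opens a conversation.

    messages is a list of (timestamp_text, sender, text) tuples, oldest first.
    """
    first_texts = []
    previous = None
    for timestamp_text, sender, text in messages:
        current = to_seconds(timestamp_text)
        if previous is None or current - previous > GAP_SECONDS:
            first_texts.append((sender, text))
        previous = current
    return first_texts


# Made-up sample messages.
messages = [
    ("12/31/19, 10:15 PM", "Paridhi", "Happy new year!"),
    ("12/31/19, 10:16 PM", "Arjun", "Same to you"),
    ("01/01/20, 09:00 AM", "Arjun", "Good morning"),
]

first_texts = find_first_texts(messages)
for sender, text in first_texts:
    print(sender, "started with:", text)
```

The six-hour threshold is a design choice worth experimenting with; a shorter or longer gap changes what counts as a new conversation.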

MS6: Feature #5 — chatting time patterns (hourly, daily, and monthly)

Now, it’s time to find out your usual chatting patterns.

  • What hour of the day do you chat the most? What about the rest of the hours?
  • Which day of the week do you usually chat the most? What about the rest of the days?
  • Which month have you chatted the most? What about the rest?

Print the results.

This is a good opportunity to go deep and:

  • Understand the need for different data structures for storing all this data and think about how to design a data structure to suit your needs

Note: You’ll need the time module again here. It's important for you to know it's OK if you don't remember it; you’re allowed to use Google and check the documentation as many times as you need.
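One possible data structure for this milestone, sketched under assumptions: a separate counter for hours, weekdays, and months, each mapping a key to the number of messages. Counter comes from the standard collections module and behaves like a dictionary whose missing keys count as zero. The timestamp layout and the sample timestamps are made up for illustration.

```python
import time
from collections import Counter

# Assumed timestamp layout, e.g. "12/31/19, 10:15 PM"; adjust to your export.
TIMESTAMP_FORMAT = "%m/%d/%y, %I:%M %p"
# time.strptime numbers weekdays from 0 (Monday) to 6 (Sunday).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up sample timestamps.
timestamps = [
    "12/31/19, 10:15 PM",
    "12/31/19, 10:16 PM",
    "01/01/20, 09:00 AM",
]

by_hour = Counter()
by_weekday = Counter()
by_month = Counter()
for timestamp_text in timestamps:
    parsed = time.strptime(timestamp_text, TIMESTAMP_FORMAT)
    by_hour[parsed.tm_hour] += 1                  # 0 to 23
    by_weekday[WEEKDAYS[parsed.tm_wday]] += 1
    by_month[parsed.tm_mon] += 1                  # 1 to 12

# most_common() lists the keys from most to least frequent.
print(by_hour.most_common())
print(by_weekday.most_common())
print(by_month.most_common())
```

Three flat counters are easy to print and sort. A nested dictionary is another option if you later want, say, hourly patterns per person.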

Caution: Implementing this can be quite tricky. You’re likely to spend a majority of your coding time banging your head over broken code.
Remember: It’s not the computer, but your code that’s at fault.

How to debug your code:

1. Explain the code to a friend or use the rubber-duck technique

  • Pick a friend (or a rubber duck)
  • Open the problematic code, and explain it to them, line by line, slowly and patiently
  • Find the problem staring you in the face, without any help from your friend (or the duck), as if by magic

2. Add print statements

Although adding print statements isn’t the proper way to debug, I find them incredibly effective at times, especially when I’m working in a text editor like Vim rather than a full-fledged IDE that has a debugger (or when you’re too lazy to learn how to use a debugger).

But I have to say, once you learn how to use an IDE debugger, there’s no going back.

3. Use an IDE debugger

As of writing, repl.it doesn’t fully support a debugger yet. My favorite IDEs for Python that do support it are PyCharm and VS Code.

A debugger can be so useful that I recommend you switch to one of these IDEs and learn how to use its debugger. Trust me, it’s totally worth the pain (especially now that your code has grown considerably complex).

Personal advice: I suggest using an IDE debugger. Python also provides a debugger in the standard library module pdb, but I’d suggest you don’t get into using it now.

MS7: Feature #6 — most-shared websites

This is a good opportunity to go deep and:

  • Learn RegEx
  • Understand Python dictionaries as traditional hash tables: mapping from website name to the number of occurrences

Quick intro to RegEx:

A regular expression is a special text string for describing a search pattern.

You’re probably familiar with wildcard notations, such as *.txt, to find all text files in a file manager. You can think of regular expressions as wildcards on steroids. They allow you to search like:

“I want every string that is between "http://" or "https://" and the second / after that, if present. Or else, the first /.”
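A small sketch of this idea in Python, using the standard re module together with a dictionary that maps each website to its number of occurrences. The pattern below is one simple interpretation (capture everything after "http://" or "https://" up to the next slash or whitespace), not the only possible one, and the sample messages are made up.

```python
import re

# Captures the host part of a link: whatever follows "http://" or
# "https://" up to the next "/" or whitespace character.
LINK_PATTERN = re.compile(r"https?://([^/\s]+)")

# Made-up sample messages.
messages = [
    "Look at https://www.reddit.com/r/dataisbeautiful/ when you can",
    "This one too: http://example.com/page",
    "https://www.reddit.com/r/python is great",
]

website_counts = {}
for message in messages:
    # findall() returns the captured group for every link in the message.
    for website in LINK_PATTERN.findall(message):
        website_counts[website] = website_counts.get(website, 0) + 1

# Sort the (website, count) pairs from most shared to least shared.
ranking = sorted(website_counts.items(), key=lambda item: item[1], reverse=True)
for website, count in ranking:
    print(website, count)
```

The dictionary lookup website_counts.get(website, 0) is what makes the counting fast: finding a key does not require scanning all the websites seen so far.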

Here are a few of my favorite resources to learn RegEx:

MS8: Feature #7 — most common words

I’ll let you figure this one out on your own!

MS9: Print all of the above in pretty, neat tables

You must be using some print statements to print the results of each milestone. Now, it’s time to focus on the presentation of those results. Print all the above results in pretty, neat tables.

To do this, you might need to restructure a large portion of the code in order to decouple the print statements from the function definitions (assuming you haven’t already been doing it).

This is a good opportunity to go deep and:

  • Realize what it means when people advise — “functions should do just one thing”
  • Learn to search, install, and use third-party modules that Python’s awesome, vibrant community provides through pip
  • Give a personal touch to the project with the way you design the tables

Quick primer on Python’s rich ecosystem of open-source, third-party packages:

Python’s ecosystem has contributors ranging from individual developers to megacorps like Facebook and Google (rich ecosystem, eh?). They offer modules and libraries of code to aid in website construction, numeric programming, game development, data science, machine learning, deep learning, and, well, printing pretty tables.

Now, that’s a whole lot of code you don’t have to write.

PyPI is the home to all these third-party Python packages. You can find a page on every open-source, third-party package here.

Here are a few things that’ll get you up to speed to using PyPI:

  • You can install every package using a simple terminal command — pip. You can find exactly what you need to type on a package's page in PyPI.
    pip install tabulate
  • Any good package also has a how-to-use guide (or documentation) on its page in PyPI
  • Even newbies can publish their experimental packages. You should be careful before using them; they may be incomplete or unmaintained. You can check out a package’s release history or its GitHub statistics to determine its credibility.
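For example, once tabulate is installed, printing a table might look like the sketch below. The rows and headers are made-up results. The fallback branch is an addition of this sketch so that it still runs if the package is missing; it is not something tabulate itself provides.

```python
# Made-up results: one row per person.
headers = ["Name", "Messages", "Words"]
rows = [
    ["Paridhi", 2, 5],
    ["Arjun", 1, 3],
]

try:
    # Third-party package: install it with "pip install tabulate".
    from tabulate import tabulate
    table = tabulate(rows, headers=headers, tablefmt="grid")
except ImportError:
    # Plain-Python fallback with fixed column widths.
    layout = "{:<10}{:>10}{:>8}"
    table_lines = [layout.format(*headers)]
    table_lines += [layout.format(*row) for row in rows]
    table = "\n".join(table_lines)

print(table)
```

Notice that the numbers are collected into rows first and printed only at the end. Keeping calculation and presentation apart like this is what makes swapping in a table library painless.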

MS10: Make all of this work for group chat files

With this milestone, you’ll be extending your program to a new case: group chats. Up until this point, you’ve been working with a direct-message chat file with one friend. Now, you’ll modify your program so it’ll work with WhatsApp group-chat files as well.

This is a good opportunity to go deep and:

  • Evaluate your functions. Are you able to reuse at least some of them?
  • Feel the benefits of a good coding style and good programming practices
  • See the importance of a version-control system and learn Git
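As a hint of what reusable can mean here, the sketch below counts messages for any number of senders by discovering names from the file instead of hard-coding two of them. The line layout, the names, and the idea that notice lines such as "Paridhi added Meera" carry no "name: " part are assumptions; verify them against your own group export.

```python
from collections import defaultdict


def count_messages_by_sender(chat_lines):
    """Count messages per sender, whatever the number of participants."""
    counts = defaultdict(int)  # unseen senders start at 0
    for line in chat_lines:
        _, dash, rest = line.partition(" - ")
        sender, colon, _ = rest.partition(": ")
        if not dash or not colon:
            continue  # assumed to be a notice, not a regular message
        counts[sender] += 1
    return dict(counts)


# Made-up group-chat lines in an assumed layout.
group_lines = [
    "12/31/19, 10:00 PM - Paridhi added Meera",
    "12/31/19, 10:15 PM - Paridhi: Welcome!",
    "12/31/19, 10:16 PM - Meera: Thanks",
    "12/31/19, 10:17 PM - Arjun: Hi Meera",
    "12/31/19, 10:18 PM - Meera: Hi: everyone",
]

group_counts = count_messages_by_sender(group_lines)
print(group_counts)
```

The same function works unchanged on a two-person chat, which is the kind of generality this milestone is meant to make you feel.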

Good software

It’ll do you well to remember what Brian W. Kernighan says about good software in his book The Practice of Programming:

The basic principles that form the bedrock of good software are simplicity, which keeps programs short and manageable; clarity, which makes sure they are easy to understand, for people as well as machines; generality, which means they work well in a broad range of situations and adapt well as new situations arise; and automation, which lets the machine do the work for us, freeing us from mundane tasks.

All right, I hope this has been useful for you. You’ll gain a true understanding of all the mini lessons in this guide once you actually dive into doing the project yourself.

Here’s some code to give your start a boost:

Don’t be afraid of starting out because things will get difficult when you get stuck. That is the adventure; it’ll feel super cool every time you dig yourself out.

Also, you can tell me or your fellow learners about your doubts in the Build To Learn Slack group (please feel free to join using the given link).

As an ending note, I’d like you to remember the words of Jen Simmons as you work on this project (or any other programming project for that matter):

WhatsApp Chat Analyzer is one of the 20 cool programming projects that I mentioned in the last post in the Build To Learn series. If you want me to do a similar guide for any of the others, feel free to comment below or reach out to me directly!

Subscribe to the Build To Learn newsletter to get an email when I do new guides and articles.

You can reach out to me on both Twitter and LinkedIn.

Better Programming

Advice for programmers.

Thanks to Zack Shapiro

Nityesh Agarwal

Written by

Learning | Writing | Teaching at https://www.buildtolearn.club/
