How to get Email Stats from Thunderbird
The time had come to fix something I didn’t know I was missing before: a visual overview of my email metadata.
I’m a fan of data visualization. I love reading infographics, analyzing patterns in scatter charts or guessing possible reasons for flashy spots in a heat map. I have found that visualizing my work is a great motivation to make progress and to try even harder. Examples are the contributions graph on my GitHub profile or the overview of views and watch time on YouTube Studio. Visuals and numbers can be very powerful when they present the information that matters in an easily understandable way.
So I thought about using this visual motivation for one of the more annoying activities in my working life: emails. I have to read them, I have to write them. I have to understand the sender’s point and answer accordingly. This is work that is often not valued, especially not by myself.
I recently stumbled upon ‘The Personal Analytics of My Life’ by Stephen Wolfram. I was impressed (and maybe a little terrified) by how much data you can collect about yourself; he even tracked his keystrokes for years! But the analysis of his emails inspired me to do something similar, or at least to find out how many emails I have ever received and sent. Since I’m using Mozilla Thunderbird as my email client and there is currently no analytics add-on available for the latest version, I had to build something of my own.
This is how I created a tool to serve statistics about my emails:
> The format
First, I had to find a way to get at my email data at all. I searched for the files in which Thunderbird stores my emails. On Windows, it’s usually the following:
It turned out that my emails were stored in the mbox format, one file per IMAP directory. After looking into these files, I decided to switch my account to the maildir format (storing one file per email), which was easier to handle when processing the email files. Also, maildir seems to be the preferred format today, because it scales better and is faster to search.
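To make the difference between the two formats tangible, here is a small sketch using Python’s standard `mailbox` module. It is not part of my tool; the helper names `make_message` and `compare_formats` are made up for illustration, and it uses the standard Maildir layout, which may differ in details from the folders Thunderbird creates.

```python
import mailbox
import os
from email.message import EmailMessage


def make_message(subject):
    """Build a tiny sample email (hypothetical test data)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = subject
    msg["Date"] = "Fri, 04 Oct 2019 10:00:00 +0000"
    msg.set_content("Hello")
    return msg


def compare_formats(base_dir, subjects):
    """Store the same messages as mbox and as Maildir.

    Returns (number of mbox files, number of Maildir message files).
    """
    mbox_path = os.path.join(base_dir, "Inbox")
    maildir_path = os.path.join(base_dir, "InboxMaildir")
    mbox = mailbox.mbox(mbox_path)  # one file for the whole folder
    maildir = mailbox.Maildir(maildir_path, create=True)  # one file per email
    for subject in subjects:
        mbox.add(make_message(subject))
        maildir.add(make_message(subject))
    mbox.flush()
    mbox.close()
    maildir.close()
    mbox_files = 1 if os.path.isfile(mbox_path) else 0
    # Newly added Maildir messages are placed in the "new" subdirectory.
    maildir_files = len(os.listdir(os.path.join(maildir_path, "new")))
    return mbox_files, maildir_files
```

With three messages, the mbox folder is still a single file, while the Maildir folder contains three separate files that can be opened and parsed independently.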
> The data retrieval
Now that I knew where and how my email data was stored, I wrote a Python script to go through all email files, retrieve the data (currently just the time and number of emails) and store it in JSON format. Despite the redundancy, I decided to store the different evaluations in separate JSON files, so that others can import only the files they need or want.
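A minimal sketch of such a script could look like the following. It is a simplified illustration and not the actual code from the project: the function names and the JSON layout are assumptions, and it simply skips every file that has no parseable `Date` header (index files, for example).

```python
import json
import os
from collections import Counter
from email.parser import BytesHeaderParser
from email.utils import parsedate_to_datetime


def read_sent_time(path):
    """Return the datetime from a file's Date header, or None."""
    with open(path, "rb") as handle:
        # Only the headers are parsed; the message body is not needed.
        headers = BytesHeaderParser().parse(handle)
    raw_date = headers.get("Date")
    if raw_date is None:
        return None
    try:
        return parsedate_to_datetime(str(raw_date))
    except (TypeError, ValueError):
        # Malformed dates raise different errors across Python versions.
        return None


def count_per_year(root):
    """Walk all files below root and count emails per year."""
    counts = Counter()
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            when = read_sent_time(os.path.join(folder, name))
            if when is not None:
                counts[str(when.year)] += 1
    return dict(sorted(counts.items()))


def write_stats(root, out_path):
    """Store one evaluation (emails per year) in its own JSON file."""
    per_year = count_per_year(root)
    stats = {"total": sum(per_year.values()), "per_year": per_year}
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(stats, handle, indent=2)
    return stats
```

Calling `write_stats` with the mail folder and an output path produces one small JSON file. Further evaluations would each get a file of their own in the same way.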
> The visualization
As you can see, there are currently just some basic numbers and a few charts displaying the total number of emails per year, per month, per time of day and per day of week, each divided into outgoing and incoming emails. But even this simple visualization of my email data is enough to show how the effort of managing emails has steadily increased over the last years, where my productive hours are, and that I lost two of the almost sixteen years of email data I had…
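The numbers behind such charts are simple counts grouped by different parts of the timestamp. The following sketch shows one way to compute them. It is an illustration under assumptions, not the project’s code: the names `direction_of` and `bucket_counts` are hypothetical, and an email is treated as outgoing when its `From` address is one of my own addresses.

```python
from collections import Counter
from datetime import datetime
from email.utils import parseaddr

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
BUCKETS = ("per_year", "per_month", "per_hour", "per_weekday")


def direction_of(from_header, own_addresses):
    """Classify an email by comparing its From address with my own ones."""
    _name, address = parseaddr(from_header)
    return "outgoing" if address.lower() in own_addresses else "incoming"


def bucket_counts(records):
    """Count emails per year, month, hour and weekday.

    records is an iterable of (datetime, direction) pairs, where
    direction is either "incoming" or "outgoing".
    """
    stats = {
        bucket: {"incoming": Counter(), "outgoing": Counter()}
        for bucket in BUCKETS
    }
    for when, direction in records:
        stats["per_year"][direction][when.year] += 1
        stats["per_month"][direction][when.strftime("%Y-%m")] += 1
        stats["per_hour"][direction][when.hour] += 1
        stats["per_weekday"][direction][WEEKDAYS[when.weekday()]] += 1
    return stats
```

Each of the four buckets maps directly to one chart, with the incoming and outgoing counters as its two data series.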
I’d like to extend this tool with some additional charts:
- a bar chart showing the number of emails per month of year (indicating a possible dependency of the email volume on the seasons of the year)
- a scatter plot or heat map showing each single day colored according to the number of emails on that day
- an evaluation of the last week/month/year compared to the previous period
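The data for the last two ideas could be prepared roughly as sketched below. None of this exists in the tool yet; the function names and the choice of a rolling window of `length_days` days (instead of calendar weeks or months) are assumptions for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def emails_per_day(days):
    """Count emails per calendar day (the input for a heat map).

    days is an iterable of datetime.date objects, one per email.
    """
    return Counter(days)


def compare_periods(per_day, today, length_days=7):
    """Compare the last length_days days with the period before.

    Returns (current count, previous count, difference).
    """
    current_start = today - timedelta(days=length_days)
    previous_start = current_start - timedelta(days=length_days)
    current = sum(
        count for day, count in per_day.items()
        if current_start < day <= today
    )
    previous = sum(
        count for day, count in per_day.items()
        if previous_start < day <= current_start
    )
    return current, previous, current - previous
```

A positive difference means more emails than in the previous period. Using `length_days=30` or `length_days=365` gives a rough month or year comparison.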
You can find this project on GitHub. I’d be glad if it’s useful for someone else. Maybe you have suggestions for additional analytics about your emails you would be interested in? Let’s discuss that in the comments below.
Published: 4th October 2019