Inbox Zero without declaring bankruptcy

Jeremy Smith
Sep 20, 2020


tl;dr: I wrote a script that analyzes a Gmail dump (MBOX) file and produces Gmail search queries you can use to quickly archive emails you likely don’t need to read. It works by grouping inbox emails by sender, because I discovered that much of my inbox was made up of transactional and advertising emails from a relatively small number of senders. I prefer this to blindly archiving all unread messages, which risks missing something important. The script is also more secure than an online service: you don’t have to grant any OAuth permissions or send your data anywhere.

The Problem

I’d never reached Inbox Zero on my personal Gmail. In fact, after over a decade of poor email hygiene, I had accumulated a mountain of unread messages (500+) and over 8,000 messages in total in the inbox. Having such a mess felt very uncomfortable, but it also felt irresponsible to simply “declare bankruptcy” and archive all or most of it. I figured there had to be a faster way to review my email than combing through all 8,000+ messages. After some thinking and coding, I got through the pile with peace of mind in only a few hours!

A screenshot taken during the process. That’s a lot of emails!

I was surprised I had so many messages, since I don’t use my personal email that much. My theory was that the vast majority were transactional and advertising emails: receipts from places like Amazon and Uber, plus marketing spam. Hidden among these messages were ones I actually wanted to see.

If my theory were correct, I should expect to see a long tail of inbox emails from these companies: maybe 100 from Amazon, 200 from Uber, 50 from Uniqlo, and so on. I thought: if I could produce a histogram of my emails grouped by sender, I could construct a Gmail query for emails that can safely be auto-archived.

It seemed reasonable to expect the histogram to follow a power law: there are only a few services I use enough that I’d tolerate that volume of spam before unsubscribing. Even so, if, say, 85% of the 8,000 messages could be archived instantly, that could still mean hundreds of senders. Handling them manually would be slow: (1) glance at the inbox to find a candidate sender, (2) run a Gmail search (often slow), (3) select all and archive (also often slow), and (4) repeat for each of several hundred senders.

This process seemed like something a bit of lightweight analytics could speed up. I looked into a few email analytics services, but I was uncomfortable giving companies OAuth access to my personal email. I decided it shouldn’t be that hard to build my own…

The Solution

I started by looking at the Gmail API, but realized the bulk operations I wanted might be slow, and I would also have to implement OAuth. Since this app would likely be used only once per user, I could skip the tricky Google API part entirely. Using Google Takeout, I got a dump of my inbox emails in MBOX format and then whipped together a command-line script to do some lightweight analytics.
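For readers curious about the parsing step, here is a minimal sketch in JavaScript (the tool itself runs on Node). It is not the Inbox Swifter source: it assumes the Takeout export is a plain MBOX file in which every message starts with a line beginning `From `, with headers running until the first blank line. The helper names `splitMbox`, `parseHeaders`, and `senderAddress` are hypothetical, and encoded header words (RFC 2047) are not decoded.

```javascript
'use strict';

// Split raw MBOX text into messages. Each message begins with a
// "From " separator line (note the space; header lines use "From:").
function splitMbox(text) {
  const messages = [];
  let current = null;
  for (const line of text.split(/\r?\n/)) {
    if (line.startsWith('From ')) {
      if (current) messages.push(current.join('\n'));
      current = [];
    } else if (current) {
      current.push(line);
    }
  }
  if (current) messages.push(current.join('\n'));
  return messages;
}

// Parse the header block (everything before the first blank line) into an
// object with lowercase keys, unfolding continuation lines that start with whitespace.
function parseHeaders(message) {
  const headers = {};
  let lastKey = null;
  for (const line of message.split('\n')) {
    if (line === '') break; // blank line ends the headers
    if (/^[ \t]/.test(line) && lastKey) {
      headers[lastKey] += ' ' + line.trim();
      continue;
    }
    const idx = line.indexOf(':');
    if (idx > 0) {
      lastKey = line.slice(0, idx).trim().toLowerCase();
      headers[lastKey] = line.slice(idx + 1).trim();
    }
  }
  return headers;
}

// Extract the bare, lowercased address from a From header,
// e.g. "Uber Receipts <noreply@uber.com>" -> "noreply@uber.com".
function senderAddress(fromHeader) {
  const value = fromHeader || '';
  const match = /<([^>]+)>/.exec(value);
  return (match ? match[1] : value).trim().toLowerCase();
}

// Example usage (the file path is hypothetical):
//   const text = require('fs').readFileSync('Inbox.mbox', 'utf8');
//   const senders = splitMbox(text).map((m) => senderAddress(parseHeaders(m).from));
```

Reading the whole file into memory is fine for a few thousand messages; a very large export would call for streaming it line by line instead.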

The script groups all emails by sender and sorts senders by total inbox emails. It then buckets these counts into a histogram, showing your worst offenders and grouping them together. Finally, it spits out Gmail search queries so you can easily auto-archive the obvious emails (I’m looking at you, Uber/Amazon!). Funnily enough, the tip of my histogram was mostly emails from friends or reminders I’d sent myself and never archived; only after the first dozen or so entries did the “junk” appear.
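The grouping and bucketing could look roughly like the sketch below. It assumes you already have one sender address per inbox email (for example, extracted from each message’s `From` header). The names `countBySender` and `bucketize` are illustrative, and the bucket boundaries `[100, 50, 20, 10, 5, 1]` are a hypothetical choice, not the script’s actual buckets.

```javascript
'use strict';

// Count inbox emails per sender and sort senders from most to least frequent.
function countBySender(senders) {
  const counts = new Map();
  for (const sender of senders) {
    counts.set(sender, (counts.get(sender) || 0) + 1);
  }
  // Descending by count; ties broken alphabetically for stable output.
  return [...counts.entries()]
    .map(([sender, count]) => ({ sender, count }))
    .sort((a, b) => b.count - a.count || a.sender.localeCompare(b.sender));
}

// Group sorted sender counts into histogram buckets. `lowerBounds` must be in
// descending order; each sender goes into the first bucket whose lower bound
// its count meets. The default bounds are a hypothetical choice.
function bucketize(sorted, lowerBounds = [100, 50, 20, 10, 5, 1]) {
  const total = sorted.reduce((sum, s) => sum + s.count, 0);
  const buckets = lowerBounds.map((min) => ({ min, senders: [], emails: 0 }));
  for (const s of sorted) {
    const bucket = buckets.find((b) => s.count >= b.min);
    if (!bucket) continue; // count is below the smallest bound
    bucket.senders.push(s.sender);
    bucket.emails += s.count;
  }
  // Attach each bucket's share of all inbox emails, as a percentage.
  return buckets.map((b) => ({
    ...b,
    percent: total === 0 ? 0 : (100 * b.emails) / total,
  }));
}
```

Calling `bucketize(countBySender(senderList))` gives, for each bucket, the senders in it, how many emails they account for, and what percentage of the inbox that is.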

Sure enough, the script confirmed the theory: 46 senders accounted for nearly 45% of all my inbox emails!
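A figure like “46 senders accounted for nearly 45%” is a cumulative share over the sorted per-sender counts. Here is a small sketch for computing it from your own data, assuming the counts are already sorted in descending order; `topShare` and `sendersToCover` are hypothetical helpers, not part of the script.

```javascript
'use strict';

// Sum of an array of numbers.
function sum(values) {
  return values.reduce((acc, v) => acc + v, 0);
}

// Share of all emails (0..1) sent by the top `n` senders,
// given per-sender counts sorted in descending order.
function topShare(sortedCounts, n) {
  const total = sum(sortedCounts);
  if (total === 0) return 0;
  return sum(sortedCounts.slice(0, n)) / total;
}

// Smallest number of top senders whose emails reach `fraction` of the total.
function sendersToCover(sortedCounts, fraction) {
  const total = sum(sortedCounts);
  if (total === 0) return 0;
  const target = fraction * total;
  let running = 0;
  for (let i = 0; i < sortedCounts.length; i++) {
    running += sortedCounts[i];
    if (running >= target) return i + 1;
  }
  return sortedCounts.length;
}
```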

In the end, I was able to get to Inbox Zero without bankruptcy. Along the way, I found a few forgotten emails that were valuable. Some highlights: an opportunity to donate an old laptop to a refugee family, a reminder about a refund a family member was owed, a volunteer opportunity, and unread messages from friends.

Now, you try!

Feel free to use or fork the script yourself: Inbox Swifter.

Remember that this script runs 100% on your own machine and makes no outgoing requests; all data stays on your computer.

The steps are:

0. Get your own MBOX export. This can be done with Gmail via Google Takeout. Be sure to set the filter to only grab emails from your inbox.

1. Clone the repository:

$ git clone git@github.com:jeremyis/inbox-swifter.git

2. Install dependencies:

$ cd inbox-swifter && npm install

3. Run Inbox Swifter!

$ node inbox-swifter.js

Inbox Swifter will prompt you for some input. Most of it is optional; the most important piece is the path to your MBOX file.

It will return a few sets of data:

  • A sorted list of recent frequent senders (within the past 30 days).
  • A sorted list of all-time frequent senders.
  • A histogram of all-time senders with the % of your total unreads in each bucket.

Sample histogram output
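Limiting the list to recent senders amounts to filtering on each message’s `Date` header. This sketch assumes messages have already been reduced to `{ sender, date }` objects holding the raw header string, and it relies on `Date.parse`, which in Node accepts RFC 2822 dates such as `Mon, 14 Sep 2020 10:00:00 -0700` (the ECMAScript spec leaves non-ISO formats implementation-defined). `recentSenderCounts` is a hypothetical name; `now` is a parameter so the cutoff is easy to test.

```javascript
'use strict';

const DAY_MS = 24 * 60 * 60 * 1000;

// Count senders among messages dated within the last `days` days, sorted by
// count (descending). Each message is assumed to be { sender, date } where
// `date` is the raw Date header; unparseable dates are skipped.
function recentSenderCounts(messages, days = 30, now = Date.now()) {
  const cutoff = now - days * DAY_MS;
  const counts = new Map();
  for (const { sender, date } of messages) {
    const ts = Date.parse(date);
    if (Number.isNaN(ts) || ts < cutoff) continue;
    counts.set(sender, (counts.get(sender) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}
```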

It will then prompt you for any all-time senders you’d like to exclude from the Gmail filters it produces.

Copy and paste those filters into your Gmail search bar, then archive the results! Note that the script may produce several queries, because Gmail’s search box appeared to have a character limit.
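Excluding senders and splitting the output into several queries can be sketched as follows. It uses Gmail’s `in:inbox` operator and `{ }` grouping (which matches any of the enclosed terms), skips excluded senders case-insensitively, and starts a new query whenever the next `from:` term would push past a length cap. `MAX_QUERY_LENGTH = 1000` is an assumed value, since the exact limit isn’t known here, and `buildQueries` is a hypothetical name rather than the script’s API.

```javascript
'use strict';

// Assumed cap on query length; tune by trial against Gmail's search box.
const MAX_QUERY_LENGTH = 1000;

// Build Gmail search queries matching inbox mail from any of `senders`,
// skipping excluded senders and splitting the list so each query stays
// within `maxLength` characters (unless a single term is itself too long).
function buildQueries(senders, excluded = [], maxLength = MAX_QUERY_LENGTH) {
  const skip = new Set(excluded.map((s) => s.toLowerCase()));
  const prefix = 'in:inbox {'; // in:inbox AND any one term inside { }
  const suffix = '}';
  const queries = [];
  let terms = [];
  let length = prefix.length + suffix.length;

  for (const sender of senders) {
    if (skip.has(sender.toLowerCase())) continue;
    const term = `from:${sender}`;
    // A space separates this term from the previous one, if any.
    const added = term.length + (terms.length > 0 ? 1 : 0);
    if (terms.length > 0 && length + added > maxLength) {
      // Current query is full: emit it and start a fresh one.
      queries.push(prefix + terms.join(' ') + suffix);
      terms = [];
      length = prefix.length + suffix.length;
    }
    length += term.length + (terms.length > 0 ? 1 : 0);
    terms.push(term);
  }
  if (terms.length > 0) queries.push(prefix + terms.join(' ') + suffix);
  return queries;
}
```

A single sender term longer than the cap still gets its own query, so the cap is a target rather than a hard guarantee.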
