Analyze and visualize your Gmail inbox using Elasticsearch and Kibana

I initially wanted to unclog my Gmail inbox by running a terms aggregation over all of its messages to find out how many emails come from each sender. That way I could see which person or service is clogging my inbox, unsubscribe, and delete those emails. This led me to use Gmail’s API, Elasticsearch and Kibana to answer my question and have some fun analyzing my inbox.

To start, I followed this quick guide on how to enable Gmail’s API for my account. After getting the proper credentials, I modified the provided quickstart.js file to list and iterate over all the messages in my inbox, extract the fields, and push the results into Elasticsearch. You probably don’t have to worry about rate limits, since Gmail’s API allows 1 billion quota units per day and 250 quota units per user per second. You can find my modified quickstart.js here; it is still a work in progress.
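
As a rough illustration of the extract-and-push step, here is a minimal sketch and not the actual quickstart.js code. It assumes each message was fetched with the Gmail API's users.messages.get, so that it carries payload.headers and internalDate. The helper names (getHeader, toDocument, toBulkBody) and the date and hourOfDay fields are hypothetical. The index name gmail and type message match the index template in this post. The day and hour are computed in UTC, which is a simplification.

```javascript
// Hypothetical helpers: names and document shape are assumptions for illustration.
const DAYS = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'];

// Look up a header by name (case-insensitive) in a Gmail message resource.
function getHeader(message, name) {
  const headers = (message.payload && message.payload.headers) || [];
  const match = headers.find(h => h.name.toLowerCase() === name.toLowerCase());
  return match ? match.value : null;
}

// Turn a Gmail message resource into a flat document for Elasticsearch.
// internalDate is an epoch timestamp in milliseconds, delivered as a string.
function toDocument(message) {
  const date = new Date(Number(message.internalDate));
  return {
    id: message.id,
    from: getHeader(message, 'From'),
    to: getHeader(message, 'To'),
    subject: getHeader(message, 'Subject'),
    date: date.toISOString(),
    dayOfWeek: DAYS[date.getUTCDay()], // UTC, not the local time zone
    hourOfDay: date.getUTCHours()
  };
}

// Build a newline-delimited body for the Elasticsearch bulk API:
// one action line followed by one source line per document.
function toBulkBody(docs) {
  return docs
    .map(doc =>
      JSON.stringify({ index: { _index: 'gmail', _type: 'message', _id: doc.id } }) + '\n' +
      JSON.stringify(doc) + '\n')
    .join('');
}
```

The resulting string could then be sent to the _bulk endpoint with whatever HTTP or Elasticsearch client the script already uses.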

This is the index template that I set up before running the script. It ensures that the from and to fields are available in both analyzed and not_analyzed forms, so that we can search (on from/to) and aggregate (on from.raw/to.raw):

{
  "order": 0,
  "template": "gmail",
  "settings": {},
  "mappings": {
    "message": {
      "properties": {
        "dayOfWeek": {
          "index": "not_analyzed",
          "type": "string"
        },
        "from": {
          "index": "analyzed",
          "type": "string",
          "fields": {
            "raw": {
              "ignore_above": 256,
              "index": "not_analyzed",
              "type": "string"
            }
          }
        },
        "to": {
          "index": "analyzed",
          "type": "string",
          "fields": {
            "raw": {
              "ignore_above": 256,
              "index": "not_analyzed",
              "type": "string"
            }
          }
        }
      }
    }
  },
  "aliases": {}
}
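
To make the search-versus-aggregate distinction concrete, here is a small sketch of the two kinds of request bodies this mapping is meant to support. The function names and the aggregation name top_values are hypothetical, and the search term is only an example. The bodies use the standard Elasticsearch match query and terms aggregation.

```javascript
// Build a terms aggregation body: one bucket per distinct value of `field`,
// limited to the `size` most frequent values.
function termsAggregation(field, size) {
  return {
    size: 0, // return only the aggregation buckets, no search hits
    aggs: {
      top_values: {
        terms: { field: field, size: size }
      }
    }
  };
}

// Build a full-text match query against an analyzed field.
function matchQuery(field, text) {
  return { query: { match: { [field]: text } } };
}

// Aggregate on the not_analyzed sub-field so each full sender string is one bucket.
const topSenders = termsAggregation('from.raw', 10);

// Search on the analyzed field so a single word can match part of a sender.
const senderSearch = matchQuery('from', 'newsletter');

console.log(JSON.stringify(topSenders, null, 2));
console.log(JSON.stringify(senderSearch, null, 2));
```

Aggregating on from instead of from.raw would bucket on individual tokens rather than on whole sender strings, which is why the raw sub-field exists.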

Now that we’ve run our script and data is flowing in, let’s build our dashboard in Kibana. Here are some things I wanted to know:

  1. Top 10 senders
  2. Top 10 senders without proper security (has no SPF/DKIM/DMARC)
  3. The hour of day when I receive most of my emails
  4. The day of week when I receive most of my emails
  5. Date histogram of emails
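
For the second item, one possible way to flag messages without sender authentication is to inspect the Authentication-Results header that the receiving mail server adds. The sketch below is an assumption about how this could be done, not necessarily how my script does it. The function names are hypothetical, and "no proper security" is interpreted here as none of SPF, DKIM or DMARC reporting a pass.

```javascript
// Parse the spf/dkim/dmarc results out of an Authentication-Results header value.
// Each field holds the reported result (e.g. 'pass', 'fail', 'neutral'),
// or null when the method is not mentioned at all.
function parseAuthResults(headerValue) {
  const result = { spf: null, dkim: null, dmarc: null };
  if (!headerValue) return result;
  for (const method of Object.keys(result)) {
    const match = new RegExp('\\b' + method + '=([a-z]+)', 'i').exec(headerValue);
    if (match) result[method] = match[1].toLowerCase();
  }
  return result;
}

// Assumption: a message lacks sender authentication when none of the
// three methods reports 'pass'.
function lacksSenderAuth(headerValue) {
  const r = parseAuthResults(headerValue);
  return r.spf !== 'pass' && r.dkim !== 'pass' && r.dmarc !== 'pass';
}
```

Storing the result as a boolean field on each indexed document would let a Kibana visualization filter on it before running the top-senders aggregation.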

Here’s the final result

Here’s the JSON file for the Kibana dashboard. Feel free to comment on the code and dashboard or suggest ideas for other cool visualizations we can all create for our inbox.