Analyse your emails to figure out who your top contacts and collaborating organisations are

Daniel Iversen
Daniel Iversen’s small misc notes
2 min read · Jan 3, 2019

I wanted to figure out who the top people I communicate with via email are (to make sure I have them as properly fleshed-out contacts if needed), and which organisations I have worked with the most (more for reflection). This is how I did it.

[UPDATE 2022-11-13: Added fix for “Operation not permitted” error, updated to Python 3 and replaced one of the scripts with this one]

You need:

  • A Mac
  • Your emails downloaded in Mail.app (you can set it up temporarily even if you don’t use Mail.app regularly)
  • Some comfort with the command line/Terminal and basic scripting

The steps are:

1. get all email addresses (and more) out of your emails

Run this from inside the Mail.app data folder (in my case “/Users/danieliversen/Library/Mail/V6/87B4E29D-35EB-424F-A484-FA2D371CCB37”):

find . -name '*emlx' -print0|xargs -0 grep "@" >> ~/Desktop/emails-full.log

If you get an “Operation not permitted” error, you need to give Terminal access to your full disk (you can do that in the Security preferences of macOS; see guide here).
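If you would rather avoid find/xargs/grep, the same collection step can be sketched in Python. This is only a rough equivalent, not the method used above: the function name collect_at_lines is made up for illustration, and decoding every file as UTF-8 with undecodable bytes replaced is an assumption on my part.

```python
from pathlib import Path


def collect_at_lines(mail_dir, out_path):
    """Append every line containing '@' from .emlx files under mail_dir to out_path.

    Returns the number of lines written. Hypothetical helper, for illustration.
    """
    count = 0
    with open(out_path, "a", encoding="utf-8") as out:
        for emlx in Path(mail_dir).rglob("*.emlx"):
            # Mail files mix encodings, so undecodable bytes are replaced
            # rather than raising an error.
            with open(emlx, encoding="utf-8", errors="replace") as f:
                for line in f:
                    if "@" in line:
                        # Mimic grep's "path:line" output for multiple files.
                        out.write(f"{emlx}:{line.rstrip()}\n")
                        count += 1
    return count
```

You would call it with your own Mail.app folder and a log file path, for example collect_at_lines(".", "emails-full.log") from inside the Mail folder. The “Operation not permitted” note above applies here too.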

2. create a Python script, get-emails.py, to extract the actual email addresses

#!/usr/bin/env python
#
# Extracts email addresses from one or more plain text files.
#
# Notes:
# - Does not save to file (pipe the output to a file if you want it saved).
# - Does not check for duplicates (which can easily be done in the terminal).
#
# (c) 2013 Dennis Ideler <ideler.dennis@gmail.com>

from optparse import OptionParser
import os.path
import re

# Raw strings keep the backslashes intact and avoid "invalid escape sequence"
# warnings on recent Python 3 versions.
regex = re.compile((r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                    r"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                    r"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))


def file_to_str(filename):
    """Returns the contents of filename as a string."""
    with open(filename) as f:
        return f.read().lower()  # Case is lowered to prevent regex mismatches.


def get_emails(s):
    """Returns an iterator of matched emails found in string s."""
    # Removing lines that start with '//' because the regular expression
    # mistakenly matches patterns like 'http://foo@bar.com' as '//foo@bar.com'.
    return (email[0] for email in re.findall(regex, s) if not email[0].startswith('//'))


if __name__ == '__main__':
    parser = OptionParser(usage="Usage: python %prog [FILE]...")
    # No options added yet. Add them here if you ever need them.
    options, args = parser.parse_args()

    if not args:
        parser.print_usage()
        exit(1)

    for arg in args:
        if os.path.isfile(arg):
            for email in get_emails(file_to_str(arg)):
                print(email)
        else:
            print('"{}" is not a file.'.format(arg))
            parser.print_usage()

3. run the script to extract all emails

python3 get-emails.py emails-full.log >> the-emails.txt

If the command ‘python3’ doesn’t work, try ‘python’
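To get a feel for what the regular expression in get-emails.py captures, here is a small standalone check. The pattern is copied from the script (written as raw strings); the sample text is made up for illustration. Note that the pattern also accepts spelled-out forms like “at” and “dot”, which is one way ordinary sentences can end up in the output.

```python
import re

# Same pattern as in get-emails.py, written as raw strings.
regex = re.compile(
    r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
    r"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
    r"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"
)

# Made-up sample input, lowered the same way the script lowers file contents.
sample = "From: Jane Doe <jane.doe@example.com>\nTo: bob at example dot org\n".lower()

# findall returns one tuple per match (one item per capture group);
# the first item is the full address, which is why the script uses email[0].
matches = regex.findall(sample)
emails = [m[0] for m in matches]
print(emails)  # ['jane.doe@example.com', 'bob at example dot org']
```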

4. filter out domains that are non-essential

These are mostly addresses from automated systems and the like; adjust the list of domains to suit your own mail.


cat the-emails.txt |egrep -v -i "dropbox.com|gmail.com|googlegroups.com|google.com|salesforce.com|hanfordmedia.com|egencia.com.au|apple.com" >> the-emails-filtered.txt
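The grep above is a substring match on the whole line. If you want the filter to look only at the domain part of each address, a minimal Python sketch could look like this. The names EXCLUDED_DOMAINS, is_excluded and filter_addresses are hypothetical, and treating subdomains of a listed domain as excluded is my assumption.

```python
# Domains taken from the grep command in this step.
EXCLUDED_DOMAINS = {
    "dropbox.com", "gmail.com", "googlegroups.com", "google.com",
    "salesforce.com", "hanfordmedia.com", "egencia.com.au", "apple.com",
}


def is_excluded(address, excluded=EXCLUDED_DOMAINS):
    """True if the address's domain is an excluded domain or a subdomain of one."""
    domain = address.strip().lower().rsplit("@", 1)[-1]
    return any(domain == d or domain.endswith("." + d) for d in excluded)


def filter_addresses(lines, excluded=EXCLUDED_DOMAINS):
    """Return the non-empty addresses whose domain is not excluded."""
    return [
        line.strip()
        for line in lines
        if line.strip() and not is_excluded(line, excluded)
    ]
```

For example, filter_addresses(open("the-emails.txt")) would give the filtered list, which you could then write to the-emails-filtered.txt.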

5. find the top 400 people you email with, sorted by number of mentions

(this may include false positives)

cat the-emails-filtered.txt |sort |uniq -c|sort -nr |head -400
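The counting in this step can also be done in Python with collections.Counter. This is a sketch only; the function name top_addresses is made up, and it assumes the input has one address per line, as printed by get-emails.py.

```python
from collections import Counter


def top_addresses(lines, n=400):
    """Return the n most frequent addresses as (address, count) pairs."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return counts.most_common(n)
```

Calling top_addresses(open("the-emails-filtered.txt")) returns the list sorted by count, highest first.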

6. find the top 75 domains/organisations that you communicate with


cat the-emails-filtered.txt |awk -F@ '{print $2}'|sort|uniq -c |sort -nr|head -75
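The domain ranking can be sketched in Python the same way, by counting whatever follows the last “@” in each address. The function name top_domains is hypothetical, and lines without an “@” are simply skipped.

```python
from collections import Counter


def top_domains(lines, n=75):
    """Return the n most frequent domains as (domain, count) pairs."""
    counts = Counter()
    for line in lines:
        address = line.strip()
        if "@" in address:
            # Everything after the last '@' is treated as the domain.
            counts[address.rsplit("@", 1)[1]] += 1
    return counts.most_common(n)
```

As before, top_domains(open("the-emails-filtered.txt")) would give the top 75 domains with their counts.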

Daniel Iversen
Daniel Iversen’s small misc notes

Business, Web and Geekiness. Been living in Australia since late 2001. Husband and father. ex FatWire, Oracle, Dropbox. Now at Asana