You’ve Got Mail: Email analytics with Python and Exchange

Paul Maddox
THG Tech Blog
Published in
4 min readJan 18, 2019

Introduction

It’s been some time since my day job was as a software engineer. Today, my “IDE” is Outlook, so I take every opportunity to keep my hand in. I recently had the perfect requirement:

Can I automate some email tasks with Python?

Like most companies, THG uses distribution lists for sending emails to groups of people. For a given distribution list, I wanted to know if any email responses were getting missed.

Using ExchangeLib

A quick google told me about ExchangeLib, a well featured Python module to connect with Microsoft Exchange via its EWS interface. It’s easily installed:

python3 -m pip install exchangelib

It has a straightforward method of connecting to Exchange (in my case, Office 365), although I had to manually specify the server as auto discovery didn’t work:

def connect(server, email, username, password):
"""
Get Exchange account connection with server
"""
creds = Credentials(username=username, password=password)
config = Configuration(server=server, credentials=creds)
return Account(primary_smtp_address=email, autodiscover=False,
config = config, access_type=DELEGATE)

This can be called as follows (note username and email are not necessarily the same — ask your admin):

def main():
# Connection details
server = 'outlook.office365.com'
email = 'bob.hope@test.com'
username = 'bobh@test.com'
password = getpass.getpass() # prompt for password
account = connect(server, email, username, password)

Working with folders

Now we’re connected, the fun can begin, although finding folders correctly proved to be difficult. All manner of examples on the web were outdated, at least, did not match our Office 365 config. An invaluable solution was to print my folder tree:

def print_tree(account):
"""
Print folder tree
"""
print(account.root.tree())

This informed me that my Inbox was not attached to Root, as many examples suggested, but rather in a parent called “Top of Information Store” off root:

Once you know where to look, retrieving emails is simple. Here I get the most recent emails:

def get_recent_emails(account, folder_name, count):
"""
Retrieve reverse ranked date received emails for a given folder
"""
# Get the folder object
folder = account.root / 'Top of Information Store' / folder_name
# Get emails
return folder.all().order_by('-datetime_received')[:count]

Nice use of operator overloading, ExchangeLib!

Processing emails

Once we retrieve the emails, they can be iterated over them for fun and profit. Let’s start with a simple count of who’s been sending email most:

def count_senders(emails):
"""
Given emails, provide counts of sender by name
"""
counts = defaultdict(int)
for email in emails:
counts[email.sender.name] += 1
return counts

Which can be called like this:

emails = get_recent_emails(account, 'Inbox', 50)counts = count_senders(emails)
print(counts)

I found the module not particularly fast talking to Exchange, but it will eventually print (with full names):

defaultdict(<class 'int'>, {'Carol': 1, 'Paul': 3, 'Jay': 4, 'Tom': 1 ..})

Finding emails that had been missed

Finally, on to the task at hand: how do I find emails sent to a distribution list that have no response?

First, let’s identify our “agents”: people who are the responders:

agents = {'John':True, 'Jane':True}

(Full names must be used here matching the sender’s Exchange name. For me, it would be 'Paul Maddox':True.

Next, I’m assuming the emails we’re interested in are in a separate folder and we can call get_recent_emails to retrieve them. The emails will be reverse sorted, so iterating through them newest-to-oldest finding subjects that have not already been seen from an agent:

def print_non_replies(emails, agents):
"""
Print subjects where no agents have replied
"""
dealt_with = dict()
not_dealt_with = dict()
not_dealt_with_ordered = list()
for email in emails: # newest to oldest
# Simplify subject
subject = email.subject.lower().replace('re: ', '')
if subject in dealt_with or subject in not_dealt_with:
continue
elif email.sender.name in agents:
# If most recent email was from an agent
dealt_with[subject] = email
else:
# Email from anyone else has not been dealt with
not_dealt_with[subject] = email
not_dealt_with_ordered += [email.subject]
print('NOT DEALT WITH:')
for subject in not_dealt_with_ordered:
print(' * ', subject)

This provides the following output:

NOT DEALT WITH:
* Join the 2019 BI Trends webinar series
* Reporting dashboard updated successfully confirmation

I can run this on an arbitrary number of emails going back in time and, assuming subjects are unique, it will provide a historic list of emails without responses!

Conclusion

In this article I’ve introduced a Python module that can be used to work with Microsoft Exchange (and Office 365) accounts, and shown an example working with emails. Further information can be found here:

https://pypi.org/project/exchangelib/

My full code follows at the end of this article.

We’re recruiting

Find out about the exciting opportunities at THG here:

Full code

--

--