Conversations — What They Really Are In Mailburn
Hi! I’m Alexander, CTO of Mailburn. It’s time for a tech article, so I want to tell you what it takes to properly categorize an email in Mailburn.
When you open Mailburn, you see your inbox separated into Conversations, Newsletters and Others, without any configuration at all. It feels like magic, but it all happens because we handle this categorization on our backend. I want to unveil our algorithms a little, without disclosing all of our tricks.
Show me the source!
So, an email has arrived in your inbox. What’s next? We don’t really look at the text inside the email: to a machine, it’s just words combined into sentences. What we truly need is the system information hidden in the email source. It is properly formatted, standardized, and has a limited range of values. Exactly what we need!
We can easily extract additional information from this source. For example:
- This is a reply
- It was sent from a public service (Gmail)
- It was sent from a BlackBerry email client by a human
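As a sketch of what this looks like in practice, here is a minimal Python example that pulls such signals out of a raw message with the standard library’s email module. The header names are real, but the derivation logic is deliberately simplified compared to our real pipeline.

```python
import email
from email import policy

# Parse a raw RFC 5322 message and derive a few of the signals
# described above (simplified; the real pipeline checks 50+ of them).
raw = (
    "From: Alice <alice@gmail.com>\r\n"
    "To: you@example.com\r\n"
    "In-Reply-To: <abc123@example.com>\r\n"
    "X-Mailer: BlackBerry Email Client\r\n"
    "Subject: Re: lunch?\r\n"
    "\r\n"
    "Sure, see you at noon.\r\n"
)
msg = email.message_from_string(raw, policy=policy.default)

is_reply = msg["In-Reply-To"] is not None         # "This is a reply"
from_gmail = msg["From"].endswith("@gmail.com>")  # "Sent from Gmail"
mailer = msg["X-Mailer"] or ""                    # email client fingerprint

print(is_reply, from_gmail, mailer)
```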
There are more than 50 such signs and marks that we analyze, and each of them has its own role. Every sign contributes to a certain trait and can have a positive or negative impact. A trait is a more abstract concept, like “This is from a human”, “Sent with automated software” or “Feels like a promotion with coupons”.
After we finish the analysis, we have a set of traits with scores. Based on those, we make a final call on whether this email belongs to Conversations or not. If it doesn’t, Round 2 starts and we check whether it belongs to Newsletters. The same signs will have different impacts there, and the trait scores will be different as well.
Let’s look at the following example
We see an email which:
- is a reply to your previous email. Contributes to “Sent by a human” +10;
- was sent from Gmail address. Contributes to “Sent by a human” +2;
- you have previously written to. Contributes to “Sent by a human” +4;
- has you as the only recipient. Contributes to “Sent by a human” +2. And also to “Feels like newsletter or promotion” +5.
In the end, we have 18 points for Conversations and 5 against. Based on that, we mark the email as a Conversation, because the score is above our error threshold.
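The arithmetic above can be sketched in a few lines of Python. Everything here is illustrative: the sign names, the trait names, and the threshold value are stand-ins, not our actual configuration.

```python
# Hypothetical sketch of the trait scoring from the example above.
# (sign, trait, weight) tuples; names and weights are illustrative.
signs = [
    ("is_reply",         "sent_by_human",         +10),
    ("from_gmail",       "sent_by_human",         +2),
    ("known_recipient",  "sent_by_human",         +4),
    ("single_recipient", "sent_by_human",         +2),
    ("single_recipient", "feels_like_newsletter", +5),
]

# Accumulate each sign's weight into its trait.
traits = {}
for _sign, trait, weight in signs:
    traits[trait] = traits.get(trait, 0) + weight

score_for = traits["sent_by_human"]              # 18 for Conversations
score_against = traits["feels_like_newsletter"]  # 5 against

ERROR_THRESHOLD = 10  # illustrative value
is_conversation = (score_for - score_against) > ERROR_THRESHOLD
print(traits, is_conversation)
```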
However, if we add 2 more signs like these:
- It has List-Unsubscribe header. Contributes to “Feels like automated email” +10.
- It was sent using Amazon SES. Contributes to “Feels like automated email” +10.
The final score will be different, which means this is not a Conversation email at all. In that case, we check whether it is a newsletter, then whether it belongs to another category, and so on. In the end, we would put it into the “Helpdesk” category, which is where this email really belongs.
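Extending the same illustrative arithmetic, the two extra signs swing the balance (again, all names, weights and the threshold here are hypothetical):

```python
# Start from the illustrative totals of the earlier example: 18 points
# toward "sent by a human", 5 toward "feels like a newsletter".
traits = {"sent_by_human": 18, "feels_like_newsletter": 5, "feels_like_automated": 0}

traits["feels_like_automated"] += 10  # has a List-Unsubscribe header
traits["feels_like_automated"] += 10  # sent via Amazon SES

score_for = traits["sent_by_human"]
score_against = traits["feels_like_newsletter"] + traits["feels_like_automated"]

ERROR_THRESHOLD = 10  # hypothetical value
is_conversation = (score_for - score_against) > ERROR_THRESHOLD
print(is_conversation)  # not a Conversation; Round 2 (Newsletters) starts
```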
Here is how it looks in the code on our backend:
```python
# An example binary rule (excerpt): match the Zendesk mailer header
#   negative=0, target='headers', ctype='has_header',
#   values_list=['X-Mailer: Zendesk Mailer']

for group in CategoryRuleEngine.get_instance().groups:
    self.logger.info('[BINARY] try category group: %s' % group)
    rules = CategoryRuleEngine.get_instance().get_binary_rules_by_group(group)
    result = GenericProcessor(negative_cats).process(message, rules, **kwargs)
    self.logger.info('[BINARY] CATEGORIZATION. Category found: %s, subject: %s'
                     % (result, message.subject))
```
More on marks and signs
They are everywhere: in the subject, in the sender name, and in the hidden system email headers. We carefully check whether they match our patterns (equals, not equals, contains a string, or even a complex regexp) and evaluate them.
Some marks are binary, which means they can have only one value, and a very high one. It’s really hard to outweigh them, but that sometimes happens too.
Other marks are weighted and can have different values, depending on how confident our guess is or how much information we could extract. Their range of values is typically much lower than a binary mark’s, so it takes 3–4 of them to overthrow one.
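To make the pattern matching concrete, here is a hypothetical sketch of how such marks could be evaluated. The ctype names, field names and weights are assumptions for illustration, not our real rule format.

```python
import re

def matches(mark, text):
    """Evaluate one mark against a piece of text (subject, sender, header)."""
    if mark["ctype"] == "equals":
        return text == mark["value"]
    if mark["ctype"] == "not_equals":
        return text != mark["value"]
    if mark["ctype"] == "has_string":
        return mark["value"] in text
    if mark["ctype"] == "regexp":
        return re.search(mark["value"], text) is not None
    return False

# A binary mark carries one very high weight; weighted marks are smaller,
# so it takes several of them to outweigh a single binary hit.
binary_mark   = {"ctype": "has_string", "value": "X-Mailer: Zendesk", "weight": 100}
weighted_mark = {"ctype": "regexp", "value": r"(?i)unsubscribe", "weight": 30}

header = "X-Mailer: Zendesk Mailer"
print(matches(binary_mark, header))  # True
```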
Is that all? Sounds easy
If we had more time we’d make it even simpler.
Of course, such algorithms are not bullet-proof. We have machine learning that adjusts the scores of some marks, and we experiment all the time. Moreover, we collect and analyze all the cases where we land too close to the error threshold.
Hope this has shed some light on something nobody knows exists, because it just works. Feel free to ask me questions; I am more than happy to talk backend!