Using Machine Learning to Classify Emails
When running outbound email marketing campaigns at scale, it becomes critical to track the efficacy of each campaign, particularly further downstream when categorizing responses. While it’s possible to manually label each email as “Interested” or “Not-Interested”, doing so is both unreliable (in the statistical sense) and a time-consuming burden for BDRs/SDRs. To maintain a high degree of insight into campaigns, automating the categorization of responses becomes paramount.
There are a few tools available for this kind of project, particularly on the Python side of things. Being more familiar with JavaScript, I wanted to limit the scope of potential options to that ecosystem. The most popular and most recently released machine learning tool I found was TensorFlow.js, though it largely seemed like overkill for this project. Instead, I settled on a much simpler, more introductory library called Brain.js.
To me, the hardest part of any project seems to be the ETL process (Extract, Transform, Load). After pulling all the email responses from the service I used, I had to transform them into a CSV, manually go through the CSV and categorize each response, and finally convert everything back into a leaner JSON format with just input and output keys.
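As a rough sketch of that final transform step (the filenames here are hypothetical, and I’m assuming no embedded commas in the responses), the labeled CSV maps into the { input, output } shape Brain.js trains on:

```javascript
const fs = require('fs');

// Assumes a two-column responses.csv of "response,category" rows,
// with no embedded commas or quoting (a simplification).
const rows = fs.readFileSync('responses.csv', 'utf8')
  .split('\n')
  .filter(line => line.trim().length > 0);

// Map each labeled row into the { input, output } shape Brain.js expects.
const trainingData = rows.map(line => {
  const [response, category] = line.split(',');
  return { input: response, output: category };
});

fs.writeFileSync('training-data.json', JSON.stringify(trainingData, null, 2));
```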

With the cleansed data, we set up a recurrent LSTM network and iterate over the data for 1,000 training iterations. So that we can reuse and compare various trained models, it’s always best to save the trained state to a file whose name accurately describes the training that was done. Once trained, we can test a variety of inputs:
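Here’s a minimal sketch of that setup, assuming the training-data.json produced above (the filenames and training options are illustrative):

```javascript
const brain = require('brain.js');
const fs = require('fs');

const trainingData = JSON.parse(fs.readFileSync('training-data.json', 'utf8'));

// Recurrent LSTM; Brain.js handles the text encoding for us.
const net = new brain.recurrent.LSTM();
net.train(trainingData, { iterations: 1000, log: true, logPeriod: 100 });

// Save the trained state under a name that describes the run,
// so it can be reloaded later with net.fromJSON().
fs.writeFileSync('lstm-1000-iterations.json', JSON.stringify(net.toJSON()));

// Try an input that resembles the seed data.
console.log(net.run('Sounds great, send over more information.'));
```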

And just like that, we owned machine learning.
But wait…
What if we try testing something else that’s similar?

Or maybe something where someone’s pretty explicit that they’re not interested:
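Continuing with the net trained above, the tests looked something like this (the exact phrasings are stand-ins for the originals):

```javascript
// A near-duplicate of the earlier "interested" phrasing...
console.log(net.run('Sounds good, send over more information.'));

// ...and an unambiguous rejection.
console.log(net.run('Not interested, please remove me from your list.'));
```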

Obviously this neural net is horrible. It’s almost analogous to Jian-Yang’s “SeeFood” app in Silicon Valley, an app that supposedly could take a picture of any food and tell you what it is. In reality, it could only return “Hot Dog” or “Not Hot Dog”. Still, if it’s accurate at that one job, it’s more productive than this code currently is.
So what’s the underlying issue here? What’s going wrong and how do we improve results?
Understanding More About Machine Learning
As I quickly diagnosed, there were two issues at hand:
- The sample size of the training data was not large enough.
- I was not employing hidden layers to help the net comprehend what I was trying to train it to do.
As a result, the net was overfitting the sample data. This became particularly evident when trying to run 5,000 iterations on the data:
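The call itself was essentially just the earlier training run with a higher iteration count (the original error output isn’t reproduced here):

```javascript
// Same network, same data, just cranked to 5,000 iterations.
net.train(trainingData, { iterations: 5000, log: true, logPeriod: 500 });
```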

While I’m not 100% confident about the causal factors behind the above error, I believe that after a certain point of “micro-optimizations”, the net attempted to completely reconfigure its weights to discover a more advantageous “macro-optimization” but was unable to do so. Additionally, since there were not enough hidden layers, the net could not effectively parse the long sentences being thrown at it.
In short, the net was memorizing the sample data rather than learning anything generalizable from it.
After applying a more robust sample set, providing more hidden layers, and adding more iterations, how do things look?
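The revised setup looked roughly like this; the layer sizes, iteration count, and filenames are illustrative rather than the exact values used:

```javascript
const brain = require('brain.js');
const fs = require('fs');

// Hypothetical filename for the larger, re-labeled sample set.
const trainingData = JSON.parse(
  fs.readFileSync('training-data-expanded.json', 'utf8')
);

// Hidden layers give the net intermediate representations to work with;
// the sizes here are illustrative guesses.
const net = new brain.recurrent.LSTM({ hiddenLayers: [20, 20] });
net.train(trainingData, { iterations: 2000, log: true, logPeriod: 200 });

fs.writeFileSync('lstm-hidden-20x20.json', JSON.stringify(net.toJSON()));

// Probe the edge cases discussed below.
console.log(net.run("Please don't contact"));
console.log(net.run("Please don't contact me"));
```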

It’s definitely more accurate than the previous results, but it still requires more work to fine-tune the edge cases. For some reason, the “me” at the end of “Please don’t contact” is a switch point that alters the classification. Similarly, I believe the use of “Thanks” twice confuses the net into thinking the individual is interested, based on the seed data.
I guess, ultimately, just like the machine I’m training, I’m still learning…
