Building a Spam Filter from Scratch Using Machine Learning — Machine Learning Easy and Fun
The start is always the hardest. When I first started to get my hands on Machine Learning, it looked pretty straightforward. The courses I watched had fairly simple exercises, so it seemed easy to solve any problem!
Introduction
When I finished the theoretical part, I wanted to try implementing a practical, real-world example. I found it hard to begin since I didn’t know where to start. One of the simplest projects to start with was building a Spam Filter.
So now we are going to start from the bottom with real email messages and classify them as spam or non-spam. The dataset that we are going to use is a preprocessed subset of the Ling-Spam Dataset, provided by Ion Androutsopoulos. For this solution I used GNU Octave and Visual Studio Code.
So let’s start.
The data and the code are available on my GitHub account: https://github.com/gognjanovski/SpamFilterMachineLearning
All the work that we need to do can be split into 5 steps:
- Prepare the Data