Building a Spam Filter from Scratch Using Machine Learning — Machine Learning Easy and Fun

The start is always the hardest. When I first got my hands on Machine Learning, it looked pretty straightforward. After watching all those courses with fairly simple exercises, it seemed easy to solve any problem!

Gavril Ognjanovski
Analytics Vidhya

“closeup photo of eyeglasses” by Kevin Ku on Unsplash

Introduction

When I finished the theoretical part, I wanted to try implementing a practical, real-world example. I found it hard to begin since I didn’t know where to start. One of the simplest projects to start with was building a spam filter.

So now we are going to start from the bottom with real email messages and classify them as spam or non-spam. The dataset we are going to use is a preprocessed subset of the Ling-Spam Dataset, provided by Ion Androutsopoulos. For this solution I used GNU Octave and Visual Studio Code.

So let’s start.

The data and the code are available on my GitHub account: https://github.com/gognjanovski/SpamFilterMachineLearning

All the work that we need to do can be split into 5 steps:

  1. Prepare the Data
