Building a Spam Filter from Scratch Using Machine Learning — Machine Learning Easy and Fun

The start is always the hardest. When I first got my hands on Machine Learning, it looked pretty straightforward. The courses I watched had fairly simple exercises, so it seemed easy to solve any problem!

Gavril Ognjanovski

Published in Analytics Vidhya

7 min read · Nov 9, 2018

“closeup photo of eyeglasses” by Kevin Ku on Unsplash

Introduction

When I finished the theoretical part, I wanted to try implementing a practical, real-world example. I found it hard to begin since I didn’t know where to start. One of the simplest projects to start with was building a Spam Filter.

So now we are going to start from the bottom with real email messages and classify them as spam or non-spam. The dataset that we are going to use is a preprocessed subset of the Ling-Spam Dataset, provided by Ion Androutsopoulos. For this solution I used GNU Octave and Visual Studio Code.

So let’s start.

The data and the code are available on my GitHub account: https://github.com/gognjanovski/SpamFilterMachineLearning

All the work that we need to do can be split into 5 steps:

Prepare the Data
