How to build a simple SMS spam filter with Python

Mariyasha
Published in Analytics Vidhya
6 min read, Feb 13, 2020

Beginner-friendly tutorial using nltk, string and pandas.

Colourful illustration of hands holding cell phones which display messages, emojis, social media likes and spam

What if I told you there’s no need to build a fancy neural network to classify SMS as spam or not?
Currently, the internet offers a variety of complex solutions with Random Forest, PyTorch and TensorFlow, but are these really necessary if a few “for” loops and “if” statements can achieve a very satisfying result?

In this tutorial, I will show you an easy way to predict whether a user-provided string is a spam message or not.
Step 1: We’ll load a dataset.
Step 2: We’ll pre-process the content of each SMS with nltk & string.
Step 3: We’ll determine which words are associated with spam or ham messages and count their occurrences.
Step 4: We’ll build a predict function returning a ham or spam label.
Step 5: We’ll collect user-provided input, pass it through the predict function and print the output.
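To make the plan above concrete, here is a minimal sketch of Steps 2 to 4 using nothing but loops and "if" statements. It is one possible reading of the steps, not necessarily the exact code used later in this tutorial. The function names (preprocess, count_words, predict), the scoring rule (sum of per-label word counts, ties going to ham) and the four training messages are all assumptions made up for illustration. So that it runs without downloading any NLTK data, a small hand-written stopword set stands in for nltk's English stopword list.

```python
import string

# Assumption: a tiny hand-written stopword set, standing in for
# nltk.corpus.stopwords.words("english") so no NLTK download is needed.
STOPWORDS = {"a", "an", "the", "to", "is", "you", "your", "at",
             "for", "are", "we", "and", "now"}

def preprocess(sms):
    """Step 2: lowercase, strip punctuation, split into words, drop stopwords."""
    cleaned = sms.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOPWORDS]

def count_words(messages):
    """Step 3: count how often each word occurs in spam and in ham messages."""
    counts = {"spam": {}, "ham": {}}
    for label, sms in messages:
        for word in preprocess(sms):
            counts[label][word] = counts[label].get(word, 0) + 1
    return counts

def predict(sms, counts):
    """Step 4: label a message by comparing its summed spam and ham word counts."""
    spam_score = 0
    ham_score = 0
    for word in preprocess(sms):
        spam_score += counts["spam"].get(word, 0)
        ham_score += counts["ham"].get(word, 0)
    if spam_score > ham_score:
        return "spam"
    return "ham"  # ties and unseen words default to ham

# Made-up training examples, for illustration only.
TRAINING = [
    ("spam", "WINNER! Claim your free prize now"),
    ("spam", "Free entry: text WIN to claim cash"),
    ("ham", "Are we still meeting for lunch today?"),
    ("ham", "Call me when you get home"),
]

counts = count_words(TRAINING)
print(predict("Claim your free cash prize", counts))  # spam
print(predict("Lunch at home today?", counts))        # ham
```

For Step 5, the hard-coded strings in the last two lines could be replaced with text read from input(). Defaulting to ham on a tie is a deliberate choice in this sketch: wrongly hiding a genuine message is usually worse than letting one spam message through.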

Step 1: Loading the Dataset
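
As a rough sketch of what loading could look like with pandas: the dataset file itself is not shown here, so the example below assumes a tab-separated layout with a label column followed by the message text, and reads two made-up sample rows from memory instead of a real file. The column names "label" and "sms" are likewise assumptions.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the real dataset file (assumed layout:
# label, a tab, then the message text; no header row).
SAMPLE_TSV = "ham\tSee you at lunch\nspam\tFree prize, claim now\n"

# With a real file, pass its path to read_csv instead of the StringIO object.
data = pd.read_csv(
    io.StringIO(SAMPLE_TSV),
    sep="\t",
    header=None,
    names=["label", "sms"],  # hypothetical column names
)

print(data.shape)
print(data["label"].value_counts())
```

Printing the shape and the label counts is a quick sanity check that every row was parsed into exactly two columns and that both classes are present.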

Mariyasha
Analytics Vidhya

Programmer. YouTuber. I film simple tutorials about Python & Artificial Intelligence and share them with the world! 😀