How to build a simple SMS spam filter with Python

Published in Analytics Vidhya · 6 min read · Feb 13, 2020

Beginner-friendly tutorial using nltk, string and pandas.

Colourful illustration of hands holding cell phones which display messages, emojis, social media likes and spam

What if I told you there’s no need to build a fancy neural network to classify SMS as spam or not?
Currently, the internet offers a variety of complex solutions built on Random Forest, PyTorch and TensorFlow, but are these really necessary if a few “for” loops and “if” statements can achieve a very satisfying result?

In this tutorial, I will show you an easy way to predict whether a user-provided string is a spam message or not.
Step 1: We’ll load a dataset.
Step 2: We’ll pre-process the content of each SMS with nltk & string.
Step 3: We’ll determine which words are associated with spam or ham messages and count their occurrences.
Step 4: We’ll build a predict function returning a ham or spam label.
Step 5: We’ll collect user-provided input, pass it through the predict function and print the output.
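
To make the five steps concrete before diving in, here is a minimal sketch of how they might fit together. It is not the tutorial's final code: the four-message dataset, the small stop-word set and the names preprocess, count_words and predict are assumptions made for illustration, and a hardcoded stop-word set stands in for nltk so the snippet runs with the standard library alone.

```python
import string
from collections import Counter

# Step 1 (stand-in): a tiny hand-written dataset instead of a real labelled SMS file.
MESSAGES = [
    ("ham", "Are we still meeting for lunch today?"),
    ("ham", "Call me when you get home"),
    ("spam", "WINNER! Claim your free prize now"),
    ("spam", "Free entry: text WIN to claim your cash prize"),
]

# Hypothetical stop-word set used only for this sketch; the tutorial relies on nltk here.
STOP_WORDS = {"a", "are", "for", "me", "the", "to", "we", "when", "you", "your"}


def preprocess(text):
    """Step 2: lowercase, strip punctuation, split into words, drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOP_WORDS]


def count_words(messages):
    """Step 3: count how often each word occurs in spam and in ham messages."""
    counts = {"spam": Counter(), "ham": Counter()}
    for label, text in messages:
        counts[label].update(preprocess(text))
    return counts


def predict(text, counts):
    """Step 4: pick the label whose training messages contained the words more often."""
    spam_score = 0
    ham_score = 0
    for word in preprocess(text):
        spam_score += counts["spam"][word]
        ham_score += counts["ham"][word]
    # Ties (including messages made only of unseen words) fall back to "ham".
    return "spam" if spam_score > ham_score else "ham"


WORD_COUNTS = count_words(MESSAGES)

# Step 5: in the tutorial this string comes from the user; it is fixed here.
sample = "Claim your free prize"
print(predict(sample, WORD_COUNTS))
```

With this toy data the sample is labelled spam, because "claim", "free" and "prize" appear only in the spam messages. Defaulting to ham on a tie is a design choice in this sketch: wrongly flagging a genuine message is usually the more annoying mistake.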

Step 1: Loading the Dataset

Written by Mariyasha