Ham or Spam? SMS Text Classification with Machine Learning

A Naive Bayes Implementation in R

Deepal Dsilva

Published in

Tech & Career Nuggets

4 min read · Aug 14, 2018

Introduction

The use of mobile phones has skyrocketed in the last decade, opening up a new channel for junk promotions from disreputable marketers. People innocently give out their mobile phone numbers while using day-to-day services and are then flooded with spam promotional messages.

In this post, we will classify SMS messages using the Naive Bayes machine learning model, see why Naive Bayes works well for this use case, and dive a little into word clouds to visualize the dataset. So let’s get started!

Importing Libraries & Data

We will use the SMS Spam Collection dataset to build a spam classifier. It can be downloaded from the UCI Machine Learning Repository.

This dataset includes the text of SMS messages along with a label indicating whether the message is unwanted. Junk messages are labeled spam, while legitimate messages are labeled ham.
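As a rough sketch of what loading this kind of data might look like in base R, the snippet below reads a tab-separated file with one label and one message per line. The helper name `read_sms`, the column names `type` and `text`, and the three sample rows are all made up for illustration; the sketch also assumes the downloaded file has no header row and uses a tab between the label and the message, so check your copy and adjust if it differs.

```r
# Hypothetical helper: read a tab-separated file of (label, message) pairs.
# Assumes no header row; column names "type" and "text" are our own choice.
read_sms <- function(path) {
  sms <- read.delim(path,
                    header = FALSE,
                    sep = "\t",
                    quote = "",               # messages may contain quotes and apostrophes
                    col.names = c("type", "text"),
                    stringsAsFactors = FALSE)
  # Store the label as a factor so it can serve as the class variable later
  sms$type <- factor(sms$type, levels = c("ham", "spam"))
  sms
}

# Made-up sample rows standing in for the real file
sample_lines <- c(
  "ham\tAre we still meeting for lunch today?",
  "spam\tYou have won a free prize! Call now to claim.",
  "ham\tDon't forget item #3 on the list, it's \"urgent\""
)
tmp <- tempfile(fileext = ".tsv")
writeLines(sample_lines, tmp)

sms_raw <- read_sms(tmp)
str(sms_raw)          # inspect the structure of the data frame
table(sms_raw$type)   # count ham vs. spam messages
```

To try this on the real data, pass the path of the downloaded file to `read_sms()` in place of `tmp`. Setting `quote = ""` matters here because text messages are full of apostrophes and quotation marks that would otherwise be treated as field delimiters.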
