Ham or Spam? SMS Text Classification with Machine Learning

A Naive Bayes Implementation in R

Deepal Dsilva
Tech & Career Nuggets


Introduction

The use of mobile phones has skyrocketed in the last decade, opening up a new channel for junk promotions from disreputable marketers. People innocently give out their mobile phone numbers while using day-to-day services and are then flooded with spam promotional messages.

In this post we will classify SMS messages using the Naive Bayes machine learning model, look at why Naive Bayes works well for this use case, and dive a little into word clouds to visualize the dataset. So let’s get started!

Importing Libraries & Data

We will build a spam classifier from the SMS Spam Collection dataset, which can be downloaded from the UCI Machine Learning Repository.

This dataset includes the text of SMS messages along with a label indicating whether the message is unwanted. Junk messages are labeled spam, while legitimate messages are labeled ham.
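As a rough sketch of the import step, the snippet below reads label/message pairs into a data frame using base R only. It assumes the downloaded file is tab-separated with the label first, the message second, and no header row. The helper name `read_sms`, the column names `label` and `text`, and the file name in the final comment are illustrative choices, not something the dataset prescribes. The two sample rows are made up so the snippet runs without the download.

```r
# Hypothetical helper: read "label<TAB>message" rows into a data frame.
# Assumes tab-separated input with no header row.
read_sms <- function(...) {
  sms <- read.table(
    ...,
    sep = "\t",
    header = FALSE,
    quote = "",          # messages contain apostrophes and quotes; keep them literal
    comment.char = "",   # do not treat '#' inside a message as a comment
    col.names = c("label", "text"),
    stringsAsFactors = FALSE
  )
  # Store the label as a factor so classifiers see two classes
  sms$label <- factor(sms$label, levels = c("ham", "spam"))
  sms
}

# Two made-up rows standing in for the real file
sample_rows <- c(
  "ham\tSee you at lunch?",
  "spam\tWIN a FREE prize! Reply now"
)
sms <- read_sms(text = sample_rows)

# Count messages per class
print(table(sms$label))

# With the downloaded file (the path is an assumption):
# sms <- read_sms(file = "SMSSpamCollection")
```

Turning off quoting and comment handling matters here because SMS text is full of stray apostrophes and `#` characters that `read.table` would otherwise interpret.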
