Identifying spam with the Naive Bayes algorithm

David Chia
Aug 28, 2017 · 1 min read

Today I started the trial for Udacity’s Machine Learning Nanodegree, and this is the first practice project of the course. The objective of the project is to implement the Naive Bayes algorithm and use it to identify spam messages.

Below is the Jupyter notebook that I have created. It is a simple implementation aimed at going through the machine learning process from end to end. This first attempt helped me better understand the different stages involved in solving a machine learning problem and how to implement them in code. It was also useful for understanding how Naive Bayes works in practice and where it falls short: the algorithm assumes that features are independent of one another given the class, so it does not consider interactions between features.
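To make the idea concrete, here is a minimal from-scratch sketch of a multinomial Naive Bayes spam classifier. It is not the notebook from the course. The function names (`tokenize`, `train`, `predict`), the whitespace tokenizer, the add-one smoothing, and the four toy messages are all assumptions made for illustration. It uses only the Python standard library.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(messages, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # word frequencies per class
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return {
        "alpha": alpha,
        "vocab": vocab,
        "log_prior": {c: math.log(n / len(labels)) for c, n in class_counts.items()},
        "word_counts": word_counts,
        "total_words": {c: sum(word_counts[c].values()) for c in class_counts},
    }


def predict(model, text):
    """Return the class with the highest log posterior for `text`."""
    alpha, vocab = model["alpha"], model["vocab"]
    scores = {}
    for label, log_prior in model["log_prior"].items():
        denom = model["total_words"][label] + alpha * len(vocab)
        score = log_prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Each word adds its own term, independent of the other words:
                # this is the "naive" assumption.
                count = model["word_counts"][label][word]
                score += math.log((count + alpha) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data invented for illustration only.
train_messages = [
    "win a free prize now",
    "free cash claim your prize",
    "are we still meeting for lunch",
    "see you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train(train_messages, train_labels)
print(predict(model, "claim your free prize"))   # spam
print(predict(model, "lunch meeting tomorrow"))  # ham
```

Because the score is a plain sum of per-word log probabilities, the model treats "free" and "prize" as separate pieces of evidence and cannot learn that the two words together mean something different from either one alone. Working in log space avoids numerical underflow when many small probabilities are multiplied.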

Of course, there are many improvements that can be made, and many different models that can be trained, but those are not the focus of this practice project.

For a more detailed implementation, including an explanation of the Naive Bayes algorithm, read this tutorial from Udacity.


