Identifying spam with the Naive Bayes algorithm
I started the trial for Udacity’s Machine Learning Nanodegree today, and this is the first practice project of the course. The objective is to implement the Naive Bayes algorithm and use it to identify spam messages.
Below is the Jupyter notebook that I created. It is a simple implementation aimed at going through the machine learning process from end to end. This first attempt helped me better understand the different stages involved in solving a machine learning problem and how to implement them in code. It was also useful for seeing how Naive Bayes works in practice and where it falls short: it assumes that features are independent given the class, so it does not consider interactions between them.
Of course, there are many improvements that can be made, and many different models that can be trained, but those are not the focus of this practice.
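To make the idea concrete, here is a minimal sketch of a Naive Bayes spam classifier in plain Python. It is not the code from the notebook: the four messages are made up for illustration, and the function names (`tokenize`, `train`, `predict`) are my own choices. It assumes simple whitespace tokenization and Laplace (add-one) smoothing.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and split it on whitespace."""
    return text.lower().split()


def train(messages, labels):
    """Count words per class and messages per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocabulary


def predict(text, word_counts, class_counts, vocabulary):
    """Return the class with the highest log posterior score."""
    total_messages = sum(class_counts.values())
    scores = {}
    for label, n_messages in class_counts.items():
        # Start from the log prior: how common this class is overall.
        score = math.log(n_messages / total_messages)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # skip words never seen during training
            # Add-one smoothing avoids zero probabilities for words
            # that never appeared in this class.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data, invented for this example only.
messages = [
    "win free money now",
    "free prize claim now",
    "are we meeting for lunch",
    "see you at lunch tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

model = train(messages, labels)
print(predict("free money prize", *model))
print(predict("lunch tomorrow", *model))
```

Each word adds its own term to the score regardless of which other words appear in the message. That is the "naive" independence assumption, and it is why the model cannot capture interactions between features.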
For a more detailed implementation, including an explanation of the Naive Bayes algorithm, read this tutorial from Udacity.
