Applying Text Classification Using Logistic Regression

A comparison between BoW and Tf-Idf

Idil Ismiguzel
Analytics Vidhya


Photo by Daniel Eledut on Unsplash

Creating “language-aware data products” is becoming more and more important for businesses and organizations. By leveraging machine learning and NLP, organizations can interact with their customers both rationally and emotionally, improve their customer experience, and provide tailored assistance.

However, text is unstructured data, and it is generally produced by people to be understood by other people. So how can we process a large amount of text and transform it into a representation on which we can build machine learning models to predict and classify?
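To make the idea of a "representation" concrete, here is a minimal sketch of the two representations compared in this article, Bag of Words (BoW) and Tf-Idf, written in plain Python. The three toy reviews are invented for illustration and are not taken from the dataset, and the unsmoothed `idf = log(N / df)` formula is an assumption; libraries typically use smoothed or normalized variants, so the exact numbers will differ from theirs.

```python
import math
from collections import Counter

# Toy reviews invented for illustration (not from the Amazon dataset).
reviews = [
    "great taste great price",
    "bad taste",
    "great coffee",
]

# Naive whitespace tokenization; real pipelines also clean and normalize text.
tokenized = [review.split() for review in reviews]

# The vocabulary fixes the order of the vector dimensions.
vocabulary = sorted({token for doc in tokenized for token in doc})


def bow_vector(tokens):
    """Bag of Words: raw count of each vocabulary term in one document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


# Document frequency: in how many documents does each term appear?
n_docs = len(tokenized)
doc_freq = {term: sum(term in doc for doc in tokenized) for term in vocabulary}

# Assumption: plain, unsmoothed inverse document frequency.
idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}


def tfidf_vector(tokens):
    """Tf-Idf: relative term frequency scaled by how rare the term is."""
    counts = Counter(tokens)
    return [counts[term] / len(tokens) * idf[term] for term in vocabulary]


bow = [bow_vector(doc) for doc in tokenized]
tfidf = [tfidf_vector(doc) for doc in tokenized]

print(vocabulary)
print(bow[0])
print([round(weight, 3) for weight in tfidf[0]])
```

In the first review, BoW gives "great" the largest value because it occurs twice, while Tf-Idf gives "price" a higher weight than "great", since "price" appears in only one of the three reviews. That difference in weighting is what the BoW versus Tf-Idf comparison is about.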

If you want an introduction to applications of text and NLP, you can have a look at my previous article.

In this article, I will investigate Amazon’s fine food reviews dataset, which spans a period of more than 10 years and includes ~500,000 reviews written between Oct 1999 and Oct 2012. Step by step, I will build a machine learning model to determine whether a review is positive or negative.

You can find all the code on my GitHub!

Let’s start with basic data exploration. The dataset consists of 568,411 reviews with 9 attributes, which are:

  1. ProductId unique identifier of the…

Idil Ismiguzel
Analytics Vidhya

Data Scientist | Writing articles on Data Science & Machine Learning | MSc, MBA | https://de.linkedin.com/in/idilismiguzel