A Tutorial To Find Best Scikit Classifiers For Sentiment Analysis

Sentiment Analysis

With people interacting more and more in cyberspace, sentiment analysis has become a key area of machine learning. We will use scikit-learn to predict the bad comments in a given data set.

Here’s the flow chart of the approach that we are going to take.

Flowchart for sentiment analysis

Now that we have defined the approach, let’s get our hands dirty with the code. I have written a Python notebook explaining each step. We try to predict bad comments using four well-known classifiers: SVC, MultinomialNB, LogisticRegression, and SGDClassifier.
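To give a feel for what this looks like in code, here is a minimal sketch of fitting the four classifiers with their default settings. It is not the notebook itself: the toy comments and labels are made up for illustration, and using TfidfVectorizer to turn text into features is an assumption on my part. The random_state on SGDClassifier is only there to make the run repeatable.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data: 1 = bad comment, 0 = acceptable comment.
train_texts = [
    "you are an idiot",
    "what a stupid thing to say",
    "shut up you moron",
    "thanks for the helpful answer",
    "great post, I learned a lot",
    "have a nice day",
]
train_labels = [1, 1, 1, 0, 0, 0]

# The four classifiers from the post, each left at its defaults
# (random_state only fixes the shuffling in SGDClassifier).
classifiers = {
    "SVC": SVC(),
    "MultinomialNB": MultinomialNB(),
    "LogisticRegression": LogisticRegression(),
    "SGDClassifier": SGDClassifier(random_state=0),
}

# Assumed feature step: TF-IDF weights over the words of each comment.
fitted = {}
for name, clf in classifiers.items():
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(train_texts, train_labels)
    fitted[name] = model

# Ask every fitted model about one new comment.
predictions = {
    name: int(model.predict(["you stupid idiot"])[0])
    for name, model in fitted.items()
}
for name, label in predictions.items():
    print(name, "->", label)
```

Wrapping the vectorizer and the classifier in one pipeline means the same text preprocessing is applied at fit time and at predict time, which makes it easy to swap classifiers without touching the rest of the code.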

Final Result:

SVC: 66%
MultinomialNB: 11%
LogisticRegression: 59%
SGDClassifier: 47%

So, Naive Bayes gives a very bad result: it correctly predicts only 11% of the bad comments. SGDClassifier predicted 47% of bad comments correctly, which is a considerable improvement over Naive Bayes. Logistic regression, despite having "regression" in its name, is a classifier, and at 59% it shows a good improvement over SGDClassifier.

SVC comes out as the winner, with 66% correct predictions for sentiment analysis.
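The scores above read as "the share of bad comments that the classifier caught". In scikit-learn terms that corresponds to recall for the bad class, so here is a small sketch of how such a number can be computed. I am assuming that this is the metric meant, and the labels below are made up for illustration, not taken from the data set.

```python
from sklearn.metrics import recall_score

# Hypothetical labels: 1 = bad comment, 0 = acceptable comment.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

# Recall for the bad class: bad comments caught / all bad comments.
# Here 2 of the 4 bad comments are predicted as bad.
bad_recall = recall_score(y_true, y_pred, pos_label=1)
print(f"{bad_recall:.0%}")  # prints 50%
```

Note that this number ignores false alarms (the last acceptable comment above is flagged as bad), so it says nothing about how many good comments are wrongly marked.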

As you can see, each classifier consists of many different parameters.

For example, MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True) has the parameters alpha, class_prior, and fit_prior.
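If you want to check the parameters of a classifier yourself, every scikit-learn estimator has a get_params() method. The sketch below lists them for a default MultinomialNB. Depending on your scikit-learn version, you may see more parameters than the three named above.

```python
from sklearn.naive_bayes import MultinomialNB

# Create the classifier without arguments so every parameter keeps its default.
clf = MultinomialNB()

# get_params() returns a dict mapping parameter names to their current values.
params = clf.get_params()
for name, value in sorted(params.items()):
    print(name, "=", value)
```

The same call works for SVC, LogisticRegression, and SGDClassifier, which is a quick way to see what there is to tune.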

In this post, we have run each classifier with its default settings. In the next post, we will see how to tune performance by changing these parameters.