Generic Sentiment Analysis On Cloud

By Ashhadul Islam
Published in tech-that-works, May 15, 2019
Another step towards democratization of Data Science

Different companies have different kinds of data that they want analysed.

A company working on a Twitter dataset would primarily deal with tweets.

Tweets classified as Hate, Threat and Neutral

A news agency would have to grapple with news streaming in from different channels.

News belonging to business, sport, entertainment etc

An app-building company would be interested in the reviews given by its users.

App reviews in the Play Store

Say the Twitter company wants to classify tweets as hateful, negative or positive.

The news agency, on the other hand, wants to classify news by the category it belongs to, such as Sports, Entertainment, Politics or Crime.

If you look closely, the requirement is the same in each case: classify text. Only the output labels differ.

Considering this, Aditya Kumar and I came up with a Django web application that can act as a text classifier, provided it is first trained on similar text that is labelled.

First, hop over to https://mysentimeter.herokuapp.com/senti/

There are two sections to this page. The first is the Training section.

This is where you train the model with your data set

The second part of the application is the Test section, where you use the trained model on an unlabelled dataset.

The test section

For instance, suppose we want to classify tweets.

We have a training file that contains tweets labelled as Positive or Negative.

Labelled Data Set in the training file

Based on this data, we want to train a model and use it to predict whether the tweets in the test file are positive or negative.

Unlabelled dataset in the test file
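To make the train-then-predict idea concrete, here is a minimal sketch of a Naive Bayes text classifier in plain Python. It is not the code the application runs; the tokenizer, the function names and the four sample tweets are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocabulary = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return {
        "label_counts": label_counts,
        "word_counts": word_counts,
        "vocabulary": vocabulary,
    }


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, float("-inf")
    for label, doc_count in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(doc_count / total_docs)  # log prior
        for word in tokenize(text):
            # Laplace (add-one) smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled tweets standing in for the training file.
training_samples = [
    ("I love this sunny day", "Positive"),
    ("What a great match", "Positive"),
    ("I hate waiting in traffic", "Negative"),
    ("This is a terrible service", "Negative"),
]

model = train_naive_bayes(training_samples)
prediction = predict(model, "I love this great day")
print(prediction)
```

Because the labels are just whatever strings appear in the training data, the same sketch would work unchanged for news categories or app-review ratings.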

First we train our model using the train file.

Currently we support SVM and Naive Bayes, so check those two in the Training section.

We will add support for the remaining options soon

Next we need to choose the training file.

Click on Choose file and select the file on which you want to train.

Choose the training file

Click on Submit. After a while, you will see something interesting happen in the Test section.

Reference file created dynamically

As we can see, a new reference file has been created based on the training file that we chose. It acts as an identifier for the trained model, so that when we test with our unlabelled dataset we can choose which model to use.

The text area is there in case we want to test just a single block of text.

Click on Choose file to select the file to be tested.
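One way to picture such a reference is as a saved model whose file name is derived from the training file. The sketch below uses that idea with `pickle`; the `_model.pkl` naming scheme and every function name here are hypothetical, and the application may well store its models differently.

```python
import os
import pickle
import tempfile


def reference_name(training_filename):
    """Derive a model identifier from the training file name (hypothetical scheme)."""
    stem, _ = os.path.splitext(os.path.basename(training_filename))
    return stem + "_model.pkl"


def save_model(model, training_filename, model_dir):
    """Pickle the model under its reference name and return the full path."""
    path = os.path.join(model_dir, reference_name(training_filename))
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
    return path


def list_references(model_dir):
    """List the saved references, e.g. to fill a drop-down in a test form."""
    return sorted(name for name in os.listdir(model_dir) if name.endswith("_model.pkl"))


def load_model(reference, model_dir):
    """Load the model that a chosen reference points to."""
    with open(os.path.join(model_dir, reference), "rb") as handle:
        return pickle.load(handle)


model_dir = tempfile.mkdtemp()  # throwaway directory for the demo
toy_model = {"labels": ["Positive", "Negative"]}  # stand-in for a trained model

save_model(toy_model, "uploads/tweets_train.csv", model_dir)
available = list_references(model_dir)
restored = load_model(available[0], model_dir)
print(available)
```

Keeping one reference per training file lets several trained models coexist, so a tweet model and a news model can both be offered at test time.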

Select the test file

Visualisation showing how many texts each library marked as positive and how many as negative
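The numbers behind a chart like this are just per-classifier label counts. As a rough sketch, with made-up predictions rather than real output from the application:

```python
from collections import Counter

# Illustrative predictions only; these are not results produced by the app.
predictions = {
    "SVM": ["Positive", "Negative", "Positive", "Positive"],
    "Naive Bayes": ["Positive", "Negative", "Negative", "Positive"],
}

# Count how many texts each classifier assigned to each label.
tallies = {name: Counter(labels) for name, labels in predictions.items()}

for name, counts in tallies.items():
    print(name, dict(counts))
```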

Note the Get Result File button at the bottom.

When we click on Get Result File, we can download a CSV file that contains the data along with the predicted labels.

Classification by SVM and Naive Bayes
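For readers who want to produce a similar file themselves, here is a small sketch that pairs each text with the label from each classifier and writes the result as CSV. The column names and sample rows are assumptions; the file the application generates may be laid out differently.

```python
import csv
import io


def build_result_csv(texts, svm_labels, nb_labels):
    """Return CSV text with one row per input and one column per classifier."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Hypothetical header; the real download may use other column names.
    writer.writerow(["text", "svm_label", "naive_bayes_label"])
    for row in zip(texts, svm_labels, nb_labels):
        writer.writerow(row)
    return buffer.getvalue()


result_csv = build_result_csv(
    ["Loving the new update", "Worst release, ever"],
    ["Positive", "Negative"],
    ["Positive", "Negative"],
)

# Read it back to confirm that text containing commas survives the round trip.
rows = list(csv.reader(io.StringIO(result_csv)))
print(result_csv)
```

Using the `csv` module instead of joining strings by hand matters here, because tweets and reviews often contain commas and quotes.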

We can now use this file to enhance our analysis.

Sample Datasets:

Train Dataset

Test Dataset

Application Running at: https://mysentimeter.herokuapp.com/senti/

Code Repository: https://github.com/ashhadulislam/sentiment-analysis

If you face any issues, or if you feel there is a feature that the app absolutely cannot live without, please feel free to create an issue or feature request on GitHub.
