Generic Sentiment Analysis On Cloud

Ashhadul Islam

Published in

tech-that-works

4 min read · May 15, 2019

Another step towards democratization of Data Science

Different companies have different sets of data that they would want analysed.

A company working with a Twitter dataset would primarily deal with tweets.

Tweets classified as Hate, Threat and Neutral

A news agency would have to grapple with news streaming in from different channels.

News belonging to business, sport, entertainment, etc.

An app building company would be interested in the reviews given by their users.

Say the Twitter company wants to classify tweets as hateful, negative or positive.

The news agency, on the other hand, wants to classify news by the category it belongs to, such as Sports, Entertainment, Politics or Crime.

If you look closely, the requirement is the same in both cases: classify a piece of text. Only the output labels differ.

With this in mind, Aditya Kumar and I built a Django web application that can act as a text classifier once it has been trained on similar, labelled text.

First, head over to https://mysentimeter.herokuapp.com/senti/

The page has two sections. The first is the Training section.

This is where you train the model with your data set.

The second is the Test section, where you run the trained model on an unlabelled dataset.

For instance, we want to classify tweets.

We have a training file that contains tweets labelled Positive or Negative.

Based on this data, we want to train a model and use it to predict whether the tweets in the test file are positive or negative.

First we train our model using the train file.

Currently we support SVM and Naive Bayes, so tick those options in the Training section.

Support for the remaining algorithms will follow soon.
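To make the idea of one generic classifier with different outputs concrete, here is a minimal sketch of a Naive Bayes text classifier in plain Python. It is not the code the app runs (see the repository for that). The function names and the sample sentences are made up for illustration, and the tokenizer is deliberately naive.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Train on (text, label) pairs and return the counts needed to predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: same code, two different label sets.
tweets = [
    ("i love this phone", "Positive"),
    ("what a great day", "Positive"),
    ("i hate waiting in traffic", "Negative"),
    ("this is a terrible service", "Negative"),
]
news = [
    ("the team won the match", "Sports"),
    ("the striker scored a goal", "Sports"),
    ("the film premiere drew a crowd", "Entertainment"),
    ("the actor signed a new film", "Entertainment"),
]

tweet_model = train_naive_bayes(tweets)
news_model = train_naive_bayes(news)
print(predict(tweet_model, "i love this great day"))
print(predict(news_model, "the striker won the match"))
```

Nothing in the training or prediction code knows what the labels mean, which is why the same application can serve the Twitter company and the news agency alike.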

Next we need to choose the training file.

Click on Choose file and select the file on which you want to train.

Once you click Submit, after a short while you will see something interesting happen in the Test section.

As we can see, a new reference file has been created according to the training file that we had chosen. This is an identifier for the model so that when we test with our unlabelled dataset, we can choose which model should be used.
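One way to picture this reference mechanism is a folder of saved models, each named after the training file it came from. The sketch below is only an assumption about how such a lookup could work, not a description of how the app stores its models. The helper names, the `.pkl` naming scheme and the placeholder "models" (plain dictionaries) are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, training_filename, model_dir):
    """Store the model under a reference derived from the training file name."""
    reference = Path(training_filename).stem
    with open(Path(model_dir) / f"{reference}.pkl", "wb") as handle:
        pickle.dump(model, handle)
    return reference


def list_references(model_dir):
    """List the references a user could pick from when testing."""
    return sorted(path.stem for path in Path(model_dir).glob("*.pkl"))


def load_model(reference, model_dir):
    """Load the model that belongs to the chosen reference."""
    with open(Path(model_dir) / f"{reference}.pkl", "rb") as handle:
        return pickle.load(handle)


# Placeholder objects stand in for real trained models.
model_dir = tempfile.mkdtemp()
tweet_ref = save_model({"labels": ["Positive", "Negative"]}, "tweets_train.csv", model_dir)
news_ref = save_model({"labels": ["Sports", "Entertainment"]}, "news_train.csv", model_dir)

available = list_references(model_dir)
restored = load_model(tweet_ref, model_dir)
print(available)
print(restored["labels"])
```

Keeping one reference per training file means several models can coexist, and the test step only has to ask which one to load.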

The text area is there in case we want to test just one block of text.

Click on Choose file to select the file to be tested.

Visualisation showing how many entries each library marked as positive and how many as negative

Note the Get Result File button at the bottom.

When we click on Get Result File, we can download a CSV file that contains the data and the predicted labels.

We can now use this file to enhance our analysis.
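As a starting point for that analysis, the sketch below counts how many rows received each label. The column names `text` and `label` are assumptions, so check the header of the file you actually download. An in-memory string stands in for the file here so the example runs on its own.

```python
import csv
import io
from collections import Counter

# Stand-in for the downloaded result file; the column names are assumed.
result_csv = io.StringIO(
    "text,label\n"
    "i love this phone,Positive\n"
    "worst service ever,Negative\n"
    "what a great day,Positive\n"
)


def count_labels(file_obj, label_column="label"):
    """Count how many rows carry each predicted label."""
    reader = csv.DictReader(file_obj)
    return Counter(row[label_column] for row in reader)


label_totals = count_labels(result_csv)
for label, total in label_totals.most_common():
    print(f"{label}: {total}")
```

With a real download, replace the `io.StringIO` object with `open("result.csv", newline="")`, where `result.csv` is whatever name you saved the file under.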

Sample Datasets:

Train Dataset

Test Dataset

Application Running at: https://mysentimeter.herokuapp.com/senti/

Code Repository: https://github.com/ashhadulislam/sentiment-analysis

If you face any issues, or if you feel there is a feature the app absolutely cannot live without, please feel free to create an issue or feature request on GitHub.
