Twitter Sentiment Analysis in NLP: Classifying positive and negative Tweets that mention a brand name

Anh Tran

Aug 8, 2020

1. Setting up a Twitter developer account

To pull data from Twitter, you need a developer account. The steps to set one up are:

Create a regular Twitter account
Apply for a Twitter developer account: Settings -> About Twitter -> Developers
Create a new app
Fill in the information and click Create
Get the keys and access tokens from the “Key and Access Tokens” tab
Get the Access Token and Access Token Secret

2. Setting up a Python environment

A few Python packages are used in this application:

Tweepy: access to the Twitter API
Numpy: mathematical and scientific operations in Python
Sklearn: machine learning library
Pandas: data manipulation and analysis
Nltk: Natural Language Toolkit
Re: regular expressions (part of the Python standard library)

Install any missing package from the terminal with pip, e.g. for tweepy: pip install tweepy

Open an IDE and create a Python file, twitter_nlp.py, then import all the packages and modules.

3. Dataset

The training dataset is in CSV format with two columns: the first column is the text content and the second is the sentiment tag, 1 for positive or 0 for negative.

The dataset is split into training data and testing data at a ratio of 80:20.
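As a rough sketch of this step, the data can be loaded with pandas and split with scikit-learn's train_test_split. The column names text and sentiment, the missing header row, and the sample rows are all assumptions made for illustration; an in-memory sample stands in for the real CSV file.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the real CSV file; the rows are invented for illustration.
SAMPLE_CSV = io.StringIO(
    "I love this phone,1\n"
    "Worst service ever,0\n"
    "Great battery life,1\n"
    "Terrible update,0\n"
    "Really happy with it,1\n"
    "Very disappointed,0\n"
    "Awesome camera,1\n"
    "Screen broke again,0\n"
    "Best purchase this year,1\n"
    "Never buying again,0\n"
)


def load_and_split(csv_source, test_size=0.2, seed=42):
    """Read a two-column CSV (text, 0/1 label) and split it 80:20."""
    # Column names are assumptions; the file is assumed to have no header row.
    df = pd.read_csv(csv_source, header=None, names=["text", "sentiment"])
    return train_test_split(
        df["text"],
        df["sentiment"],
        test_size=test_size,
        random_state=seed,         # fixed seed so the split is repeatable
        stratify=df["sentiment"],  # keep the class balance in both parts
    )


X_train, X_test, y_train, y_test = load_and_split(SAMPLE_CSV)
print(len(X_train), len(X_test))  # 8 2
```

Passing a file path instead of SAMPLE_CSV reads the real dataset the same way.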

4. Cleaning and Stemming text content

The text content is cleaned and then stemmed with the PorterStemmer algorithm before it goes into the bag of words.
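One possible version of the cleaning step is sketched here. The exact cleaning rules are not listed in this post, so the regular expressions (dropping links, @mentions, and anything that is not a letter) are assumptions; only the use of NLTK's PorterStemmer comes from the text.

```python
import re

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()


def clean_and_stem(text):
    """Drop links, mentions and non-letters, lowercase, then stem each word."""
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"[^a-zA-Z]", " ", text)     # keep letters only
    words = text.lower().split()
    # Stemming maps inflected forms to a common stem before counting words.
    return " ".join(stemmer.stem(word) for word in words)


print(clean_and_stem("Loving the new phones from @brand! https://example.com"))
```

The function returns a single space-separated string, which is the input form a bag-of-words vectorizer expects.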

5. Pipeline

The pipeline transforms the text into a bag-of-words vector and then runs the Gaussian Naive Bayes (GaussianNB) algorithm to train the model. This algorithm was chosen because its accuracy score was the highest among the algorithms tried.

Using a Pipeline chains these steps into a single object, which keeps the code compact and reusable.

Because the output of CountVectorizer is a sparse matrix and GaussianNB requires dense data, a DenseTransformer class is needed as an intermediate step in the Pipeline.
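A minimal sketch of such a pipeline follows. The DenseTransformer body, the step names, and the toy sentences are assumptions made for illustration; the three stages themselves (CountVectorizer, a densifying step, GaussianNB) are the ones described above.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline


class DenseTransformer(BaseEstimator, TransformerMixin):
    """Convert the sparse matrix from CountVectorizer into a dense array."""

    def fit(self, X, y=None):
        return self  # nothing to learn in this step

    def transform(self, X):
        return X.toarray()


def build_pipeline():
    return Pipeline([
        ("vectorizer", CountVectorizer()),  # text -> sparse word counts
        ("to_dense", DenseTransformer()),   # sparse -> dense
        ("classifier", GaussianNB()),       # dense counts -> 0/1 label
    ])


# Toy data, invented for illustration.
texts = ["love this brand", "great phone", "hate this brand", "terrible phone"]
labels = [1, 1, 0, 0]

model = build_pipeline().fit(texts, labels)
print(model.predict(["love this great phone", "hate this terrible brand"]))
```

Because the vectorizer lives inside the pipeline, predict accepts raw strings directly and applies the same vocabulary that was learned during fit.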

Train the model on the training data and get the accuracy score on the testing data:
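The training and scoring step might look roughly like this. To keep the sketch short and self-contained, the densifying step is written as a small function wrapped in scikit-learn's FunctionTransformer, playing the role of DenseTransformer; the helper name train_and_score and the toy sentences are assumptions, and the score printed for this toy data says nothing about the real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(X):
    """Densify the sparse count matrix so GaussianNB can use it."""
    return X.toarray()


def train_and_score(model, texts, labels, seed=0):
    """Fit on 80% of the data and return the accuracy on the other 20%."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, random_state=seed, stratify=labels
    )
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Toy data, invented for illustration; real scores need the real dataset.
texts = [
    "love this phone", "great battery life", "really happy with it",
    "awesome camera", "best purchase ever", "worst service ever",
    "terrible update", "very disappointed", "screen broke again",
    "never buying again",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

model = make_pipeline(CountVectorizer(), FunctionTransformer(to_dense), GaussianNB())
accuracy = train_and_score(model, texts, labels)
print(f"accuracy: {accuracy:.2f}")
```

Since train_and_score accepts any scikit-learn estimator, the same helper can be used to compare several algorithms on the same split.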

6. Running on Twitter’s data

Connect the application to Twitter through the Tweepy API:

7. Cleaning all data:

Screenshot of tweets appearing on the home timeline. There is one Positive comment and one Negative comment.

The model runs on this data and tries to predict which comment is Positive and which is Negative. The application then Likes the Positive comment and Retweets the Negative comment. It only does so if those actions have not been taken before.
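The decision logic can be sketched as below. The function name act_on_timeline is an assumption, and FakeApi and KeywordModel are stand-ins so the example runs without credentials; with a real tweepy.API client the same calls (home_timeline, create_favorite, retweet) and the tweet attributes favorited and retweeted would be used.

```python
from types import SimpleNamespace


def act_on_timeline(api, model, clean=lambda text: text):
    """Like tweets predicted Positive (1), retweet those predicted Negative (0)."""
    actions = []
    for tweet in api.home_timeline():
        label = model.predict([clean(tweet.text)])[0]
        if label == 1 and not tweet.favorited:
            api.create_favorite(tweet.id)  # skipped if already liked
            actions.append((tweet.id, "like"))
        elif label == 0 and not tweet.retweeted:
            api.retweet(tweet.id)          # skipped if already retweeted
            actions.append((tweet.id, "retweet"))
    return actions


class FakeApi:
    """Stand-in for tweepy.API that records calls instead of sending them."""

    def __init__(self, tweets):
        self.tweets = tweets
        self.calls = []

    def home_timeline(self):
        return self.tweets

    def create_favorite(self, tweet_id):
        self.calls.append(("create_favorite", tweet_id))

    def retweet(self, tweet_id):
        self.calls.append(("retweet", tweet_id))


class KeywordModel:
    """Stand-in for the trained pipeline: 1 if the text contains 'love'."""

    def predict(self, texts):
        return [1 if "love" in text else 0 for text in texts]


fake_api = FakeApi([
    SimpleNamespace(id=1, text="love this brand", favorited=False, retweeted=False),
    SimpleNamespace(id=2, text="hate this brand", favorited=False, retweeted=False),
    SimpleNamespace(id=3, text="love it", favorited=True, retweeted=False),
])
actions = act_on_timeline(fake_api, KeywordModel())
print(actions)  # [(1, 'like'), (2, 'retweet')]
```

Tweet 3 is predicted Positive but was already liked, so no action is taken for it.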

Screenshot of the running application. It predicted one Positive comment and one Negative comment, as expected:

Based on this prediction, the application should take these actions: Like for the Positive comment and Retweet for the Negative comment:

That is all for this project. I hope it helps you get off the ground and build something awesome. Please share and let me know if you do. Good luck!
