Twitter Sentiment Analysis in NLP: Classifying positive and negative Tweets that mention a brand name

Anh Tran
Aug 8, 2020


1. Setting up a Twitter developer account

To pull data from Twitter, you need a Twitter developer account. These are the steps to set one up:

  • Create a regular Twitter account
  • Apply for a Twitter developer account: Settings -> About Twitter -> Developers
  • Create a new app
  • Fill in the app information and click Create
  • Copy the Consumer Key (API Key) and Consumer Secret from the “Keys and Access Tokens” tab
  • Generate and copy the Access Token and Access Token Secret
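You should not hard-code the four keys into the script. One option is to read them from environment variables. The sketch below uses hypothetical variable names such as TWITTER_CONSUMER_KEY, so rename them to suit your own setup.

```python
import os

# Hypothetical environment variable names; rename to match your own setup.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_credentials():
    """Return the four Twitter keys as a dict, failing early if any is missing."""
    missing = [var for var in CREDENTIAL_VARS.values() if not os.environ.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[var] for name, var in CREDENTIAL_VARS.items()}
```

Because the function fails immediately with the names of any missing variables, a misconfigured environment is caught before any call to Twitter.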

2. Setting up a Python environment

This application uses a few Python packages:

  • Tweepy: accessing the Twitter API
  • NumPy: mathematical and scientific operations in Python
  • scikit-learn (sklearn): machine learning library
  • Pandas: data manipulation and analysis
  • NLTK: Natural Language Toolkit
  • re: regular expressions (part of the Python standard library)

Install any missing packages from a terminal with pip, e.g. pip install tweepy.
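A small helper can check which packages are still missing by looking up each import name with importlib. The mapping below is an assumption based on the list above. Two names differ between pip and import: sklearn is installed as scikit-learn, and re ships with Python, so it needs no install.

```python
import importlib.util

# Map import names to the names pip expects; re is in the standard library.
PACKAGES = {
    "tweepy": "tweepy",
    "numpy": "numpy",
    "sklearn": "scikit-learn",
    "pandas": "pandas",
    "nltk": "nltk",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for module, pip_name in packages.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Run: pip install " + " ".join(missing))
    else:
        print("All packages are installed.")
```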

Open your IDE, create a Python file named twitter_nlp.py, and import all the packages and modules.

3. Dataset

The training dataset is a CSV file with two columns: the first holds the tweet text, and the second holds the sentiment label, 1 for positive or 0 for negative.
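With pandas, loading the file might look like the sketch below. The column names "text" and "sentiment" and the file name are assumptions. The sketch also assumes the CSV has no header row; if yours does, drop header=None.

```python
import pandas as pd


def load_dataset(path_or_buffer):
    """Read a two-column CSV: tweet text, then a 1 (positive) / 0 (negative) label."""
    df = pd.read_csv(path_or_buffer, header=None, names=["text", "sentiment"])
    # Drop rows with missing text or labels so later steps get clean input.
    df = df.dropna(subset=["text", "sentiment"])
    df["sentiment"] = df["sentiment"].astype(int)
    return df


# Usage (hypothetical file name):
# df = load_dataset("tweets.csv")
```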

The dataset is split into training and testing data with an 80:20 ratio.
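scikit-learn's train_test_split can do the 80:20 split with test_size=0.2. The stratify argument is an optional extra that is not mentioned above. It keeps the positive/negative ratio roughly the same in both parts. The fixed random_state makes the split reproducible.

```python
from sklearn.model_selection import train_test_split


def split_dataset(texts, labels, seed=42):
    """Split into 80% training and 20% testing data.

    Returns X_train, X_test, y_train, y_test.
    """
    # stratify=labels (optional) preserves the class balance in both splits.
    return train_test_split(
        texts, labels, test_size=0.2, random_state=seed, stratify=labels
    )
```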

4. Cleaning and Stemming text content

Each tweet is cleaned and then stemmed with the Porter stemming algorithm (NLTK's PorterStemmer) before being added to the bag of words.
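The exact cleaning rules are not spelled out here, so the following is a minimal sketch that assumes a typical set. It lowercases the text, removes URLs and @mentions, keeps letters only, and then stems each word with NLTK's PorterStemmer. The function name clean_tweet is hypothetical. PorterStemmer needs no extra NLTK data download.

```python
import re

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()


def clean_tweet(text):
    """Lowercase, strip URLs, @mentions and non-letters, then Porter-stem each word."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # letters only: drops '#', digits, punctuation
    return " ".join(stemmer.stem(word) for word in text.split())


# Usage: cleaned = [clean_tweet(t) for t in df["text"]]
```

Stemming maps variants such as "running" and "runs" to the same token "run". This shrinks the vocabulary that the bag of words has to cover.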

5. Pipeline

A scikit-learn Pipeline turns each bag of words into a count vector and then trains a Gaussian Naive Bayes (GaussianNB) model on those vectors. GaussianNB was chosen because it gave the highest accuracy score among the algorithms compared.

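Wrapping these steps in a Pipeline makes the code more efficient and reusable: a single fit call trains every step, and the vectorizer fitted on the training data is applied automatically when predicting on new text.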

CountVectorizer outputs a sparse matrix, but GaussianNB requires dense data, so a DenseTransformer class is needed as an intermediate step in the Pipeline.
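The DenseTransformer is not part of scikit-learn itself. Below is a minimal sketch of how it might be written as a custom transformer and placed between CountVectorizer and GaussianNB. The step names and the build_pipeline helper are illustrative.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline


class DenseTransformer(BaseEstimator, TransformerMixin):
    """Convert the sparse matrix from CountVectorizer into a dense NumPy array."""

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return X.toarray()


def build_pipeline():
    return Pipeline([
        ("vectorizer", CountVectorizer()),  # bag of words -> sparse count matrix
        ("to_dense", DenseTransformer()),   # sparse -> dense, as GaussianNB requires
        ("classifier", GaussianNB()),
    ])
```

Another option is scikit-learn's FunctionTransformer wrapping the same toarray() call. Be aware that densifying stores every zero, so memory use grows with vocabulary size times the number of tweets.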

The model is then trained on the training data, and its accuracy score is measured on the testing data.
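This step might look like the sketch below. It accepts any scikit-learn pipeline and an 80:20 split as described above. The helper name train_and_score is hypothetical.

```python
from sklearn.metrics import accuracy_score


def train_and_score(pipeline, X_train, X_test, y_train, y_test):
    """Fit on the training split and return accuracy on the held-out test split."""
    pipeline.fit(X_train, y_train)
    predictions = pipeline.predict(X_test)
    return accuracy_score(y_test, predictions)


# Usage:
# acc = train_and_score(pipeline, X_train, X_test, y_train, y_test)
# print(f"Test accuracy: {acc:.2%}")
```

Scoring only on the held-out 20% gives a fairer estimate than scoring on data the model has already seen.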

6. Running on Twitter’s data

Connect the application to Twitter through the Tweepy API.

7. Cleaning the Twitter data
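The sketch below fetches the home timeline and cleans each tweet. It assumes an authenticated tweepy.API client and takes the text-cleaning function as a parameter. Use the same cleaning function as for the training data, because the vectorizer only knows the stemmed vocabulary it saw during training. The helper name and the count of 20 are illustrative.

```python
def fetch_and_clean_timeline(api, clean, count=20):
    """Fetch recent home-timeline tweets and pair each Status with its cleaned text.

    api   -- an authenticated tweepy.API client
    clean -- the text-cleaning function that was applied to the training data
    """
    # tweet_mode="extended" returns the full, untruncated text in .full_text
    statuses = api.home_timeline(count=count, tweet_mode="extended")
    return [(status, clean(status.full_text)) for status in statuses]
```

Keeping each Status object next to its cleaned text means the tweet ID and state are still available when it is time to act on the predictions.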

Screenshot of the tweets appearing on the home timeline. There is one Positive comment and one Negative comment.

The model runs on these tweets and predicts whether each comment is Positive or Negative. The application then Likes each Positive comment and Retweets each Negative comment, but only if that action has not already been taken on the tweet.
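The decision logic might look like the sketch below. It assumes Twitter API v1.1 Status objects, whose favorited and retweeted fields report whether the authenticated account has already liked or retweeted a tweet. It also assumes Tweepy's API.create_favorite and API.retweet methods. The helper name and the returned action log are illustrative.

```python
POSITIVE, NEGATIVE = 1, 0  # same labels as the training data


def act_on_predictions(api, statuses, predictions):
    """Like positive tweets and retweet negative ones, skipping actions already taken.

    Returns a list of (action, tweet_id) pairs describing what was done.
    """
    actions = []
    for status, label in zip(statuses, predictions):
        if label == POSITIVE and not status.favorited:
            api.create_favorite(status.id)
            actions.append(("like", status.id))
        elif label == NEGATIVE and not status.retweeted:
            api.retweet(status.id)
            actions.append(("retweet", status.id))
    return actions


# Usage, with (status, cleaned_text) pairs and a fitted pipeline:
# predictions = pipeline.predict([text for _, text in pairs])
# act_on_predictions(api, [status for status, _ in pairs], predictions)
```

Checking favorited and retweeted first avoids errors from the API on repeated runs, since liking an already-liked tweet or retweeting it twice is rejected.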

Screenshot of the running application. It predicted one Positive comment and one Negative comment, as expected.

Based on these predictions, the application takes the corresponding actions: Like for the Positive comment and Retweet for the Negative comment.

That's all for this project. I hope it helps you get off the ground and build something awesome. If you do, please share it and let me know. Good luck!
